WorldWideScience

Sample records for feature selection algorithm

  1. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Ahmed Majid Taha

    2013-01-01

    Full Text Available When the amount of data and information is said to double in every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. Bio-inspired method called Bat Algorithm hybridized with a Naive Bayes classifier has been presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed other algorithms in selecting lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets.

  2. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Science.gov (United States)

    Taha, Ahmed Majid; Mustapha, Aida; Chen, Soong-Der

    2013-01-01

    When the amount of data and information is said to double in every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. Bio-inspired method called Bat Algorithm hybridized with a Naive Bayes classifier has been presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed other algorithms in selecting lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets. PMID:24396295

  3. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    Full Text Available This paper presents a seletion procedure for environmet features for the correction stage of a SLAM (Simultaneous Localization and Mapping algorithm based on an Extended Kalman Filter (EKF. This approach decreases the computational time of the correction stage which allows for real and constant-time implementations of the SLAM. The selection procedure consists in chosing the features the SLAM system state covariance is more sensible to. The entire system is implemented on a mobile robot equipped with a range sensor laser. The features extracted from the environment correspond to lines and corners. Experimental results of the real time SLAM algorithm and an analysis of the processing-time consumed by the SLAM with the feature selection procedure proposed are shown. A comparison between the feature selection approach proposed and the classical sequential EKF-SLAM along with an entropy feature selection approach is also performed.

  4. Effective traffic features selection algorithm for cyber-attacks samples

    Science.gov (United States)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.

  5. A redundancy-removing feature selection algorithm for nominal data

    Directory of Open Access Journals (Sweden)

    Zhihua Li

    2015-10-01

    Full Text Available No order correlation or similarity metric exists in nominal data, and there will always be more redundancy in a nominal dataset, which means that an efficient mutual information-based nominal-data feature selection method is relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

  6. Feature selection using genetic algorithms for fetal heart rate analysis

    International Nuclear Information System (INIS)

    Xu, Liang; Redman, Christopher W G; Georgieva, Antoniya; Payne, Stephen J

    2014-01-01

    The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three classifiers, respectively. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 using different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies. (paper)

  7. Toward optimal feature selection using ranking methods and classification algorithms

    Directory of Open Access Journals (Sweden)

    Novaković Jasmina

    2011-01-01

    Full Text Available We presented a comparison between several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, C4.5 decision tree and the RBF network. We showed that the selection of ranking methods could be important for classification accuracy. In our experiments, ranking methods with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.

  8. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional dataset is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of a poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplifying the amount of factors required and improves the knowledge on adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidences (geophysical and thermal data and rock glacier inventories) that serve as training permafrost data. Used FS algorithms informed about variables that appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its

  9. The Research and Application of SURF Algorithm Based on Feature Point Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhang Fang Hu

    2014-04-01

    Full Text Available As the pixel information of depth image is derived from the distance information, when implementing SURF algorithm with KINECT sensor for static sign language recognition, there can be some mismatched pairs in palm area. This paper proposes a feature point selection algorithm, by filtering the SURF feature points step by step based on the number of feature points within adaptive radius r and the distance between the two points, it not only greatly improves the recognition rate, but also ensures the robustness under the environmental factors, such as skin color, illumination intensity, complex background, angle and scale changes. The experiment results show that the improved SURF algorithm can effectively improve the recognition rate, has a good robustness.

  10. Relevant test set using feature selection algorithm for early detection ...

    African Journals Online (AJOL)

    The objective of feature selection is to find the most relevant features for classification. Thus, the dimensionality of the information will be reduced and may improve classification's accuracy. This paper proposed a minimum set of relevant questions that can be used for early detection of dyslexia. In this research, we ...

  11. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  12. Fast Branch & Bound algorithms for optimal feature selection

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Pudil, Pavel; Kittler, J.

    2004-01-01

    Roč. 26, č. 7 (2004), s. 900-912 ISSN 0162-8828 R&D Projects: GA ČR GA402/02/1271; GA ČR GA402/03/1310; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : subset search * feature selection * search tree Subject RIV: BD - Theory of Information Impact factor: 4.352, year: 2004

  13. A study of metaheuristic algorithms for high dimensional feature selection on microarray data

    Science.gov (United States)

    Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna

    2017-11-01

    Microarray systems enable experts to examine gene profile at molecular level using machine learning algorithms. It increases the potentials of classification and diagnosis of many diseases at gene expression level. Though, numerous difficulties may affect the efficiency of machine learning algorithms which includes vast number of genes features comprised in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary to be performed in the data pre-processing. Many feature selection algorithms are developed and applied on microarray which including the metaheuristic optimization algorithms. This paper discusses the application of the metaheuristics algorithms for feature selection in microarray dataset. This study reveals that, the algorithms have yield an interesting result with limited resources thereby saving computational expenses of machine learning algorithms.

  14. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    International Nuclear Information System (INIS)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-01-01

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may be existed. Furthermore, recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on contribution analysis algorithm of NN is presented. The emotion features are selected by using contribution analysis algorithm of NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness for the features selected, and the time of feature extraction is evaluated. Finally, 24 emotion features selected are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and the time of feature extraction

  15. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Science.gov (United States)

    Auat Cheein, Fernando A.; Carelli, Ricardo

    2011-01-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman filter) SLAM. The feature selection criteria are applied on the correction stage of the SLAM algorithm, restricting it to correct the SLAM algorithm with the most significant features. This restriction also causes a decrement in the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison between the different proposed techniques performance. The experiments were carried out at an outdoor environment composed by trees, although the results shown herein are not restricted to a special type of features. PMID:22346568

  16. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    Full Text Available In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been undertaken to develop feature selection techniques. In addition, most applications involving feature selection focus on classification accuracy but not cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds the cost-based evaluation function of a filter feature selection using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs and misclassification costs in the field of network security, thereby weakening the influence of many instances from the majority of classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale dataset of network security, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN. The results of the experimental research show that the approach is efficient and able to effectively improve classification accuracy and to decrease classification time. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.

  17. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure.

    Science.gov (United States)

    Foroughi Pour, Ali; Dalton, Lori A

    2018-03-21

    Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data these algorithms outputted many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks.

  19. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    Full Text Available The paper presents an automatic speaker’s recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on features selection of a speech signal using a genetic algorithm which takes into account synergy of features. The results of optimization of selected elements of a classifier have been also shown, including the number of Gaussian distributions used to model each of the voices. In addition, for creating voice models, a universal voice model has been used.[b]Keywords[/b]: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  20. featsel: A framework for benchmarking of feature selection algorithms and cost functions

    OpenAIRE

    Marcelo S. Reis; Gustavo Estrela; Carlos Eduardo Ferreira; Junior Barrera

    2017-01-01

    In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and co...

  1. Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection.

    Science.gov (United States)

    Kipli, Kuryati; Kouzani, Abbas Z

    2015-07-01

    Accurate detection of depression at an individual level using structural magnetic resonance imaging (sMRI) remains a challenge. Brain volumetric changes at a structural level appear to have importance in depression biomarkers studies. An automated algorithm is developed to select brain sMRI volumetric features for the detection of depression. A feature selection (FS) algorithm called degree of contribution (DoC) is developed for selection of sMRI volumetric features. This algorithm uses an ensemble approach to determine the degree of contribution in detection of major depressive disorder. The DoC is the score of feature importance used for feature ranking. The algorithm involves four stages: feature ranking, subset generation, subset evaluation, and DoC analysis. The performance of DoC is evaluated on the Duke University Multi-site Imaging Research in the Analysis of Depression sMRI dataset. The dataset consists of 115 brain sMRI scans of 88 healthy controls and 27 depressed subjects. Forty-four sMRI volumetric features are used in the evaluation. The DoC score of forty-four features was determined as the accuracy threshold (Acc_Thresh) was varied. The DoC performance was compared with that of four existing FS algorithms. At all defined Acc_Threshs, DoC outperformed the four examined FS algorithms for the average classification score and the maximum classification score. DoC has a good ability to generate reduced-size subsets of important features that could yield high classification accuracy. Based on the DoC score, the most discriminant volumetric features are those from the left-brain region.

  2. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, Back-propagation(BP) algorithm has been used to train the feed forward neural network for human activity recognition in smart home environments, and inter-class distance method for feature selection of observed motion sensor events is discussed and tested. And then, the human activity recognition performances of neural network using BP algorithm have been evaluated and compared with other probabilistic algorithms: Naïve Bayes(NB) classifier and Hidden Markov Model(HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, neural network using BP algorithm has relatively better human activity recognition performances than NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  3. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR method is proposed to determine the feature that is more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of the bearing fault diagnosis. In detail, (1 70-dimensional all waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2 the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA algorithm and a back propagation (BP network is constructed, and (4 the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that the features with larger MIVAR value can lead to higher recognition rates.

  4. Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm

    Science.gov (United States)

    Salameh Shreem, Salam; Abdullah, Salwani; Nazri, Mohd Zakree Ahmad

    2016-04-01

    Microarray technology can be used as an efficient diagnostic system to recognise diseases such as tumours or to discriminate between different types of cancers in normal tissues. This technology has received increasing attention from the bioinformatics community because of its potential in designing powerful decision-making tools for cancer diagnosis. However, the presence of thousands or tens of thousands of genes affects the predictive accuracy of this technology from the perspective of classification. Thus, a key issue in microarray data is identifying or selecting the smallest possible set of genes from the input data that can achieve good predictive accuracy for classification. In this work, we propose a two-stage selection algorithm for gene selection problems in microarray data-sets called the symmetrical uncertainty filter and harmony search algorithm wrapper (SU-HSA). Experimental results show that the SU-HSA is better than HSA in isolation for all data-sets in terms of the accuracy and achieves a lower number of genes on 6 out of 10 instances. Furthermore, the comparison with state-of-the-art methods shows that our proposed approach is able to obtain 5 (out of 10) new best results in terms of the number of selected genes and competitive results in terms of the classification accuracy.

  5. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    Directory of Open Access Journals (Sweden)

    Aiming Liu

    2017-11-01

    Full Text Available Motor Imagery (MI electroencephalography (EEG is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP and local characteristic-scale decomposition (LCD algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems.

  6. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    Science.gov (United States)

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  7. Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm

    Science.gov (United States)

    Annavarapu, Chandra Sekhara Rao; Dara, Suresh; Banka, Haider

    2016-01-01

    Cancer investigations in microarray data play a major role in cancer analysis and the treatment. Cancer microarray data consists of complex gene expressed patterns of cancer. In this article, a Multi-Objective Binary Particle Swarm Optimization (MOBPSO) algorithm is proposed for analyzing cancer gene expression data. Due to its high dimensionality, a fast heuristic based pre-processing technique is employed to reduce some of the crude domain features from the initial feature set. Since these pre-processed and reduced features are still high dimensional, the proposed MOBPSO algorithm is used for finding further feature subsets. The objective functions are suitably modeled by optimizing two conflicting objectives i.e., cardinality of feature subsets and distinctive capability of those selected subsets. As these two objective functions are conflicting in nature, they are more suitable for multi-objective modeling. The experiments are carried out on benchmark gene expression datasets, i.e., Colon, Lymphoma and Leukaemia available in literature. The performance of the selected feature subsets with their classification accuracy and validated using 10 fold cross validation techniques. A detailed comparative study is also made to show the betterment or competitiveness of the proposed algorithm. PMID:27822174

  8. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh LAshkari

    2016-06-01

    Full Text Available Selecting optimal features based on nature of the phenomenon and high discriminant ability is very important in the data classification problems. Since it doesn't require any assumption about stationary condition and size of the signal and the noise in Recurrent Quantification Analysis (RQA, it may be useful for epileptic seizure Detection. In this study, RQA was used to discriminate ictal EEG from the normal EEG where optimal features selected by combination of algorithm genetic and Bayesian Classifier. Recurrence plots of hundred samples in each two categories were obtained with five distance norms in this study: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose optimal threshold for each norm, ten threshold of ε was generated and then the best feature space was selected by genetic algorithm in combination with a bayesian classifier. The results shown that proposed method is capable of discriminating the ictal EEG from the normal EEG where for Minimum norm and 0.1˂ε˂1, accuracy was 100%. In addition, the sensitivity of proposed framework to the ε and the distance norm parameters was low. The optimal feature presented in this study is Trans which it was selected in most feature spaces with high accuracy.

  9. Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Iwan Syarif

    2016-12-01

    Full Text Available This paper describes the advantages of using Evolutionary Algorithms (EA for feature selection on network intrusion dataset. Most current Network Intrusion Detection Systems (NIDS are unable to detect intrusions in real time because of high dimensional data produced during daily operation. Extracting knowledge from huge data such as intrusion data requires new approach. The more complex the datasets, the higher computation time and the harder they are to be interpreted and analyzed. This paper investigates the performance of feature selection algoritms in network intrusiona data. We used Genetic Algorithms (GA and Particle Swarm Optimizations (PSO as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO have significantly reduces the number of features. Our experiments show that GA successfully reduces the number of attributes from 41 to 15 while PSO reduces the number of attributes from 41 to 9. Using k Nearest Neighbour (k-NN as a classifier,the GA-reduced dataset which consists of 37% of original attributes, has accuracy improvement from 99.28% to 99.70% and its execution time is also 4.8 faster than the execution time of original dataset. Using the same classifier, PSO-reduced dataset which consists of 22% of original attributes, has the fastest execution time (7.2 times faster than the execution time of original datasets. However, its accuracy is slightly reduced 0.02% from 99.28% to 99.26%. Overall, both GA and PSO are good solution as feature selection techniques because theyhave shown very good performance in reducing the number of features significantly while still maintaining and sometimes improving the classification accuracy as well as reducing the computation time.

  10. Feature Selection and Parameter Optimization of Support Vector Machines Based on Modified Artificial Fish Swarm Algorithms

    Directory of Open Access Journals (Sweden)

    Kuan-Cheng Lin

    2015-01-01

    Full Text Available Rapid advances in information and communication technology have made ubiquitous computing and the Internet of Things popular and practicable. These applications create enormous volumes of data, which are available for analysis and classification as an aid to decision-making. Among the classification methods used to deal with big data, feature selection has proven particularly effective. One common approach involves searching through a subset of the features that are the most relevant to the topic or represent the most accurate description of the dataset. Unfortunately, searching through this kind of subset is a combinatorial problem that can be very time consuming. Meaheuristic algorithms are commonly used to facilitate the selection of features. The artificial fish swarm algorithm (AFSA employs the intelligence underlying fish swarming behavior as a means to overcome optimization of combinatorial problems. AFSA has proven highly successful in a diversity of applications; however, there remain shortcomings, such as the likelihood of falling into a local optimum and a lack of multiplicity. This study proposes a modified AFSA (MAFSA to improve feature selection and parameter optimization for support vector machine classifiers. Experiment results demonstrate the superiority of MAFSA in classification accuracy using subsets with fewer features for given UCI datasets, compared to the original FASA.

  11. A stereo remote sensing feature selection method based on artificial bee colony algorithm

    Science.gov (United States)

    Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi

    2014-05-01

    To improve the efficiency of stereo information for remote sensing classification, a stereo remote sensing feature selection method is proposed in this paper presents, which is based on artificial bee colony algorithm. Remote sensing stereo information could be described by digital surface model (DSM) and optical image, which contain information of the three-dimensional structure and optical characteristics, respectively. Firstly, three-dimensional structure characteristic could be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of 3DZD could descript different complexity of three-dimensional structure, and it needs to be better optimized selected for various objects on the ground. Secondly, features for representing optical characteristic also need to be optimized. If not properly handled, when a stereo feature vector composed of 3DZD and image features, that would be a lot of redundant information, and the redundant information may not improve the classification accuracy, even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimized frame for this stereo feature selection problem is created, and artificial bee colony algorithm is introduced for solving this optimization problem. Experimental results show that the proposed method can effectively improve the computational efficiency, improve the classification accuracy.

  12. A Hybrid Feature Subset Selection Algorithm for Analysis of High Correlation Proteomic Data

    Science.gov (United States)

    Kordy, Hussain Montazery; Baygi, Mohammad Hossein Miran; Moradi, Mohammad Hassan

    2012-01-01

    Pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. The surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to generate proteomic profiles from biological fluids. Mass spectrometry yields redundant noisy data that the most data points are irrelevant features for differentiating between cancer and normal cases. In this paper, we have proposed a hybrid feature subset selection algorithm based on maximum-discrimination and minimum-correlation coupled with peak scoring criteria. Our algorithm has been applied to two independent SELDI-TOF MS datasets of ovarian cancer obtained from the NCI-FDA clinical proteomics databank. The proposed algorithm has used to extract a set of proteins as potential biomarkers in each dataset. We applied the linear discriminate analysis to identify the important biomarkers. The selected biomarkers have been able to successfully diagnose the ovarian cancer patients from the noncancer control group with an accuracy of 100%, a sensitivity of 100%, and a specificity of 100% in the two datasets. The hybrid algorithm has the advantage that increases reproducibility of selected biomarkers and able to find a small set of proteins with high discrimination power. PMID:23717808

  13. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-01-01

    proper criterion seeks to find the best subset of features describing data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a

  14. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem\\'s dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation examines and evaluates simultaneously large number of candidate collections of features. DWFS also integrates various filteringmethods thatmay be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  15. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman; Kleftogiannis, Dimitrios A.; Kalnis, Panos; Bajic, Vladimir B.

    2015-01-01

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation examines and evaluates simultaneously large number of candidate collections of features. DWFS also integrates various filteringmethods thatmay be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  16. An application of locally linear model tree algorithm with combination of feature selection in credit scoring

    Science.gov (United States)

    Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

    2014-10-01

    Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessing. In this paper, an application of the locally linear model tree algorithm (LOLIMOT) was experimented to evaluate the superiority of its performance to predict the customer's credit status. The algorithm is improved with an aim of adjustment by credit scoring domain by means of data fusion and feature selection techniques. Two real world credit data sets - Australian and German - from UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increase the prediction accuracy.

  17. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach that is applied in major fields of biomedical, bioinformatics, robotics, natural language processing and social networking. In feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.

  18. Feature selection for disruption prediction from scratch in JET by using genetic algorithms and probabilistic predictors

    Energy Technology Data Exchange (ETDEWEB)

    Pereira, Augusto, E-mail: augusto.pereira@ciemat.es [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Vega, Jesús; Moreno, Raúl [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Dormido-Canto, Sebastián [Dpto. Informática y Automática – UNED, Madrid (Spain); Rattá, Giuseppe A. [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Pavón, Fernando [Dpto. Informática y Automática – UNED, Madrid (Spain)

    2015-10-15

    Recently, a probabilistic classifier has been developed at JET to be used as predictor from scratch. It has been applied to a database of 1237 JET ITER-like wall (ILW) discharges (of which 201 disrupted) with good results: success rate of 94% and false alarm rate of 4.21%. A combinatorial analysis between 14 features to ensure the selection of the best ones to achieve good enough results in terms of success rate and false alarm rate was performed. All possible combinations with a number of features between 2 and 7 were tested and 9893 different predictors were analyzed. An important drawback in this analysis was the time required to compute the results that can be estimated in 1731 h (∼2.4 months). Genetic algorithms (GA) are searching algorithms that simulate the process of natural selection. In this article, the GA and the Venn predictors are combined with the objective not only of finding good enough features within the 14 available ones but also of reducing the computational time requirements. Five different performance metrics as measures of the GA fitness function have been evaluated. The best metric was the measurement called Informedness, with just 6 generations (168 predictors at 29.4 h).

  19. Feature selection for disruption prediction from scratch in JET by using genetic algorithms and probabilistic predictors

    International Nuclear Information System (INIS)

    Pereira, Augusto; Vega, Jesús; Moreno, Raúl; Dormido-Canto, Sebastián; Rattá, Giuseppe A.; Pavón, Fernando

    2015-01-01

    Recently, a probabilistic classifier has been developed at JET to be used as predictor from scratch. It has been applied to a database of 1237 JET ITER-like wall (ILW) discharges (of which 201 disrupted) with good results: success rate of 94% and false alarm rate of 4.21%. A combinatorial analysis between 14 features to ensure the selection of the best ones to achieve good enough results in terms of success rate and false alarm rate was performed. All possible combinations with a number of features between 2 and 7 were tested and 9893 different predictors were analyzed. An important drawback in this analysis was the time required to compute the results that can be estimated in 1731 h (∼2.4 months). Genetic algorithms (GA) are searching algorithms that simulate the process of natural selection. In this article, the GA and the Venn predictors are combined with the objective not only of finding good enough features within the 14 available ones but also of reducing the computational time requirements. Five different performance metrics as measures of the GA fitness function have been evaluated. The best metric was the measurement called Informedness, with just 6 generations (168 predictors at 29.4 h).

  20. Feature Selection and Predictors of Falls with Foot Force Sensors Using KNN-Based Algorithms

    Directory of Open Access Journals (Sweden)

    Shengyun Liang

    2015-11-01

    Full Text Available The aging process may lead to the degradation of lower extremity function in the elderly population, which can restrict their daily quality of life and gradually increase the fall risk. We aimed to determine whether objective measures of physical function could predict subsequent falls. Ground reaction force (GRF data, which was quantified by sample entropy, was collected by foot force sensors. Thirty eight subjects (23 fallers and 15 non-fallers participated in functional movement tests, including walking and sit-to-stand (STS. A feature selection algorithm was used to select relevant features to classify the elderly into two groups: at risk and not at risk of falling down, for three KNN-based classifiers: local mean-based k-nearest neighbor (LMKNN, pseudo nearest neighbor (PNN, local mean pseudo nearest neighbor (LMPNN classification. We compared classification performances, and achieved the best results with LMPNN, with sensitivity, specificity and accuracy all 100%. Moreover, a subset of GRFs was significantly different between the two groups via Wilcoxon rank sum test, which is compatible with the classification results. This method could potentially be used by non-experts to monitor balance and the risk of falling down in the elderly population.

  1. Improved feature selection based on genetic algorithms for real time disruption prediction on JET

    Energy Technology Data Exchange (ETDEWEB)

    Ratta, G.A., E-mail: garatta@gateme.unsj.edu.ar [GATEME, Facultad de Ingenieria, Universidad Nacional de San Juan, Avda. San Martin 1109 (O), 5400 San Juan (Argentina); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, Avda. Complutense, 40, 28040 Madrid (Spain); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom); Murari, A. [Associazione EURATOM-ENEA per la Fusione, Consorzio RFX, 4-35127 Padova (Italy); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom)

    2012-09-15

    Highlights: Black-Right-Pointing-Pointer A new signal selection methodology to improve disruption prediction is reported. Black-Right-Pointing-Pointer The approach is based on Genetic Algorithms. Black-Right-Pointing-Pointer An advanced predictor has been created with the new set of signals. Black-Right-Pointing-Pointer The new system obtains considerably higher prediction rates. - Abstract: The early prediction of disruptions is an important aspect of the research in the field of Tokamak control. A very recent predictor, called 'Advanced Predictor Of Disruptions' (APODIS), developed for the 'Joint European Torus' (JET), implements the real time recognition of incoming disruptions with the best success rate achieved ever and an outstanding stability for long periods following training. In this article, a new methodology to select the set of the signals' parameters in order to maximize the performance of the predictor is reported. The approach is based on 'Genetic Algorithms' (GAs). With the feature selection derived from GAs, a new version of APODIS has been developed. The results are significantly better than the previous version not only in terms of success rates but also in extending the interval before the disruption in which reliable predictions are achieved. Correct disruption predictions with a success rate in excess of 90% have been achieved 200 ms before the time of the disruption. The predictor response is compared with that of JET's Protection System (JPS) and the ADODIS predictor is shown to be far superior. Both systems have been carefully tested with a wide number of discharges to understand their relative merits and the most profitable directions of further improvements.

  2. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    NARCIS (Netherlands)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. To

  3. Improved feature selection based on genetic algorithms for real time disruption prediction on JET

    International Nuclear Information System (INIS)

    Rattá, G.A.; Vega, J.; Murari, A.

    2012-01-01

    Highlights: ► A new signal selection methodology to improve disruption prediction is reported. ► The approach is based on Genetic Algorithms. ► An advanced predictor has been created with the new set of signals. ► The new system obtains considerably higher prediction rates. - Abstract: The early prediction of disruptions is an important aspect of the research in the field of Tokamak control. A very recent predictor, called “Advanced Predictor Of Disruptions” (APODIS), developed for the “Joint European Torus” (JET), implements the real time recognition of incoming disruptions with the best success rate achieved ever and an outstanding stability for long periods following training. In this article, a new methodology to select the set of the signals’ parameters in order to maximize the performance of the predictor is reported. The approach is based on “Genetic Algorithms” (GAs). With the feature selection derived from GAs, a new version of APODIS has been developed. The results are significantly better than the previous version not only in terms of success rates but also in extending the interval before the disruption in which reliable predictions are achieved. Correct disruption predictions with a success rate in excess of 90% have been achieved 200 ms before the time of the disruption. The predictor response is compared with that of JET's Protection System (JPS) and the ADODIS predictor is shown to be far superior. Both systems have been carefully tested with a wide number of discharges to understand their relative merits and the most profitable directions of further improvements.

  4. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal

    Directory of Open Access Journals (Sweden)

    Mariela Cerrada

    2015-09-01

    Full Text Available There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy for fault diagnosis are considered valuable contributions. Feature selection is still an important aspect in machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters on the time, frequency and time-frequency domains, which are extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing in each stage a new subset of the best features regarding the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The Sensors 2015, 15 23904 approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity. The model performance for diagnosis is over 98%.

  5. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  6. Day-ahead price forecasting of electricity markets by a new feature selection algorithm and cascaded neural network technique

    International Nuclear Information System (INIS)

    Amjady, Nima; Keynia, Farshid

    2009-01-01

    With the introduction of restructuring into the electric power industry, the price of electricity has become the focus of all activities in the power market. Electricity price forecast is key information for electricity market managers and participants. However, electricity price is a complex signal due to its non-linear, non-stationary, and time variant behavior. In spite of performed research in this area, more accurate and robust price forecast methods are still required. In this paper, a new forecast strategy is proposed for day-ahead price forecasting of electricity markets. Our forecast strategy is composed of a new two stage feature selection technique and cascaded neural networks. The proposed feature selection technique comprises modified Relief algorithm for the first stage and correlation analysis for the second stage. The modified Relief algorithm selects candidate inputs with maximum relevancy with the target variable. Then among the selected candidates, the correlation analysis eliminates redundant inputs. Selected features by the two stage feature selection technique are used for the forecast engine, which is composed of 24 consecutive forecasters. Each of these 24 forecasters is a neural network allocated to predict the price of 1 h of the next day. The whole proposed forecast strategy is examined on the Spanish and Australia's National Electricity Markets Management Company (NEMMCO) and compared with some of the most recent price forecast methods.

  7. Icing Forecasting of High Voltage Transmission Line Using Weighted Least Square Support Vector Machine with Fireworks Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Tiannan Ma

    2016-12-01

    Full Text Available Accurate forecasting of icing thickness has great significance for ensuring the security and stability of the power grid. In order to improve the forecasting accuracy, this paper proposes an icing forecasting system based on the fireworks algorithm and weighted least square support vector machine (W-LSSVM. The method of the fireworks algorithm is employed to select the proper input features with the purpose of eliminating redundant influence. In addition, the aim of the W-LSSVM model is to train and test the historical data-set with the selected features. The capability of this proposed icing forecasting model and framework is tested through simulation experiments using real-world icing data from the monitoring center of the key laboratory of anti-ice disaster, Hunan, South China. The results show that the proposed W-LSSVM-FA method has a higher prediction accuracy and it may be a promising alternative for icing thickness forecasting.

  8. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    Science.gov (United States)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It's very important to differentiate the temporal lobe epilepsy (TLE) patients from healthy people and localize the abnormal brain regions of the TLE patients. The cortical features and changes can reveal the unique anatomical patterns of brain regions from the structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, the independent sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and the support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that the SVM-REF achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM, and the t-test. Especially, the surface area and gray volume matter exhibited prominent discriminative ability, and the performance of the SVM was improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in temporal and frontal lobe, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. It was demonstrated that the cortical features provided effective information to determine the abnormal anatomical pattern and the proposed method has the potential to improve the clinical diagnosis of the TLE.

  9. Hybridization between multi-objective genetic algorithm and support vector machine for feature selection in walker-assisted gait.

    Science.gov (United States)

    Martins, Maria; Costa, Lino; Frizera, Anselmo; Ceres, Ramón; Santos, Cristina

    2014-03-01

    Walker devices are often prescribed incorrectly to patients, leading to the increase of dissatisfaction and occurrence of several problems, such as, discomfort and pain. Thus, it is necessary to objectively evaluate the effects that assisted gait can have on the gait patterns of walker users, comparatively to a non-assisted gait. A gait analysis, focusing on spatiotemporal and kinematics parameters, will be issued for this purpose. However, gait analysis yields redundant information that often is difficult to interpret. This study addresses the problem of selecting the most relevant gait features required to differentiate between assisted and non-assisted gait. For that purpose, it is presented an efficient approach that combines evolutionary techniques, based on genetic algorithms, and support vector machine algorithms, to discriminate differences between assisted and non-assisted gait with a walker with forearm supports. For comparison purposes, other classification algorithms are verified. Results with healthy subjects show that the main differences are characterized by balance and joints excursion in the sagittal plane. These results, confirmed by clinical evidence, allow concluding that this technique is an efficient feature selection approach. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study.

    Science.gov (United States)

    Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin

    2016-01-08

    In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in

  11. Optimum location of external markers using feature selection algorithms for real‐time tumor tracking in external‐beam radiotherapy: a virtual phantom study

    Science.gov (United States)

    Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin

    2016-01-01

    In external‐beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation‐based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two “Genetic” and “Ranker” searching procedures. The performance of these algorithms has been evaluated using four‐dimensional extended cardiac‐torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro‐fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F‐test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation‐based feature

  12. Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy.

    Science.gov (United States)

    Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A

    2015-07-01

    Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel map which each hold vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Feature Selection and Fault Classification of Reciprocating Compressors using a Genetic Algorithm and a Probabilistic Neural Network

    International Nuclear Information System (INIS)

    Ahmed, M; Gu, F; Ball, A

    2011-01-01

    Reciprocating compressors are widely used in industry for various purposes and faults occurring in them can degrade their performance, consume additional energy and even cause severe damage to the machine. Vibration monitoring techniques are often used for early fault detection and diagnosis, but it is difficult to prescribe a given set of effective diagnostic features because of the wide variety of operating conditions and the complexity of the vibration signals which originate from the many different vibrating and impact sources. This paper studies the use of genetic algorithms (GAs) and neural networks (NNs) to select effective diagnostic features for the fault diagnosis of a reciprocating compressor. A large number of common features are calculated from the time and frequency domains and envelope analysis. Applying GAs and NNs to these features found that envelope analysis has the most potential for differentiating three common faults: valve leakage, inter-cooler leakage and a loose drive belt. Simultaneously, the spread parameter of the probabilistic NN was also optimised. The selected subsets of features were examined based on vibration source characteristics. The approach developed and the trained NN are confirmed as possessing general characteristics for fault detection and diagnosis.

  14. Feature Selection and Fault Classification of Reciprocating Compressors using a Genetic Algorithm and a Probabilistic Neural Network

    Energy Technology Data Exchange (ETDEWEB)

    Ahmed, M; Gu, F; Ball, A, E-mail: M.Ahmed@hud.ac.uk [Diagnostic Engineering Research Group, University of Huddersfield, HD1 3DH (United Kingdom)

    2011-07-19

    Reciprocating compressors are widely used in industry for various purposes and faults occurring in them can degrade their performance, consume additional energy and even cause severe damage to the machine. Vibration monitoring techniques are often used for early fault detection and diagnosis, but it is difficult to prescribe a given set of effective diagnostic features because of the wide variety of operating conditions and the complexity of the vibration signals which originate from the many different vibrating and impact sources. This paper studies the use of genetic algorithms (GAs) and neural networks (NNs) to select effective diagnostic features for the fault diagnosis of a reciprocating compressor. A large number of common features are calculated from the time and frequency domains and envelope analysis. Applying GAs and NNs to these features found that envelope analysis has the most potential for differentiating three common faults: valve leakage, inter-cooler leakage and a loose drive belt. Simultaneously, the spread parameter of the probabilistic NN was also optimised. The selected subsets of features were examined based on vibration source characteristics. The approach developed and the trained NN are confirmed as possessing general characteristics for fault detection and diagnosis.

  15. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

    Directory of Open Access Journals (Sweden)

    Sheng Yang

    2015-01-01

    Full Text Available Sequencing is widely used to discover associations between microRNAs (miRNAs and diseases. However, the negative binomial distribution (NB and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF, was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96 from among the deregulated miRNAs of six datasets from The Cancer Genomics Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  16. Building an intrusion detection system using a filter-based feature selection algorithm

    NARCIS (Netherlands)

    Ambusaidi, Mohammed A.; He, Xiangjian; Nanda, Priyadarsi; Tan, Zhiyuan

    2016-01-01

    Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a

  17. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of elimination of the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The developed approach conventionally used in liver diseases and diabetes diagnostics, which are commonly observed and reduce the quality of life, is developed. For the diagnosis of these diseases, hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached a classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained by the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.

  18. Gesture Recognition from Data Streams of Human Motion Sensor Using Accelerated PSO Swarm Search Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2015-01-01

    Full Text Available Human motion sensing technology gains tremendous popularity nowadays with practical applications such as video surveillance for security, hand signing, and smart-home and gaming. These applications capture human motions in real-time from video sensors, the data patterns are nonstationary and ever changing. While the hardware technology of such motion sensing devices as well as their data collection process become relatively mature, the computational challenge lies in the real-time analysis of these live feeds. In this paper we argue that traditional data mining methods run short of accurately analyzing the human activity patterns from the sensor data stream. The shortcoming is due to the algorithmic design which is not adaptive to the dynamic changes in the dynamic gesture motions. The successor of these algorithms which is known as data stream mining is evaluated versus traditional data mining, through a case of gesture recognition over motion data by using Microsoft Kinect sensors. Three different subjects were asked to read three comic strips and to tell the stories in front of the sensor. The data stream contains coordinates of articulation points and various positions of the parts of the human body corresponding to the actions that the user performs. In particular, a novel technique of feature selection using swarm search and accelerated PSO is proposed for enabling fast preprocessing for inducing an improved classification model in real-time. Superior result is shown in the experiment that runs on this empirical data stream. The contribution of this paper is on a comparative study between using traditional and data stream mining algorithms and incorporation of the novel improved feature selection technique with a scenario where different gesture patterns are to be recognized from streaming sensor data.

  19. AHIMSA - Ad hoc histogram information measure sensing algorithm for feature selection in the context of histogram inspired clustering techniques

    Science.gov (United States)

    Dasarathy, B. V.

    1976-01-01

    An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.

  20. Feature Selection by Reordering

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2005-01-01

    Roč. 2, č. 1 (2005), s. 155-161 ISSN 1738-6438 Institutional research plan: CEZ:AV0Z10300504 Keywords : feature selection * data reduction * ordering of features Subject RIV: BA - General Mathematics

  1. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some ir...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  2. Feature selection toolbox software package

    Czech Academy of Sciences Publication Activity Database

    Pudil, Pavel; Novovičová, Jana; Somol, Petr

    2002-01-01

    Roč. 23, č. 4 (2002), s. 487-492 ISSN 0167-8655 R&D Projects: GA ČR GA402/01/0981 Institutional research plan: CEZ:AV0Z1075907 Keywords : pattern recognition * feature selection * loating search algorithms Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.409, year: 2002

  3. Feature Selection with the Boruta Package

    OpenAIRE

    Kursa, Miron B.; Rudnicki, Witold R.

    2010-01-01

    This article describes a R package Boruta, implementing a novel feature selection algorithm for finding emph{all relevant variables}. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. The short description of the algorithm and examples of its application are presented.

  4. Feature Selection with the Boruta Package

    Directory of Open Access Journals (Sweden)

    Miron B. Kursa

    2010-10-01

    Full Text Available This article describes a R package Boruta, implementing a novel feature selection algorithm for finding emph{all relevant variables}. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. The short description of the algorithm and examples of its application are presented.

  5. Normed kernel function-based fuzzy possibilistic C-means (NKFPCM) algorithm for high-dimensional breast cancer database classification with feature selection is based on Laplacian Score

    Science.gov (United States)

    Lestari, A. W.; Rustam, Z.

    2017-07-01

    In the last decade, breast cancer has become the focus of world attention as this disease is one of the primary leading cause of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, Fuzzy Kennel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the features candidates are evaluated using feature selection, where Laplacian Score is used. The results show the comparison accuracy and running time of FPCM and NKFPCM with and without feature selection.

  6. Automatic Algorithm Selection for Complex Simulation Problems

    CERN Document Server

    Ewald, Roland

    2012-01-01

    To select the most suitable simulation algorithm for a given task is often difficult. This is due to intricate interactions between model features, implementation details, and runtime environment, which may strongly affect the overall performance. An automated selection of simulation algorithms supports users in setting up simulation experiments without demanding expert knowledge on simulation. Roland Ewald analyzes and discusses existing approaches to solve the algorithm selection problem in the context of simulation. He introduces a framework for automatic simulation algorithm selection and

  7. Leukemia and colon tumor detection based on microarray data classification using momentum backpropagation and genetic algorithm as a feature selection method

    Science.gov (United States)

    Wisesty, Untari N.; Warastri, Riris S.; Puspitasari, Shinta Y.

    2018-03-01

    Cancer is one of the major causes of mordibility and mortality problems in the worldwide. Therefore, the need of a system that can analyze and identify a person suffering from a cancer by using microarray data derived from the patient’s Deoxyribonucleic Acid (DNA). But on microarray data has thousands of attributes, thus making the challenges in data processing. This is often referred to as the curse of dimensionality. Therefore, in this study built a system capable of detecting a patient whether contracted cancer or not. The algorithm used is Genetic Algorithm as feature selection and Momentum Backpropagation Neural Network as a classification method, with data used from the Kent Ridge Bio-medical Dataset. Based on system testing that has been done, the system can detect Leukemia and Colon Tumor with best accuracy equal to 98.33% for colon tumor data and 100% for leukimia data. Genetic Algorithm as feature selection algorithm can improve system accuracy, which is from 64.52% to 98.33% for colon tumor data and 65.28% to 100% for leukemia data, and the use of momentum parameters can accelerate the convergence of the system in the training process of Neural Network.

  8. A new model of flavonoids affinity towards P-glycoprotein: genetic algorithm-support vector machine with features selected by a modified particle swarm optimization algorithm.

    Science.gov (United States)

    Cui, Ying; Chen, Qinggang; Li, Yaxiao; Tang, Ling

    2017-02-01

    Flavonoids exhibit a high affinity for the purified cytosolic NBD (C-terminal nucleotide-binding domain) of P-glycoprotein (P-gp). To explore the affinity of flavonoids for P-gp, quantitative structure-activity relationship (QSAR) models were developed using support vector machines (SVMs). A novel method coupling a modified particle swarm optimization algorithm with random mutation strategy and a genetic algorithm coupled with SVM was proposed to simultaneously optimize the kernel parameters of SVM and determine the subset of optimized features for the first time. Using DRAGON descriptors to represent compounds for QSAR, three subsets (training, prediction and external validation set) derived from the dataset were employed to investigate QSAR. With excluding of the outlier, the correlation coefficient (R 2 ) of the whole training set (training and prediction) was 0.924, and the R 2 of the external validation set was 0.941. The root-mean-square error (RMSE) of the whole training set was 0.0588; the RMSE of the cross-validation of the external validation set was 0.0443. The mean Q 2 value of leave-many-out cross-validation was 0.824. With more informations from results of randomization analysis and applicability domain, the proposed model is of good predictive ability, stability.

  9. SIFT based algorithm for point feature tracking

    Directory of Open Access Journals (Sweden)

    Adrian BURLACU

    2007-12-01

    Full Text Available In this paper a tracking algorithm for SIFT features in image sequences is developed. For each point feature extracted using SIFT algorithm a descriptor is computed using information from its neighborhood. Using an algorithm based on minimizing the distance between two descriptors tracking point features throughout image sequences is engaged. Experimental results, obtained from image sequences that capture scaling of different geometrical type object, reveal the performances of the tracking algorithm.

  10. Feature Selection for Object-Based Classification of High-Resolution Remote Sensing Images Based on the Combination of a Genetic Algorithm and Tabu Search

    Directory of Open Access Journals (Sweden)

    Lei Shi

    2018-01-01

    Full Text Available In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA and tabu search (TS is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy.

  11. Feature Selection for Object-Based Classification of High-Resolution Remote Sensing Images Based on the Combination of a Genetic Algorithm and Tabu Search

    Science.gov (United States)

    Shi, Lei; Wan, Youchuan; Gao, Xianjun

    2018-01-01

    In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA) and tabu search (TS) is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy. PMID:29581721

  12. Identification of Subtype-Specific Prognostic Genes for Early-Stage Lung Adenocarcinoma and Squamous Cell Carcinoma Patients Using an Embedded Feature Selection Algorithm.

    Directory of Open Access Journals (Sweden)

    Suyan Tian

    Full Text Available The existence of fundamental differences between lung adenocarcinoma (AC and squamous cell carcinoma (SCC in their underlying mechanisms motivated us to postulate that specific genes might exist relevant to prognosis of each histology subtype. To test on this research hypothesis, we previously proposed a simple Cox-regression model based feature selection algorithm and identified successfully some subtype-specific prognostic genes when applying this method to real-world data. In this article, we continue our effort on identification of subtype-specific prognostic genes for AC and SCC, and propose a novel embedded feature selection method by extending Threshold Gradient Descent Regularization (TGDR algorithm and minimizing on a corresponding negative partial likelihood function. Using real-world datasets and simulated ones, we show these two proposed methods have comparable performance whereas the new proposal is superior in terms of model parsimony. Our analysis provides some evidence on the existence of such subtype-specific prognostic genes, more investigation is warranted.

  13. Pap Smear Diagnosis Using a Hybrid Intelligent Scheme Focusing on Genetic Algorithm Based Feature Selection and Nearest Neighbor Classification

    DEFF Research Database (Denmark)

    Marinakis, Yannis; Dounias, Georgios; Jantzen, Jan

    2009-01-01

    The term pap-smear refers to samples of human cells stained by the so-called Papanicolaou method. The purpose of the Papanicolaou method is to diagnose pre-cancerous cell changes before they progress to invasive carcinoma. In this paper a metaheuristic algorithm is proposed in order to classify t...... other previously applied intelligent approaches....

  14. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built....... The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method...

  15. The Short-Term Power Load Forecasting Based on Sperm Whale Algorithm and Wavelet Least Square Support Vector Machine with DWT-IR for Feature Selection

    Directory of Open Access Journals (Sweden)

    Jin-peng Liu

    2017-07-01

    Full Text Available Short-term power load forecasting is an important basis for the operation of integrated energy system, and the accuracy of load forecasting directly affects the economy of system operation. To improve the forecasting accuracy, this paper proposes a load forecasting system based on wavelet least square support vector machine and sperm whale algorithm. Firstly, the methods of discrete wavelet transform and inconsistency rate model (DWT-IR are used to select the optimal features, which aims to reduce the redundancy of input vectors. Secondly, the kernel function of least square support vector machine LSSVM is replaced by wavelet kernel function for improving the nonlinear mapping ability of LSSVM. Lastly, the parameters of W-LSSVM are optimized by sperm whale algorithm, and the short-term load forecasting method of W-LSSVM-SWA is established. Additionally, the example verification results show that the proposed model outperforms other alternative methods and has a strong effectiveness and feasibility in short-term power load forecasting.

  16. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction algorithm...... for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using a Markov blanket...... induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance....

  17. Linear feature detection algorithm for astronomical surveys - I. Algorithm description

    Science.gov (United States)

    Bektešević, Dino; Vinković, Dejan

    2017-11-01

    Computer vision algorithms are powerful tools in astronomical image analyses, especially when automation of object detection and extraction is required. Modern object detection algorithms in astronomy are oriented towards detection of stars and galaxies, ignoring completely the detection of existing linear features. With the emergence of wide-field sky surveys, linear features attract scientific interest as possible trails of fast flybys of near-Earth asteroids and meteors. In this work, we describe a new linear feature detection algorithm designed specifically for implementation in big data astronomy. The algorithm combines a series of algorithmic steps that first remove other objects (stars and galaxies) from the image and then enhance the line to enable more efficient line detection with the Hough algorithm. The rate of false positives is greatly reduced thanks to a step that replaces possible line segments with rectangles and then compares lines fitted to the rectangles with the lines obtained directly from the image. The speed of the algorithm and its applicability in astronomical surveys are also discussed.

  18. A compound structure of ELM based on feature selection and parameter optimization using hybrid backtracking search algorithm for wind speed forecasting

    International Nuclear Information System (INIS)

    Zhang, Chu; Zhou, Jianzhong; Li, Chaoshun; Fu, Wenlong; Peng, Tian

    2017-01-01

    Highlights: • A novel hybrid approach is proposed for wind speed forecasting. • The variational mode decomposition (VMD) is optimized to decompose the original wind speed series. • The input matrix and parameters of ELM are optimized simultaneously by using a hybrid BSA. • Results show that OVMD-HBSA-ELM achieves better performance in terms of prediction accuracy. - Abstract: Reliable wind speed forecasting is essential for wind power integration in wind power generation system. The purpose of paper is to develop a novel hybrid model for short-term wind speed forecasting and demonstrates its efficiency. In the proposed model, a compound structure of extreme learning machine (ELM) based on feature selection and parameter optimization using hybrid backtracking search algorithm (HBSA) is employed as the predictor. The real-valued BSA (RBSA) is exploited to search for the optimal combination of weights and bias of ELM while the binary-valued BSA (BBSA) is exploited as a feature selection method applying on the candidate inputs predefined by partial autocorrelation function (PACF) values to reconstruct the input-matrix. Due to the volatility and randomness of wind speed signal, an optimized variational mode decomposition (OVMD) is employed to eliminate the redundant noises. The parameters of the proposed OVMD are determined according to the center frequencies of the decomposed modes and the residual evaluation index (REI). The wind speed signal is decomposed into a few modes via OVMD. The aggregation of the forecasting results of these modes constructs the final forecasting result of the proposed model. The proposed hybrid model has been applied on the mean half-hour wind speed observation data from two wind farms in Inner Mongolia, China and 10-min wind speed data from the Sotavento Galicia wind farm are studied as an additional case. Parallel experiments have been designed to compare with the proposed model. Results obtained from this study indicate that the

  19. A Fast Algorithm of Cartographic Sounding Selection

    Institute of Scientific and Technical Information of China (English)

    SUI Haigang; HUA Li; ZHAO Haitao; ZHANG Yongli

    2005-01-01

    An effective strategy and framework that adequately integrate the automated and manual processes for fast cartographic sounding selection is presented. The important submarine topographic features are extracted for important soundings selection, and an improved "influence circle" algorithm is introduced for sounding selection. For automatic configuration of soundings distribution pattern, a special algorithm considering multi-factors is employed. A semi-automatic method for solving the ambiguous conflicts is described. On the basis of the algorithms and strategies a system named HGIS for fast cartographic sounding selection is developed and applied in Chinese Marine Safety Administration Bureau (CMSAB). The application experiments show that the system is effective and reliable. At last some conclusions and the future work are given.

  20. Document localization algorithms based on feature points and straight lines

    Science.gov (United States)

    Skoryukina, Natalya; Shemiakina, Julia; Arlazarov, Vladimir L.; Faradjev, Igor

    2018-04-01

    The important part of the system of a planar rectangular object analysis is the localization: the estimation of projective transform from template image of an object to its photograph. The system also includes such subsystems as the selection and recognition of text fields, the usage of contexts etc. In this paper three localization algorithms are described. All algorithms use feature points and two of them also analyze near-horizontal and near- vertical lines on the photograph. The algorithms and their combinations are tested on a dataset of real document photographs. Also the method of localization quality estimation is proposed that allows configuring the localization subsystem independently of the other subsystems quality.

  1. Feature Selection via Chaotic Antlion Optimization.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, the advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the more informative markers along with performing a high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. An often adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the data training fitting while minimizing the number of features used.We propose an optimization approach for the feature selection problem that considers a "chaotic" version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature. The balance between exploration of the search space and exploitation of the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I that limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. The quasi-linear decrease in the variable I may lead to immature convergence in some cases and trapping in local minima in other cases. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets, but we also used other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.

  2. Hierarchical feature selection for erythema severity estimation

    Science.gov (United States)

    Wang, Li; Shi, Chenbo; Shu, Chang

    2014-10-01

    At present PASI system of scoring is used for evaluating erythema severity, which can help doctors to diagnose psoriasis [1-3]. The system relies on the subjective judge of doctors, where the accuracy and stability cannot be guaranteed [4]. This paper proposes a stable and precise algorithm for erythema severity estimation. Our contributions are twofold. On one hand, in order to extract the multi-scale redness of erythema, we design the hierarchical feature. Different from traditional methods, we not only utilize the color statistical features, but also divide the detect window into small window and extract hierarchical features. Further, a feature re-ranking step is introduced, which can guarantee that extracted features are irrelevant to each other. On the other hand, an adaptive boosting classifier is applied for further feature selection. During the step of training, the classifier will seek out the most valuable feature for evaluating erythema severity, due to its strong learning ability. Experimental results demonstrate the high precision and robustness of our algorithm. The accuracy is 80.1% on the dataset which comprise 116 patients' images with various kinds of erythema. Now our system has been applied for erythema medical efficacy evaluation in Union Hosp, China.

  3. Adversarial Feature Selection Against Evasion Attacks.

    Science.gov (United States)

    Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio

    2016-03-01

    Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.

  4. Oscillating feature subset search algorithm for text categorization

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Somol, Petr; Pudil, Pavel

    2006-01-01

    Roč. 44, č. 4225 (2006), s. 578-587 ISSN 0302-9743 R&D Projects: GA AV ČR IAA2075302; GA MŠk 2C06019 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : text classification * feature selection * oscillating search algorithm * Bhattacharyya distance Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005

  5. Feature selection for splice site prediction: A new method using EDA-based feature ranking

    Directory of Open Access Journals (Sweden)

    Rouzé Pierre

    2004-05-01

    Full Text Available Abstract Background The identification of relevant biological features in large and complex datasets is an important step towards gaining insight in the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.

  6. FEATURE SELECTION METHODS BASED ON MUTUAL INFORMATION FOR CLASSIFYING HETEROGENEOUS FEATURES

    Directory of Open Access Journals (Sweden)

    Ratri Enggar Pawening

    2016-06-01

    Full Text Available Datasets with heterogeneous features can affect feature selection results that are not appropriate because it is difficult to evaluate heterogeneous features concurrently. Feature transformation (FT is another way to handle heterogeneous features subset selection. The results of transformation from non-numerical into numerical features may produce redundancy to the original numerical features. In this paper, we propose a method to select feature subset based on mutual information (MI for classifying heterogeneous features. We use unsupervised feature transformation (UFT methods and joint mutual information maximation (JMIM methods. UFT methods is used to transform non-numerical features into numerical features. JMIM methods is used to select feature subset with a consideration of the class label. The transformed and the original features are combined entirely, then determine features subset by using JMIM methods, and classify them using support vector machine (SVM algorithm. The classification accuracy are measured for any number of selected feature subset and compared between UFT-JMIM methods and Dummy-JMIM methods. The average classification accuracy for all experiments in this study that can be achieved by UFT-JMIM methods is about 84.47% and Dummy-JMIM methods is about 84.24%. This result shows that UFT-JMIM methods can minimize information loss between transformed and original features, and select feature subset to avoid redundant and irrelevant features.

  7. Discriminative semi-supervised feature selection via manifold regularization.

    Science.gov (United States)

    Xu, Zenglin; King, Irwin; Lyu, Michael Rung-Tsong; Jin, Rong

    2010-07-01

    Feature selection has attracted a huge amount of interest in both research and application communities of data mining. We consider the problem of semi-supervised feature selection, where we are given a small amount of labeled examples and a large amount of unlabeled examples. Since a small number of labeled samples are usually insufficient for identifying the relevant features, the critical problem arising from semi-supervised feature selection is how to take advantage of the information underneath the unlabeled data. To address this problem, we propose a novel discriminative semi-supervised feature selection method based on the idea of manifold regularization. The proposed approach selects features through maximizing the classification margin between different classes and simultaneously exploiting the geometry of the probability distribution that generates both labeled and unlabeled data. In comparison with previous semi-supervised feature selection algorithms, our proposed semi-supervised feature selection method is an embedded feature selection method and is able to find more discriminative features. We formulate the proposed feature selection method into a convex-concave optimization problem, where the saddle point corresponds to the optimal solution. To find the optimal solution, the level method, a fairly recent optimization method, is employed. We also present a theoretic proof of the convergence rate for the application of the level method to our problem. Empirical evaluation on several benchmark data sets demonstrates the effectiveness of the proposed semi-supervised feature selection method.

  8. A Hybrid Feature Selection Approach for Arabic Documents Classification

    NARCIS (Netherlands)

    Habib, Mena Badieh; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.; Fayed, Zaki T.; Gharib, Tarek F.

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge number of features. Feature selection tries to

  9. Portfolio selection using genetic algorithms | Yahaya | International ...

    African Journals Online (AJOL)

    In this paper, one of the nature-inspired evolutionary algorithms – a Genetic Algorithms (GA) was used in solving the portfolio selection problem (PSP). Based on a real dataset from a popular stock market, the performance of the algorithm in relation to those obtained from one of the popular quadratic programming (QP) ...

  10. Feature selection and nearest centroid classification for protein mass spectrometry

    Directory of Open Access Journals (Sweden)

    Levner Ilya

    2005-03-01

    Full Text Available Abstract Background The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. Results This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student-t test, Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting based feature selection we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms, does not hold across all data sets. Conclusion This study tested a number of popular feature

  11. Particle swarm optimization and genetic algorithm as feature selection techniques for the QSAR modeling of imidazo[1,5-a]pyrido[3,2-e]pyrazines, inhibitors of phosphodiesterase 10A.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; Deeb, Omar; Pieters, Sigrid; Vander Heyden, Yvan

    2013-12-01

    Quantitative structure-activity relationship (QSAR) modeling was performed for imidazo[1,5-a]pyrido[3,2-e]pyrazines, which constitute a class of phosphodiesterase 10A inhibitors. Particle swarm optimization (PSO) and genetic algorithm (GA) were used as feature selection techniques to find the most reliable molecular descriptors from a large pool. Modeling of the relationship between the selected descriptors and the pIC50 activity data was achieved by linear [multiple linear regression (MLR)] and non-linear [locally weighted regression (LWR) based on both Euclidean (E) and Mahalanobis (M) distances] methods. In addition, a stepwise MLR model was built using only a limited number of quantum chemical descriptors, selected because of their correlation with the pIC50 . The model was not found interesting. It was concluded that the LWR model, based on the Euclidean distance, applied on the descriptors selected by PSO has the best prediction ability. However, some other models behaved similarly. The root-mean-squared errors of prediction (RMSEP) for the test sets obtained by PSO/MLR, GA/MLR, PSO/LWRE, PSO/LWRM, GA/LWRE, and GA/LWRM models were 0.333, 0.394, 0.313, 0.333, 0.421, and 0.424, respectively. The PSO-selected descriptors resulted in the best prediction models, both linear and non-linear. © 2013 John Wiley & Sons A/S.

  12. Template Generation and Selection Algorithms

    NARCIS (Netherlands)

    Guo, Y.; Smit, Gerardus Johannes Maria; Broersma, Haitze J.; Heysters, P.M.; Badaway, W.; Ismail, Y.

    The availability of high-level design entry tooling is crucial for the viability of any reconfigurable SoC architecture. This paper presents a template generation method to extract functional equivalent structures, i.e. templates, from a control data flow graph. By inspecting the graph the algorithm

  13. Efficient Multi-Label Feature Selection Using Entropy-Based Label Selection

    Directory of Open Access Journals (Sweden)

    Jaesung Lee

    2016-11-01

    Full Text Available Multi-label feature selection is designed to select a subset of features according to their importance to multiple labels. This task can be achieved by ranking the dependencies of features and selecting the features with the highest rankings. In a multi-label feature selection problem, the algorithm may be faced with a dataset containing a large number of labels. Because the computational cost of multi-label feature selection increases according to the number of labels, the algorithm may suffer from a degradation in performance when processing very large datasets. In this study, we propose an efficient multi-label feature selection method based on an information-theoretic label selection strategy. By identifying a subset of labels that significantly influence the importance of features, the proposed method efficiently outputs a feature subset. Experimental results demonstrate that the proposed method can identify a feature subset much faster than conventional multi-label feature selection methods for large multi-label datasets.

  14. Input significance analysis: feature selection through synaptic ...

    African Journals Online (AJOL)

    Connection Weights (CW) and Garson's Algorithm (GA) and the classifier selected ... from the UCI Machine Learning Repository and executed in an online ... connectionist systems; evolving fuzzy neural network; connection weights; Garson's

  15. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major....... While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  16. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs. Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.

  17. Joint Feature Selection and Classification for Multilabel Learning.

    Science.gov (United States)

    Huang, Jun; Li, Guorong; Huang, Qingming; Wu, Xindong

    2018-03-01

    Multilabel learning deals with examples having multiple class labels simultaneously. It has been applied to a variety of applications, such as text categorization and image annotation. A large number of algorithms have been proposed for multilabel learning, most of which concentrate on multilabel classification problems and only a few of them are feature selection algorithms. Current multilabel classification models are mainly built on a single data representation composed of all the features which are shared by all the class labels. Since each class label might be decided by some specific features of its own, and the problems of classification and feature selection are often addressed independently, in this paper, we propose a novel method which can perform joint feature selection and classification for multilabel learning, named JFSC. Different from many existing methods, JFSC learns both shared features and label-specific features by considering pairwise label correlations, and builds the multilabel classifier on the learned low-dimensional data representations simultaneously. A comparative study with state-of-the-art approaches manifests a competitive performance of our proposed method both in classification and feature selection for multilabel learning.

  18. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    Science.gov (United States)

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve the accuracy of classification with small amount of motor imagery training data on the development of brain-computer interface (BCD systems, we proposed an analyzing method to automatically select the characteristic parameters based on correlation coefficient analysis. Throughout the five sample data of dataset IV a from 2005 BCI Competition, we utilized short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the number of primitive electroencephalogram dimension, then introduced feature extraction based on common spatial pattern (CSP) and classified by linear discriminant analysis (LDA). Simulation results showed that the average rate of classification accuracy could be improved by using correlation coefficient feature selection method than those without using this algorithm. Comparing with support vector machine (SVM) optimization features algorithm, the correlation coefficient analysis can lead better selection parameters to improve the accuracy of classification.

  19. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HUYue-li; CAOJia-lin; ZHAOQian; FENGXu

    2004-01-01

    Automatic recognition of skin micro-image symptom is important in skin diagnosis and treatment. Feature selection is to improve the classification performance of skin micro-image symptom.This paper proposes a hybrid approach based on the support vector machine (SVM) technique and genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross validation accuracy is increased from 88.25% using all features to 96.92% using only selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  20. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptom is important in skin diagnosis and treatment. Feature selection is to improve the classification performance of skin micro-image symptom.This paper proposes a hybrid approach based on the support vector machine (SVM) technique and genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross validation accuracy is increased from 88.25% using all features to 96.92 % using only selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  1. Face recognition algorithm using extended vector quantization histogram features.

    Science.gov (United States)

    Yan, Yan; Lee, Feifei; Wu, Xueqian; Chen, Qiu

    2018-01-01

    In this paper, we propose a face recognition algorithm based on a combination of vector quantization (VQ) and Markov stationary features (MSF). The VQ algorithm has been shown to be an effective method for generating features; it extracts a codevector histogram as a facial feature representation for face recognition. Still, the VQ histogram features are unable to convey spatial structural information, which to some extent limits their usefulness in discrimination. To alleviate this limitation of VQ histograms, we utilize Markov stationary features (MSF) to extend the VQ histogram-based features so as to add spatial structural information. We demonstrate the effectiveness of our proposed algorithm by achieving recognition results superior to those of several state-of-the-art methods on publicly available face databases.

  2. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to the acoustic event detection system, which is designed to recognize two types of acoustic events (shot and breaking glass in urban environment. For this purpose, a huge front-end processing was performed for the effective parametric representation of an input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK, then MPEG-7 audio descriptors and other temporal and spectral characteristics were extracted. High dimensional feature sets were created and in the next phase reduced by the mutual information based selection algorithms. Hidden Markov Model based classifier was applied and evaluated by the Viterbi decoding algorithm. Thus very effective feature sets were identified and also the less important features were found.

  3. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  4. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    Science.gov (United States)

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting high discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select the good subset of informative gene relevant to the classification. In the proposed algorithm, firstly, the fisher-markov selector is used to choose fixed number of gene data. Secondly, to make biogeography based optimization suitable for the feature selection problem; discrete migration model and discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, as we called DBBO, is proposed by integrating discrete migration model and discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used as the classifier with the 10 fold cross-validation method. In order to show the effective and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Comparison with genetic algorithm, particle swarm optimization, differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous method from literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Tag SNP selection via a genetic algorithm.

    Science.gov (United States)

    Mahdevar, Ghasem; Zahiri, Javad; Sadeghi, Mehdi; Nowzari-Dalini, Abbas; Ahrabian, Hayedeh

    2010-10-01

    Single Nucleotide Polymorphisms (SNPs) provide valuable information on human evolutionary history and may lead us to identify genetic variants responsible for human complex diseases. Unfortunately, molecular haplotyping methods are costly, laborious, and time consuming; therefore, algorithms for constructing full haplotype patterns from small available data through computational methods, Tag SNP selection problem, are convenient and attractive. This problem is proved to be an NP-hard problem, so heuristic methods may be useful. In this paper we present a heuristic method based on genetic algorithm to find reasonable solution within acceptable time. The algorithm was tested on a variety of simulated and experimental data. In comparison with the exact algorithm, based on brute force approach, results show that our method can obtain optimal solutions in almost all cases and runs much faster than exact algorithm when the number of SNP sites is large. Our software is available upon request to the corresponding author.

  6. A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization.

    Science.gov (United States)

    He, Xiaofei; Ji, Ming; Zhang, Chiyuan; Bao, Hujun

    2011-10-01

    In many information processing tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the meaningful feature subset of the original features which can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Based on Laplacian regularized least squares, which finds a smooth function on the data manifold and minimizes the empirical loss, we propose two novel feature selection algorithms which aim to minimize the expected prediction error of the regularized regression model. Specifically, we select those features such that the size of the parameter covariance matrix of the regularized regression model is minimized. Motivated from experimental design, we use trace and determinant operators to measure the size of the covariance matrix. Efficient computational schemes are also introduced to solve the corresponding optimization problems. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithms.

  7. Penalized feature selection and classification in bioinformatics

    OpenAIRE

    Ma, Shuangge; Huang, Jian

    2008-01-01

    In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classific...

  8. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Prior to this work, feature selection for reinforce- ment learning has focused on linear value function ap- proximation ( Kolter and Ng, 2009; Parr et al...InProceed- ings of the the 23rd International Conference on Ma- chine Learning, pages 449–456. Kolter , J. Z. and Ng, A. Y. (2009). Regularization and feature

  9. Selection method of terrain matching area for TERCOM algorithm

    Science.gov (United States)

    Zhang, Qieqie; Zhao, Long

    2017-10-01

    The performance of terrain aided navigation is closely related to the selection of terrain matching area. The different matching algorithms have different adaptability to terrain. This paper mainly studies the adaptability to terrain of TERCOM algorithm, analyze the relation between terrain feature and terrain characteristic parameters by qualitative and quantitative methods, and then research the relation between matching probability and terrain characteristic parameters by the Monte Carlo method. After that, we propose a selection method of terrain matching area for TERCOM algorithm, and verify the method correctness with real terrain data by simulation experiment. Experimental results show that the matching area obtained by the method in this paper has the good navigation performance and the matching probability of TERCOM algorithm is great than 90%

  10. A comparative study of image low level feature extraction algorithms

    Directory of Open Access Journals (Sweden)

    M.M. El-gayar

    2013-07-01

    Full Text Available Feature extraction and matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods for assessing the performance of popular image matching algorithms are presented and rely on costly descriptors for detection and matching. Specifically, the method assesses the type of images under which each of the algorithms reviewed herein perform to its maximum or highest efficiency. The efficiency is measured in terms of the number of matches founds by the algorithm and the number of type I and type II errors encountered when the algorithm is tested against a specific pair of images. Current comparative studies asses the performance of the algorithms based on the results obtained in different criteria such as speed, sensitivity, occlusion, and others. This study addresses the limitations of the existing comparative tools and delivers a generalized criterion to determine beforehand the level of efficiency expected from a matching algorithm given the type of images evaluated. The algorithms and the respective images used within this work are divided into two groups: feature-based and texture-based. And from this broad classification only three of the most widely used algorithms are assessed: color histogram, FAST (Features from Accelerated Segment Test, SIFT (Scale Invariant Feature Transform, PCA-SIFT (Principal Component Analysis-SIFT, F-SIFT (fast-SIFT and SURF (speeded up robust features. The performance of the Fast-SIFT (F-SIFT feature detection methods are compared for scale changes, rotation, blur, illumination changes and affine transformations. All the experiments use repeatability measurement and the number of correct matches for the evaluation measurements. SIFT presents its stability in most situations although its slow. F-SIFT is the fastest one with good performance as the same as SURF, SIFT, PCA-SIFT show its advantages in rotation and illumination changes.

  11. Simultaneous feature selection and classification via Minimax Probability Machine

    Directory of Open Access Journals (Sweden)

    Liming Yang

    2010-12-01

    Full Text Available This paper presents a novel method for simultaneous feature selection and classification by incorporating a robust L1-norm into the objective function of Minimax Probability Machine (MPM. A fractional programming framework is derived by using a bound on the misclassification error involving the mean and covariance of the data. Furthermore, the problems are solved by the Quadratic Interpolation method. Experiments show that our methods can select fewer features to improve the generalization compared to MPM, which illustrates the effectiveness of the proposed algorithms.

  12. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning litterature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  13. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with identification and authentication of web users participating in the Internet information processes (based on features of online texts.In digital forensics web user identification based on various linguistic features can be used to discover identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. Internet could be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare. Linguistic identification of web users is a kind of biometric identification, it can be used to narrow down the suspects, identify a criminal and prosecute him. Feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating Manhattan distance to k-nearest neighbors (Relief-f algorithm. This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different level of class imbalance. Experiment results showed that features relevance varies in different set of web users (probable authors of some text; features selection for each set of web users improves identification accuracy by 4% at the average that is approximately 1% higher than with the use of static set of features. The proposed approach is most effective for a small number of training samples (messages per user.

  14. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    OpenAIRE

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history si...

  15. Aging, selective attention, and feature integration.

    Science.gov (United States)

    Plude, D J; Doussard-Roosevelt, J A

    1989-03-01

    This study used feature-integration theory as a means of determining the point in processing at which selective attention deficits originate. The theory posits an initial stage of processing in which features are registered in parallel and then a serial process in which features are conjoined to form complex stimuli. Performance of young and older adults on feature versus conjunction search is compared. Analyses of reaction times and error rates suggest that elderly adults in addition to young adults, can capitalize on the early parallel processing stage of visual information processing, and that age decrements in visual search arise as a result of the later, serial stage of processing. Analyses of a third, unconfounded, conjunction search condition reveal qualitatively similar modes of conjunction search in young and older adults. The contribution of age-related data limitations is found to be secondary to the contribution of age decrements in selective attention.

  16. Predictive Feature Selection for Genetic Policy Search

    Science.gov (United States)

    2014-05-22

    limited manual intervention are becoming increasingly desirable as more complex tasks in dynamic and high- tempo environments are explored. Reinforcement...states in many domains causes features relevant to the reward variations to be overlooked, which hinders the policy search. 3.4 Parameter Selection PFS...the current feature subset. This local minimum may be “deceptive,” meaning that it does not clearly lead to the global optimal policy ( Goldberg and

  17. A novel automated spike sorting algorithm with adaptable feature extraction.

    Science.gov (United States)

    Bestel, Robert; Daus, Andreas W; Thielemann, Christiane

    2012-10-15

    To study the electrophysiological properties of neuronal networks, in vitro studies based on microelectrode arrays have become a viable tool for analysis. Although in constant progress, a challenging task still remains in this area: the development of an efficient spike sorting algorithm that allows an accurate signal analysis at the single-cell level. Most sorting algorithms currently available only extract a specific feature type, such as the principal components or Wavelet coefficients of the measured spike signals in order to separate different spike shapes generated by different neurons. However, due to the great variety in the obtained spike shapes, the derivation of an optimal feature set is still a very complex issue that current algorithms struggle with. To address this problem, we propose a novel algorithm that (i) extracts a variety of geometric, Wavelet and principal component-based features and (ii) automatically derives a feature subset, most suitable for sorting an individual set of spike signals. Thus, there is a new approach that evaluates the probability distribution of the obtained spike features and consequently determines the candidates most suitable for the actual spike sorting. These candidates can be formed into an individually adjusted set of spike features, allowing a separation of the various shapes present in the obtained neuronal signal by a subsequent expectation maximisation clustering algorithm. Test results with simulated data files and data obtained from chick embryonic neurons cultured on microelectrode arrays showed an excellent classification result, indicating the superior performance of the described algorithm approach. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Feature Selection Based on Mutual Correlation

    Czech Academy of Sciences Publication Activity Database

    Haindl, Michal; Somol, Petr; Ververidis, D.; Kotropoulos, C.

    2006-01-01

    Roč. 19, č. 4225 (2006), s. 569-577 ISSN 0302-9743. [Iberoamerican Congress on Pattern Recognition. CIARP 2006 /11./. Cancun, 14.11.2006-17.11.2006] R&D Projects: GA AV ČR 1ET400750407; GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection Subject RIV: BD - Theory of Information Impact factor: 0.402, year: 2005 http://library.utia.cas.cz/separaty/historie/haindl-feature selection based on mutual correlation.pdf

  19. Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

    Science.gov (United States)

    Ghosh, Samiran; Wang, Yazhen

    2015-02-01

    The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to its robustness and generalization capability. General theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only few observations which lie close to the boundary of the classifier function. However when the number of observations are not very large (small n ) but the number of dimensions/features are large (large p ), then it is not necessary that all available features are of equal importance in the classification context. Possible selection of an useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features could be selected. In short, we reverse the traditional sequential observation selection strategy of SVM to that of sequential feature selection. To achieve this we have modified the solution proposed by Zhu and Hastie (2005) in the context of import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.

  20. Feature and Region Selection for Visual Learning.

    Science.gov (United States)

    Zhao, Ji; Wang, Liantao; Cabral, Ricardo; De la Torre, Fernando

    2016-03-01

    Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoWs) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: 1) our approach accommodates non-linear additive kernels, such as the popular χ(2) and intersection kernel; 2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results in the PASCAL VOC 2007, MSR Action Dataset II and YouTube illustrate the benefits of our approach.

  1. Feature-extraction algorithms for the PANDA electromagnetic calorimeter

    NARCIS (Netherlands)

    Kavatsyuk, M.; Guliyev, E.; Lemmens, P. J. J.; Loehner, H.; Poelman, T. P.; Tambave, G.; Yu, B

    2009-01-01

    The feature-extraction algorithms are discussed which have been developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA detector at the future FAIR facility. Performance parameters have been derived in test measurements with cosmic rays, particle and photon

  2. A linear-time algorithm for Euclidean feature transform sets

    NARCIS (Netherlands)

    Hesselink, Wim H.

    2007-01-01

    The Euclidean distance transform of a binary image is the function that assigns to every pixel the Euclidean distance to the background. The Euclidean feature transform is the function that assigns to every pixel the set of background pixels with this distance. We present an algorithm to compute the

  3. Ad Hoc Access Gateway Selection Algorithm

    Science.gov (United States)

    Jie, Liu

    With the continuous development of mobile communication technology, Ad Hoc access network has become a hot research, Ad Hoc access network nodes can be used to expand capacity of multi-hop communication range of mobile communication system, even business adjacent to the community, improve edge data rates. For mobile nodes in Ad Hoc network to internet, internet communications in the peer nodes must be achieved through the gateway. Therefore, the key Ad Hoc Access Networks will focus on the discovery gateway, as well as gateway selection in the case of multi-gateway and handover problems between different gateways. This paper considers the mobile node and the gateway, based on the average number of hops from an average access time and the stability of routes, improved gateway selection algorithm were proposed. An improved gateway selection algorithm, which mainly considers the algorithm can improve the access time of Ad Hoc nodes and the continuity of communication between the gateways, were proposed. This can improve the quality of communication across the network.

  4. A review of channel selection algorithms for EEG signal processing

    Science.gov (United States)

    Alotaiby, Turky; El-Samie, Fathi E. Abd; Alshebeili, Saleh A.; Ahmad, Ishtiaq

    2015-12-01

    Digital processing of electroencephalography (EEG) signals has now been popularly used in a wide variety of applications such as seizure detection/prediction, motor imagery classification, mental task classification, emotion classification, sleep state classification, and drug effects diagnosis. With the large number of EEG channels acquired, it has become apparent that efficient channel selection algorithms are needed with varying importance from one application to another. The main purpose of the channel selection process is threefold: (i) to reduce the computational complexity of any processing task performed on EEG signals by selecting the relevant channels and hence extracting the features of major importance, (ii) to reduce the amount of overfitting that may arise due to the utilization of unnecessary channels, for the purpose of improving the performance, and (iii) to reduce the setup time in some applications. Signal processing tools such as time-domain analysis, power spectral estimation, and wavelet transform have been used for feature extraction and hence for channel selection in most of channel selection algorithms. In addition, different evaluation approaches such as filtering, wrapper, embedded, hybrid, and human-based techniques have been widely used for the evaluation of the selected subset of channels. In this paper, we survey the recent developments in the field of EEG channel selection methods along with their applications and classify these methods according to the evaluation approach.

  5. RESEARCH ON FOREST FLAME RECOGNITION ALGORITHM BASED ON IMAGE FEATURE

    Directory of Open Access Journals (Sweden)

    Z. Wang

    2017-09-01

    Full Text Available In recent years, fire recognition based on image features has become a hotspot in fire monitoring. However, due to the complexity of forest environment, the accuracy of forest fireworks recognition based on image features is low. Based on this, this paper proposes a feature extraction algorithm based on YCrCb color space and K-means clustering. Firstly, the paper prepares and analyzes the color characteristics of a large number of forest fire image samples. Using the K-means clustering algorithm, the forest flame model is obtained by comparing the two commonly used color spaces, and the suspected flame area is discriminated and extracted. The experimental results show that the extraction accuracy of flame area based on YCrCb color model is higher than that of HSI color model, which can be applied in different scene forest fire identification, and it is feasible in practice.

  6. Dentate Gyrus circuitry features improve performance of sparse approximation algorithms.

    Directory of Open Access Journals (Sweden)

    Panagiotis C Petrantonakis

    Full Text Available Memory-related activity in the Dentate Gyrus (DG is characterized by sparsity. Memory representations are seen as activated neuronal populations of granule cells, the main encoding cells in DG, which are estimated to engage 2-4% of the total population. This sparsity is assumed to enhance the ability of DG to perform pattern separation, one of the most valuable contributions of DG during memory formation. In this work, we investigate how features of the DG such as its excitatory and inhibitory connectivity diagram can be used to develop theoretical algorithms performing Sparse Approximation, a widely used strategy in the Signal Processing field. Sparse approximation stands for the algorithmic identification of few components from a dictionary that approximate a certain signal. The ability of DG to achieve pattern separation by sparsifing its representations is exploited here to improve the performance of the state of the art sparse approximation algorithm "Iterative Soft Thresholding" (IST by adding new algorithmic features inspired by the DG circuitry. Lateral inhibition of granule cells, either direct or indirect, via mossy cells, is shown to enhance the performance of the IST. Apart from revealing the potential of DG-inspired theoretical algorithms, this work presents new insights regarding the function of particular cell types in the pattern separation task of the DG.

  7. An evolutionary algorithm for model selection

    Energy Technology Data Exchange (ETDEWEB)

    Bicker, Karl [CERN, Geneva (Switzerland); Chung, Suh-Urk; Friedrich, Jan; Grube, Boris; Haas, Florian; Ketzer, Bernhard; Neubert, Sebastian; Paul, Stephan; Ryabchikov, Dimitry [Technische Univ. Muenchen (Germany)

    2013-07-01

    When performing partial-wave analyses of multi-body final states, the choice of the fit model, i.e. the set of waves to be used in the fit, can significantly alter the results of the partial wave fit. Traditionally, the models were chosen based on physical arguments and by observing the changes in log-likelihood of the fits. To reduce possible bias in the model selection process, an evolutionary algorithm was developed based on a Bayesian goodness-of-fit criterion which takes into account the model complexity. Starting from systematically constructed pools of waves which contain significantly more waves than the typical fit model, the algorithm yields a model with an optimal log-likelihood and with a number of partial waves which is appropriate for the number of events in the data. Partial waves with small contributions to the total intensity are penalized and likely to be dropped during the selection process, as are models were excessive correlations between single waves occur. Due to the automated nature of the model selection, a much larger part of the model space can be explored than would be possible in a manual selection. In addition the method allows to assess the dependence of the fit result on the fit model which is an important contribution to the systematic uncertainty.

  8. Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification.

    Science.gov (United States)

    Zhang, Yong; Gong, Dun-Wei; Cheng, Jian

    2017-01-01

    Feature selection is an important data-preprocessing technique in classification problems such as bioinformatics and signal processing. Generally, there are some situations where a user is interested in not only maximizing the classification performance but also minimizing the cost that may be associated with features. This kind of problem is called cost-based feature selection. However, most existing feature selection approaches treat this task as a single-objective optimization problem. This paper presents the first study of multi-objective particle swarm optimization (PSO) for cost-based feature selection problems. The task of this paper is to generate a Pareto front of nondominated solutions, that is, feature subsets, to meet different requirements of decision-makers in real-world applications. In order to enhance the search capability of the proposed algorithm, a probability-based encoding technology and an effective hybrid operator, together with the ideas of the crowding distance, the external archive, and the Pareto domination relationship, are applied to PSO. The proposed PSO-based multi-objective feature selection algorithm is compared with several multi-objective feature selection algorithms on five benchmark datasets. Experimental results show that the proposed algorithm can automatically evolve a set of nondominated solutions, and it is a highly competitive feature selection method for solving cost-based feature selection problems.

  9. Gene selection heuristic algorithm for nutrigenomics studies.

    Science.gov (United States)

    Valour, D; Hue, I; Grimard, B; Valour, B

    2013-07-15

    Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (LEM method) for the search of transcriptome and metabolome connections. The heuristic algorithm here described extends the classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines well-conditioning and fast-computing in "R." Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that resumes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix converges the probabilities of interconnection between genes, providing a selection of disjointed subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process reaches the aim of biologists who often need small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.

  10. Finger vein recognition with personalized feature selection.

    Science.gov (United States)

    Xi, Xiaoming; Yang, Gongping; Yin, Yilong; Meng, Xianjing

    2013-08-22

    Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG). For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG.

  11. Finger Vein Recognition with Personalized Feature Selection

    Directory of Open Access Journals (Sweden)

    Xianjing Meng

    2013-08-01

    Full Text Available Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG. For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG.

  12. Attentional Selection of Feature Conjunctions Is Accomplished by Parallel and Independent Selection of Single Features.

    Science.gov (United States)

    Andersen, Søren K; Müller, Matthias M; Hillyard, Steven A

    2015-07-08

    Experiments that study feature-based attention have often examined situations in which selection is based on a single feature (e.g., the color red). However, in more complex situations relevant stimuli may not be set apart from other stimuli by a single defining property but by a specific combination of features. Here, we examined sustained attentional selection of stimuli defined by conjunctions of color and orientation. Human observers attended to one out of four concurrently presented superimposed fields of randomly moving horizontal or vertical bars of red or blue color to detect brief intervals of coherent motion. Selective stimulus processing in early visual cortex was assessed by recordings of steady-state visual evoked potentials (SSVEPs) elicited by each of the flickering fields of stimuli. We directly contrasted attentional selection of single features and feature conjunctions and found that SSVEP amplitudes on conditions in which selection was based on a single feature only (color or orientation) exactly predicted the magnitude of attentional enhancement of SSVEPs when attending to a conjunction of both features. Furthermore, enhanced SSVEP amplitudes elicited by attended stimuli were accompanied by equivalent reductions of SSVEP amplitudes elicited by unattended stimuli in all cases. We conclude that attentional selection of a feature-conjunction stimulus is accomplished by the parallel and independent facilitation of its constituent feature dimensions in early visual cortex. The ability to perceive the world is limited by the brain's processing capacity. Attention affords adaptive behavior by selectively prioritizing processing of relevant stimuli based on their features (location, color, orientation, etc.). We found that attentional mechanisms for selection of different features belonging to the same object operate independently and in parallel: concurrent attentional selection of two stimulus features is simply the sum of attending to each of those

  13. An Ensemble Method with Integration of Feature Selection and Classifier Selection to Detect the Landslides

    Science.gov (United States)

    Zhongqin, G.; Chen, Y.

    2017-12-01

    Abstract Quickly identify the spatial distribution of landslides automatically is essential for the prevention, mitigation and assessment of the landslide hazard. It's still a challenging job owing to the complicated characteristics and vague boundary of the landslide areas on the image. The high resolution remote sensing image has multi-scales, complex spatial distribution and abundant features, the object-oriented image classification methods can make full use of the above information and thus effectively detect the landslides after the hazard happened. In this research we present a new semi-supervised workflow, taking advantages of recent object-oriented image analysis and machine learning algorithms to quick locate the different origins of landslides of some areas on the southwest part of China. Besides a sequence of image segmentation, feature selection, object classification and error test, this workflow ensemble the feature selection and classifier selection. The feature this study utilized were normalized difference vegetation index (NDVI) change, textural feature derived from the gray level co-occurrence matrices (GLCM), spectral feature and etc. The improvement of this study shows this algorithm significantly removes some redundant feature and the classifiers get fully used. All these improvements lead to a higher accuracy on the determination of the shape of landslides on the high resolution remote sensing image, in particular the flexibility aimed at different kinds of landslides.

  14. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    Full Text Available A high-dimensional feature selection having a very large number of features with an optimal feature subset is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique while utilizing regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as a criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN and Bayesian techniques. Various high dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach gives better results when compared with other conventional feature selection algorithms.

  15. An opinion formation based binary optimization approach for feature selection

    Science.gov (United States)

    Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo

    2018-02-01

    This paper proposed a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics human-human interaction mechanism based on a mathematical model derived from social sciences. Our method encodes a subset of selected features to the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and get into consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high dimensional datasets reveal outperformance of the proposed algorithm over others.

  16. Feature Selection using Multi-objective Genetic Algorith m: A Hybrid Approach

    OpenAIRE

    Ahuja, Jyoti; GJUST - Guru Jambheshwar University of Sciecne and Technology; Ratnoo, Saroj Dahiya; GJUST - Guru Jambheshwar University of Sciecne and Technology

    2015-01-01

    Feature selection is an important pre-processing task for building accurate and comprehensible classification models. Several researchers have applied filter, wrapper or hybrid approaches using genetic algorithms which are good candidates for optimization problems that involve large search spaces like in the case of feature selection. Moreover, feature selection is an inherently multi-objective problem with many competing objectives involving size, predictive power and redundancy of the featu...

  17. Multi-task feature selection in microarray data by binary integer programming.

    Science.gov (United States)

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  18. The optimal extraction of feature algorithm based on KAZE

    Science.gov (United States)

    Yao, Zheyi; Gu, Guohua; Qian, Weixian; Wang, Pengcheng

    2015-10-01

    As a novel method of 2D features extraction algorithm over the nonlinear scale space, KAZE provide a special method. However, the computation of nonlinear scale space and the construction of KAZE feature vectors are more expensive than the SIFT and SURF significantly. In this paper, the given image is used to build the nonlinear space up to a maximum evolution time through the efficient Additive Operator Splitting (AOS) techniques and the variable conductance diffusion. Changing the parameter can improve the construction of nonlinear scale space and simplify the image conductivities for each dimension space, with the predigest computation. Then, the detection for points of interest can exhibit a maxima of the scale-normalized determinant with the Hessian response in the nonlinear scale space. At the same time, the detection of feature vectors is optimized by the Wavelet Transform method, which can avoid the second Gaussian smoothing in the KAZE Features and cut down the complexity of the algorithm distinctly in the building and describing vectors steps. In this way, the dominant orientation is obtained, similar to SURF, by summing the responses within a sliding circle segment covering an angle of π/3 in the circular area of radius 6σ with a sampling step of size σ one by one. Finally, the extraction in the multidimensional patch at the given scale, centered over the points of interest and rotated to align its dominant orientation to a canonical direction, is able to simplify the description of feature by reducing the description dimensions, just as the PCA-SIFT method. Even though the features are somewhat more expensive to compute than SIFT due to the construction of nonlinear scale space, but compared to SURF, the result revels a step forward in performance in detection, description and application against the previous ways by the following contrast experiments.

  19. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    International Nuclear Information System (INIS)

    Balabin, Roman M.; Smirnov, Sergey V.

    2011-01-01

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm -1 ) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  20. Receptive fields selection for binary feature description.

    Science.gov (United States)

    Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal

    2014-06-01

    Feature description for local image patch is widely used in computer vision. While the conventional way to design local descriptor is based on expert experience and knowledge, learning-based methods for designing local descriptor become more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing binary feature descriptor, which we call receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling area and Gaussian pooling area) for selection, we obtain two binary descriptors RFDR and RFDG .accordingly. Image matching experiments on the well-known patch data set and Oxford data set demonstrate that RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.

  1. Historical feature pattern extraction based network attack situation sensing algorithm.

    Science.gov (United States)

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  2. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    Directory of Open Access Journals (Sweden)

    Yong Zeng

    2014-01-01

    Full Text Available The situation sequence contains a series of complicated and multivariate random trends, which are very sudden, uncertain, and difficult to recognize and describe its principle by traditional algorithms. To solve the above questions, estimating parameters of super long situation sequence is essential, but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE. First, HFPE algorithm seeks similar indications from the history situation sequence recorded and weighs the link intensity between occurred indication and subsequent effect. Then it calculates the probability that a certain effect reappears according to the current indication and makes a prediction after weighting. Meanwhile, HFPE method gives an evolution algorithm to derive the prediction deviation from the views of pattern and accuracy. This algorithm can continuously promote the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in sequence at its best, does not need data preprocessing, and can track and adapt to the variation of situation sequence continuously.

  3. ALGORITHM OF SELECTION EFFECTIVE SOLUTIONS FOR REPROFILING OF INDUSTRIAL BUILDINGS

    Directory of Open Access Journals (Sweden)

    MENEJLJUK A. I.

    2016-08-01

    Full Text Available Raising of problem.Non-compliance requirements of today's industrial enterprises, which were built during the Soviet period, as well as significant technical progress, economic reform and transition to market principles of performance evaluation leading to necessity to change their target and functionality. The technical condition of many industrial buildings in Ukraine allows to exploit them for decades.Redesigning manufacturing enterprises allows not only to reduce the cost of construction, but also to obtain new facilities in the city. Despite the large number of industrial buildings that have lost their effectiveness and relevance, as well as a significant investor interest in these objects, the scope of redevelopment in the construction remains unexplored. Analysis researches on the topic. The problem of reconstruction of industrial buildings considered in Topchy D. [3], Travin V. [9], as well as in the work of other scientists. However, there are no rules in regulatory documents and system studies for improving the organization of the reconstruction of buildings at realigning. The purpose of this work is the development an algorithm of actions for selection of effective organizational decisions at the planning stage of a reprofiling project of industrial buildings. The proposed algorithm allows you to select an effective organizational and technological solution for the re-profiling of industrial buildings, taking into account features of the building, its location, its state of structures and existing restrictions. The most effective organizational solution allows realize the reprofiling project of an industrial building in the most possible short terms and with the lowest possible use of material resources, taking into account the available features and restrictions. Conclusion. Each object has a number of unique features that necessary for considering at choosing an effective reprofiling variant. The developed algorithm for selecting

  4. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    Directory of Open Access Journals (Sweden)

    Joeri Ruyssinck

    Full Text Available One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made

  5. Feature selection using feature dissimilarity measure and density ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... that the value of λ2 tends to 0 as the absolute value of. ϱ(x,y) increases. ..... ison of speech recognition algorithms; in Acoustics, Speech, and ... single-cell rna-seq for marker-free decomposition of tissues into cell types.

  6. Automatic selective feature retention in patient specific elastic surface registration

    CSIR Research Space (South Africa)

    Jansen van Rensburg, GJ

    2011-01-01

    Full Text Available The accuracy with which a recent elastic surface registration algorithm deforms the complex geometry of a skull is examined. This algorithm is then coupled to a line based algorithm as is frequently used in patient specific feature registration...

  7. A Cancer Gene Selection Algorithm Based on the K-S Test and CFS

    Directory of Open Access Journals (Sweden)

    Qiang Su

    2017-01-01

    Full Text Available Background. To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S test and correlation-based feature selection (CFS principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. Results. We adopted support vector machines (SVM as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR, and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. Conclusions. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.

  8. Condition monitoring of face milling tool using K-star algorithm and histogram features of vibration signal

    Directory of Open Access Journals (Sweden)

    C.K. Madhusudana

    2016-09-01

    Full Text Available This paper deals with the fault diagnosis of the face milling tool based on machine learning approach using histogram features and K-star algorithm technique. Vibration signals of the milling tool under healthy and different fault conditions are acquired during machining of steel alloy 42CrMo4. Histogram features are extracted from the acquired signals. The decision tree is used to select the salient features out of all the extracted features and these selected features are used as an input to the classifier. K-star algorithm is used as a classifier and the output of the model is utilised to study and classify the different conditions of the face milling tool. Based on the experimental results, K-star algorithm is provided a better classification accuracy in the range from 94% to 96% with histogram features and is acceptable for fault diagnosis.

  9. Feature selection in classification of eye movements using electrooculography for activity recognition.

    Science.gov (United States)

    Mala, S; Latha, K

    2014-01-01

    Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition.

  10. Feature extraction algorithm for space targets based on fractal theory

    Science.gov (United States)

    Tian, Balin; Yuan, Jianping; Yue, Xiaokui; Ning, Xin

    2007-11-01

    In order to offer a potential for extending the life of satellites and reducing the launch and operating costs, satellite servicing including conducting repairs, upgrading and refueling spacecraft on-orbit become much more frequently. Future space operations can be more economically and reliably executed using machine vision systems, which can meet real time and tracking reliability requirements for image tracking of space surveillance system. Machine vision was applied to the research of relative pose for spacecrafts, the feature extraction algorithm was the basis of relative pose. In this paper fractal geometry based edge extraction algorithm which can be used in determining and tracking the relative pose of an observed satellite during proximity operations in machine vision system was presented. The method gets the gray-level image distributed by fractal dimension used the Differential Box-Counting (DBC) approach of the fractal theory to restrain the noise. After this, we detect the consecutive edge using Mathematical Morphology. The validity of the proposed method is examined by processing and analyzing images of space targets. The edge extraction method not only extracts the outline of the target, but also keeps the inner details. Meanwhile, edge extraction is only processed in moving area to reduce computation greatly. Simulation results compared edge detection using the method which presented by us with other detection methods. The results indicate that the presented algorithm is a valid method to solve the problems of relative pose for spacecrafts.

  11. Fractal Complexity-Based Feature Extraction Algorithm of Communication Signals

    Science.gov (United States)

    Wang, Hui; Li, Jingchao; Guo, Lili; Dou, Zheng; Lin, Yun; Zhou, Ruolin

    How to analyze and identify the characteristics of radiation sources and estimate the threat level by means of detecting, intercepting and locating has been the central issue of electronic support in the electronic warfare, and communication signal recognition is one of the key points to solve this issue. Aiming at accurately extracting the individual characteristics of the radiation source for the increasingly complex communication electromagnetic environment, a novel feature extraction algorithm for individual characteristics of the communication radiation source based on the fractal complexity of the signal is proposed. According to the complexity of the received signal and the situation of environmental noise, use the fractal dimension characteristics of different complexity to depict the subtle characteristics of the signal to establish the characteristic database, and then identify different broadcasting station by gray relation theory system. The simulation results demonstrate that the algorithm can achieve recognition rate of 94% even in the environment with SNR of -10dB, and this provides an important theoretical basis for the accurate identification of the subtle features of the signal at low SNR in the field of information confrontation.

  12. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    Science.gov (United States)

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058

  13. Variable selection in Logistic regression model with genetic algorithm.

    Science.gov (United States)

    Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi

    2018-02-01

    Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.

  14. Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Science.gov (United States)

    Kim, Deok-Hwan

    As the multimedia contents market continues its rapid expansion, the amount of image contents used in mobile phone services, digital libraries, and catalog service is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as the feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.

  15. Genetic Particle Swarm Optimization–Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-01-01

    In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm. PMID:27483285

  16. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm.

  17. Feature Selection for Wheat Yield Prediction

    Science.gov (United States)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an everincreasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  18. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004, our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  19. Information Theory for Gabor Feature Selection for Face Recognition

    Science.gov (United States)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  20. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost a NP-hard problem as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing the Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.

  1. Feature selection for neural network based defect classification of ceramic components using high frequency ultrasound.

    Science.gov (United States)

    Kesharaju, Manasa; Nagarajah, Romesh

    2015-09-01

    The motivation for this research stems from a need for providing a non-destructive testing method capable of detecting and locating any defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator about the presence of defects. Generally, in many classification problems a choice of features or dimensionality reduction is significant and simultaneously very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms are used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially wavelet based feature extraction is implemented from the region of interest. An Artificial Neural Network classifier is employed to evaluate the performance of these features. Genetic Algorithm based feature selection is performed. Principal Component Analysis is a popular technique used for feature selection and is compared with the genetic algorithm based technique in terms of classification accuracy and selection of optimal number of features. The experimental results confirm that features identified by Principal Component Analysis lead to improved performance in terms of classification percentage with 96% than Genetic algorithm with 94%. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs as better alternatives to the common practice of selecting the apparently best parameters by using a genetic algorithm to search for a couple of optimal parameter. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.

  3. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Full Text Available Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack enough scalability to cope with datasets of millions of instances and extract successful results in a delimited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset in blocks of instances to learn from them in the map phase; then, the reduce phase merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated by using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes implemented within the Spark framework to address big data problems. In the experiments, datasets up to 67 millions of instances and up to 2000 attributes have been managed, showing that this is a suitable framework to perform evolutionary feature selection, improving both the classification accuracy and its runtime when dealing with big data problems.

  4. Selection of views to materialize using simulated annealing algorithms

    Science.gov (United States)

    Zhou, Lijuan; Liu, Chi; Wang, Hongfeng; Liu, Daixin

    2002-03-01

    A data warehouse contains lots of materialized views over the data provided by the distributed heterogeneous databases for the purpose of efficiently implementing decision-support or OLAP queries. It is important to select the right view to materialize that answer a given set of queries. The goal is the minimization of the combination of the query evaluation and view maintenance costs. In this paper, we have addressed and designed algorithms for selecting a set of views to be materialized so that the sum of processing a set of queries and maintaining the materialized views is minimized. We develop an approach using simulated annealing algorithms to solve it. First, we explore simulated annealing algorithms to optimize the selection of materialized views. Then we use experiments to demonstrate our approach. The results show that our algorithm works better. We implemented our algorithms and a performance study of the algorithms shows that the proposed algorithm gives an optimal solution.

  5. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  6. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms prove effectiveness when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the used of a Genetic Algorithm (GA) along with Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are use, which include: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used, which are: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering.

    Science.gov (United States)

    Luo, Junhai; Fu, Liang

    2017-06-09

    With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

  8. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering

    Directory of Open Access Journals (Sweden)

    Junhai Luo

    2017-06-01

    Full Text Available With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS, which is collected from Access Points (APs. The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

  9. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces the principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.

  10. EEG feature selection method based on decision tree.

    Science.gov (United States)

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve automated feature selection problem in brain computer interface (BCI). In order to automate feature selection process, we proposed a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principle component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results.

  11. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    Science.gov (United States)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to solve the problem that there is a large amount of redundant data in high-dimensional speech emotion features, we analyze deeply the extracted speech emotion features and select better features. Firstly, a given emotion is classified by each feature. Secondly, the recognition rate is ranked in descending order. Then, the optimal threshold of features is determined by rate criterion. Finally, the better features are obtained. When applied in Berlin and Chinese emotional data set, the experimental results show that the feature selection method outperforms the other traditional methods.

  12. Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate

    Science.gov (United States)

    Padmanaban, Subash; Baker, Justin; Greger, Bradley

    2018-01-01

    Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding

  13. A feature extraction algorithm based on corner and spots in self-driving vehicles

    Directory of Open Access Journals (Sweden)

    Yupeng FENG

    2017-06-01

    Full Text Available To solve the poor real-time performance problem of the visual odometry based on embedded system with limited computing resources, an image matching method based on Harris and SIFT is proposed, namely the Harris-SIFT algorithm. On the basis of the review of SIFT algorithm, the principle of Harris-SIFT algorithm is provided. First, Harris algorithm is used to extract the corners of the image as candidate feature points, and scale invariant feature transform (SIFT features are extracted from those candidate feature points. At last, through an example, the algorithm is simulated by Matlab, then the complexity and other performance of the algorithm are analyzed. The experimental results show that the proposed method reduces the computational complexity and improves the speed of feature extraction. Harris-SIFT algorithm can be used in the real-time vision odometer system, and will bring about a wide application of visual odometry in embedded navigation system.

  14. Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal

    Directory of Open Access Journals (Sweden)

    Han Zhiyan

    2016-01-01

    Full Text Available In order to overcome the limitation of single mode emotion recognition. This paper describes a novel multimodal emotion recognition algorithm, and takes speech signal and facial expression signal as the research subjects. First, fuse the speech signal feature and facial expression signal feature, get sample sets by putting back sampling, and then get classifiers by BP neural network (BPNN. Second, measure the difference between two classifiers by double error difference selection strategy. Finally, get the final recognition result by the majority voting rule. Experiments show the method improves the accuracy of emotion recognition by giving full play to the advantages of decision level fusion and feature level fusion, and makes the whole fusion process close to human emotion recognition more, with a recognition rate 90.4%.

  15. Sensor-based vibration signal feature extraction using an improved composite dictionary matching pursuit algorithm.

    Science.gov (United States)

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-09-09

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm

  16. Sensor-Based Vibration Signal Feature Extraction Using an Improved Composite Dictionary Matching Pursuit Algorithm

    Directory of Open Access Journals (Sweden)

    Lingli Cui

    2014-09-01

    Full Text Available This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and

  17. ADAPTIVE SELECTION OF AUXILIARY OBJECTIVES IN MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS

    Directory of Open Access Journals (Sweden)

    I. A. Petrova

    2016-05-01

    Full Text Available Subject of Research.We propose to modify the EA+RL method, which increases efficiency of evolutionary algorithms by means of auxiliary objectives. The proposed modification is compared to the existing objective selection methods on the example of travelling salesman problem. Method. In the EA+RL method a reinforcement learning algorithm is used to select an objective – the target objective or one of the auxiliary objectives – at each iteration of the single-objective evolutionary algorithm.The proposed modification of the EA+RL method adopts this approach for the usage with a multiobjective evolutionary algorithm. As opposed to theEA+RL method, in this modification one of the auxiliary objectives is selected by reinforcement learning and optimized together with the target objective at each step of the multiobjective evolutionary algorithm. Main Results.The proposed modification of the EA+RL method was compared to the existing objective selection methods on the example of travelling salesman problem. In the EA+RL method and its proposed modification reinforcement learning algorithms for stationary and non-stationary environment were used. The proposed modification of the EA+RL method applied with reinforcement learning for non-stationary environment outperformed the considered objective selection algorithms on the most problem instances. Practical Significance. The proposed approach increases efficiency of evolutionary algorithms, which may be used for solving discrete NP-hard optimization problems. They are, in particular, combinatorial path search problems and scheduling problems.

  18. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive models which helps to identify the irrelevant features in the high - dimensional dataset. In this case of water contamination detection dataset, the standard wrapper algorithm alone cannot be applied because of the complexity. To overcome this computational complexity problem and making it lighter, filter-wrapper based algorithm has been proposed. In this work, reducing the feature space is a significant component of water contamination. The main findings are as follows: (1 The main goal is speeding up the feature selection process, so the proposed filter - based feature pre-selection is applied and guarantees that useful data are improbable to be detached in the initial stage which discussed briefly in this paper. (2 The resulting features are again filtered by using the Genetic Algorithm coded with Support Vector Machine method, where it facilitates to nutshell the subset of features with high accuracy and decreases the expense. Experimental results show that the proposed methods trim down redundant features effectively and achieved better classification accuracy.

  19. Selected PET radiomic features remain the same.

    Science.gov (United States)

    Tsujikawa, Tetsuya; Tsuyoshi, Hideaki; Kanno, Masafumi; Yamada, Shizuka; Kobayashi, Masato; Narita, Norihiko; Kimura, Hirohiko; Fujieda, Shigeharu; Yoshida, Yoshio; Okazawa, Hidehiko

    2018-04-17

    We investigated whether PET radiomic features are affected by differences in the scanner, scan protocol, and lesion location using 18 F-FDG PET/CT and PET/MR scans. SUV, TMR, skewness, kurtosis, entropy, and homogeneity strongly correlated between PET/CT and PET/MR images. SUVs were significantly higher on PET/MR 0-2 min and PET/MR 0-10 min than on PET/CT in gynecological cancer ( p = 0.008 and 0.008, respectively), whereas no significant difference was observed between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images in oral cavity/oropharyngeal cancer. TMRs on PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min increased in this order in gynecological cancer and oral cavity/oropharyngeal cancer. In contrast to conventional and histogram indices, 4 textural features (entropy, homogeneity, SRE, and LRE) were not significantly different between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images. 18 F-FDG PET radiomic features strongly correlated between PET/CT and PET/MR images. Dixon-based attenuation correction on PET/MR images underestimated tumor tracer uptake more significantly in oral cavity/oropharyngeal cancer than in gynecological cancer. 18 F-FDG PET textural features were affected less by differences in the scanner and scan protocol than conventional and histogram features, possibly due to the resampling process using a medium bin width. Eight patients with gynecological cancer and 7 with oral cavity/oropharyngeal cancer underwent a whole-body 18 F-FDG PET/CT scan and regional PET/MR scan in one day. PET/MR scans were performed for 10 minutes in the list mode, and PET/CT and 0-2 min and 0-10 min PET/MR images were reconstructed. The standardized uptake value (SUV), tumor-to-muscle SUV ratio (TMR), skewness, kurtosis, entropy, homogeneity, short-run emphasis (SRE), and long-run emphasis (LRE) were compared between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images.

  20. A new and fast image feature selection method for developing an optimal mammographic mass detection scheme.

    Science.gov (United States)

    Tan, Maxine; Pu, Jiantao; Zheng, Bin

    2014-08-01

    Selecting optimal features from a large image feature pool remains a major challenge in developing computer-aided detection (CAD) schemes of medical images. The objective of this study is to investigate a new approach to significantly improve efficacy of image feature selection and classifier optimization in developing a CAD scheme of mammographic masses. An image dataset including 1600 regions of interest (ROIs) in which 800 are positive (depicting malignant masses) and 800 are negative (depicting CAD-generated false positive regions) was used in this study. After segmentation of each suspicious lesion by a multilayer topographic region growth algorithm, 271 features were computed in different feature categories including shape, texture, contrast, isodensity, spiculation, local topological features, as well as the features related to the presence and location of fat and calcifications. Besides computing features from the original images, the authors also computed new texture features from the dilated lesion segments. In order to select optimal features from this initial feature pool and build a highly performing classifier, the authors examined and compared four feature selection methods to optimize an artificial neural network (ANN) based classifier, namely: (1) Phased Searching with NEAT in a Time-Scaled Framework, (2) A sequential floating forward selection (SFFS) method, (3) A genetic algorithm (GA), and (4) A sequential forward selection (SFS) method. Performances of the four approaches were assessed using a tenfold cross validation method. Among these four methods, SFFS has highest efficacy, which takes 3%-5% of computational time as compared to GA approach, and yields the highest performance level with the area under a receiver operating characteristic curve (AUC) = 0.864 ± 0.034. The results also demonstrated that except using GA, including the new texture features computed from the dilated mass segments improved the AUC results of the ANNs optimized

  1. Fuzzy Mutual Information Based min-Redundancy and Max-Relevance Heterogeneous Feature Selection

    Directory of Open Access Journals (Sweden)

    Daren Yu

    2011-08-01

    Full Text Available Feature selection is an important preprocessing step in pattern classification and machine learning, and mutual information is widely used to measure relevance between features and decision. However, it is difficult to directly calculate relevance between continuous or fuzzy features using mutual information. In this paper we introduce the fuzzy information entropy and fuzzy mutual information for computing relevance between numerical or fuzzy features and decision. The relationship between fuzzy information entropy and differential entropy is also discussed. Moreover, we combine fuzzy mutual information with qmin-Redundancy-Max-Relevanceq, qMax-Dependencyq and min-Redundancy-Max-Dependencyq algorithms. The performance and stability of the proposed algorithms are tested on benchmark data sets. Experimental results show the proposed algorithms are effective and stable.

  2. An algorithm for preferential selection of spectroscopic targets in LEGUE

    International Nuclear Information System (INIS)

    Carlin, Jeffrey L.; Newberg, Heidi Jo; Lépine, Sébastien; Deng Licai; Chen Yuqin; Fu Xiaoting; Gao Shuang; Li Jing; Liu Chao; Beers, Timothy C.; Christlieb, Norbert; Grillmair, Carl J.; Guhathakurta, Puragra; Han Zhanwen; Hou Jinliang; Lee, Hsu-Tai; Liu Xiaowei; Pan Kaike; Sellwood, J. A.; Wang Hongchi

    2012-01-01

    We describe a general target selection algorithm that is applicable to any survey in which the number of available candidates is much larger than the number of objects to be observed. This routine aims to achieve a balance between a smoothly-varying, well-understood selection function and the desire to preferentially select certain types of targets. Some target-selection examples are shown that illustrate different possibilities of emphasis functions. Although it is generally applicable, the algorithm was developed specifically for the LAMOST Experiment for Galactic Understanding and Exploration (LEGUE) survey that will be carried out using the Chinese Guo Shou Jing Telescope. In particular, this algorithm was designed for the portion of LEGUE targeting the Galactic halo, in which we attempt to balance a variety of science goals that require stars at fainter magnitudes than can be completely sampled by LAMOST. This algorithm has been implemented for the halo portion of the LAMOST pilot survey, which began in October 2011.

  3. An input feature selection method applied to fuzzy neural networks for signal esitmation

    International Nuclear Information System (INIS)

    Na, Man Gyun; Sim, Young Rok

    2001-01-01

    It is well known that the performance of a fuzzy neural networks strongly depends on the input features selected for its training. In its applications to sensor signal estimation, there are a large number of input variables related with an output. As the number of input variables increases, the training time of fuzzy neural networks required increases exponentially. Thus, it is essential to reduce the number of inputs to a fuzzy neural networks and to select the optimum number of mutually independent inputs that are able to clearly define the input-output mapping. In this work, principal component analysis (PAC), genetic algorithms (GA) and probability theory are combined to select new important input features. A proposed feature selection method is applied to the signal estimation of the steam generator water level, the hot-leg flowrate, the pressurizer water level and the pressurizer pressure sensors in pressurized water reactors and compared with other input feature selection methods

  4. SIP-FS: a novel feature selection for data representation

    Directory of Open Access Journals (Sweden)

    Yiyou Guo

    2018-02-01

    Full Text Available Abstract Multiple features are widely used to characterize real-world datasets. It is desirable to select leading features with stability and interpretability from a set of distinct features for a comprehensive data description. However, most of existing feature selection methods focus on the predictability (e.g., prediction accuracy of selected results yet neglect stability. To obtain compact data representation, a novel feature selection method is proposed to improve stability, and interpretability without sacrificing predictability (SIP-FS. Instead of mutual information, generalized correlation is adopted in minimal redundancy maximal relevance to measure the relation between different feature types. Several feature types (each contains a certain number of features can then be selected and evaluated quantitatively to determine what types contribute to a specific class, thereby enhancing the so-called interpretability of features. Moreover, stability is introduced in the criterion of SIP-FS to obtain consistent results of ranking. We conduct experiments on three publicly available datasets using one-versus-all strategy to select class-specific features. The experiments illustrate that SIP-FS achieves significant performance improvements in terms of stability and interpretability with desirable prediction accuracy and indicates advantages over several state-of-the-art approaches.

  5. VHDL implementation of feature-extraction algorithm for the PANDA electromagnetic calorimeter

    Energy Technology Data Exchange (ETDEWEB)

    Guliyev, E. [Kernfysisch Versneller Instituut, University of Groningen, Zernikelaan 25, NL-9747 AA Groningen (Netherlands); Kavatsyuk, M., E-mail: m.kavatsyuk@rug.nl [Kernfysisch Versneller Instituut, University of Groningen, Zernikelaan 25, NL-9747 AA Groningen (Netherlands); Lemmens, P.J.J.; Tambave, G.; Loehner, H. [Kernfysisch Versneller Instituut, University of Groningen, Zernikelaan 25, NL-9747 AA Groningen (Netherlands)

    2012-02-01

    A simple, efficient, and robust feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA spectrometer at FAIR, Darmstadt, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The source-code is available as an open-source project and is adaptable for other projects and sampling ADCs. Best performance with different types of signal sources can be achieved through flexible parameter selection. The on-line data-processing in FPGA enables to construct an almost dead-time free data acquisition system which is successfully evaluated as a first step towards building a complete trigger-less readout chain. Prototype setups are studied to determine the dead-time of the implemented algorithm, the rate of false triggering, timing performance, and event correlations.

  6. VHDL implementation of feature-extraction algorithm for the PANDA electromagnetic calorimeter

    International Nuclear Information System (INIS)

    Guliyev, E.; Kavatsyuk, M.; Lemmens, P.J.J.; Tambave, G.; Löhner, H.

    2012-01-01

    A simple, efficient, and robust feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA spectrometer at FAIR, Darmstadt, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The source-code is available as an open-source project and is adaptable for other projects and sampling ADCs. Best performance with different types of signal sources can be achieved through flexible parameter selection. The on-line data-processing in FPGA enables to construct an almost dead-time free data acquisition system which is successfully evaluated as a first step towards building a complete trigger-less readout chain. Prototype setups are studied to determine the dead-time of the implemented algorithm, the rate of false triggering, timing performance, and event correlations.

  7. Enhancement of Selection, Bubble and Insertion Sorting Algorithm

    OpenAIRE

    Muhammad Farooq Umar; Ehsan Ullah Munir; Shafqat Ali Shad; Muhammad Wasif Nisar

    2014-01-01

    In everyday life there is a large amount of data to arrange because sorting removes any ambiguities and make the data analysis and data processing very easy, efficient and provides with cost less effort. In this study a set of improved sorting algorithms are proposed which gives better performance and design idea. In this study five new sorting algorithms (Bi-directional Selection Sort, Bi-directional bubble sort, MIDBiDirectional Selection Sort, MIDBidirectional bubble sort and linear insert...

  8. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model’s complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from the existing models mainly on the error boundaries. Second, a covering-based rough set model with normal distribution measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than the existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of the cost-sensitive learning.

  9. Sparse Bayesian classification and feature selection for biological expression data with high correlations.

    Directory of Open Access Journals (Sweden)

    Xian Yang

    Full Text Available Classification models built on biological expression data are increasingly used to predict distinct disease subtypes. Selected features that separate sample groups can be the candidates of biomarkers, helping us to discover biological functions/pathways. However, three challenges are associated with building a robust classification and feature selection model: 1 the number of significant biomarkers is much smaller than that of measured features for which the search will be exhaustive; 2 current biological expression data are big in both sample size and feature size which will worsen the scalability of any search algorithms; and 3 expression profiles of certain features are typically highly correlated which may prevent to distinguish the predominant features. Unfortunately, most of the existing algorithms are partially addressing part of these challenges but not as a whole. In this paper, we propose a unified framework to address the above challenges. The classification and feature selection problem is first formulated as a nonconvex optimisation problem. Then the problem is relaxed and solved iteratively by a sequence of convex optimisation procedures which can be distributed computed and therefore allows the efficient implementation on advanced infrastructures. To illustrate the competence of our method over others, we first analyse a randomly generated simulation dataset under various conditions. We then analyse a real gene expression dataset on embryonal tumour. Further downstream analysis, such as functional annotation and pathway analysis, are performed on the selected features which elucidate several biological findings.

  10. Features and selection of vascular access devices.

    Science.gov (United States)

    Sansivero, Gail Egan

    2010-05-01

    To review venous anatomy and physiology, discuss assessment parameters before vascular access device (VAD) placement, and review VAD options. Journal articles, personal experience. A number of VAD options are available in clinical practice. Access planning should include comprehensive assessment, with attention to patient participation in the planning and selection process. Careful consideration should be given to long-term access needs and preservation of access sites. Oncology nurses are uniquely suited to perform a key role in VAD planning and placement. With knowledge of infusion therapy, anatomy and physiology, device options, and community resources, nurses can be key leaders in preserving vascular access and improving the safety and comfort of infusion therapy. Copyright 2010 Elsevier Inc. All rights reserved.

  11. Automatic Target Recognition: Statistical Feature Selection of Non-Gaussian Distributed Target Classes

    Science.gov (United States)

    2011-06-01

    implementing, and evaluating many feature selection algorithms. Mucciardi and Gose compared seven different techniques for choosing subsets of pattern...122 THIS PAGE INTENTIONALLY LEFT BLANK 123 LIST OF REFERENCES [1] A. Mucciardi and E. Gose , “A comparison of seven techniques for

  12. Online Feature Selection for Classifying Emphysema in HRCT Images

    Directory of Open Access Journals (Sweden)

    M. Prasad

    2008-06-01

    Full Text Available Feature subset selection, applied as a pre- processing step to machine learning, is valuable in dimensionality reduction, eliminating irrelevant data and improving classifier performance. In the classic formulation of the feature selection problem, it is assumed that all the features are available at the beginning. However, in many real world problems, there are scenarios where not all features are present initially and must be integrated as they become available. In such scenarios, online feature selection provides an efficient way to sort through a large space of features. It is in this context that we introduce online feature selection for the classification of emphysema, a smoking related disease that appears as low attenuation regions in High Resolution Computer Tomography (HRCT images. The technique was successfully evaluated on 61 HRCT scans and compared with different online feature selection approaches, including hill climbing, best first search, grafting, and correlation-based feature selection. The results were also compared against ldensity maskr, a standard approach used for emphysema detection in medical image analysis.

  13. A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features

    Directory of Open Access Journals (Sweden)

    P. Amudha

    2015-01-01

    Full Text Available Intrusion detection has become a main part of network security due to the huge number of attacks which affects the computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, in this paper a hybrid algorithm is proposed to integrate Modified Artificial Bee Colony (MABC with Enhanced Particle Swarm Optimization (EPSO to predict the intrusion detection problem. The algorithms are combined together to find out better optimization results and the classification accuracies are obtained by 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and test its effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, intrusion detection KDDCup’99 benchmark dataset from the UCI Machine Learning repository is used. The performance of the proposed method is compared with the other machine learning algorithms and found to be significantly different.

  14. Tracing the breeding farm of domesticated pig using feature selection (

    Directory of Open Access Journals (Sweden)

    Taehyung Kwon

    2017-11-01

    Full Text Available Objective Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process, from the manufacturer to the retailer, and genetic traceability is an effective method to trace the origin of animal products. In this study, we successfully achieved the farm tracing of 6,018 multi-breed pigs, using single nucleotide polymorphism (SNP markers strictly selected through least absolute shrinkage and selection operator (LASSO feature selection. Methods We performed farm tracing of domesticated pig (Sus scrofa from SNP markers and selected the most relevant features for accurate prediction. Considering multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also includes 179 SNPs with small between-breed difference. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-leaning based classifiers. Results We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection, to minimize between-breed differences. We subsequently selected 100 highest-scored SNPs from iterative scoring, and observed high statistical measures in classification of breeding farms by cross-validation only using these SNPs. Conclusion The study represents a successful application of LASSO feature selection on multi-breed pig SNP data to trace the farm information, which provides a valuable method and possibility for further researches on genetic traceability.

  15. Adaptive feature selection using v-shaped binary particle swarm optimization.

    Science.gov (United States)

    Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.

  16. A computational environment for long-term multi-feature and multi-algorithm seizure prediction.

    Science.gov (United States)

    Teixeira, C A; Direito, B; Costa, R P; Valderrama, M; Feldwisch-Drentrup, H; Nikolopoulos, S; Le Van Quyen, M; Schelter, B; Dourado, A

    2010-01-01

    The daily life of epilepsy patients is constrained by the possibility of occurrence of seizures. Until now, seizures cannot be predicted with sufficient sensitivity and specificity. Most of the seizure prediction studies have been focused on a small number of patients, and frequently assuming unrealistic hypothesis. This paper adopts the view that for an appropriate development of reliable predictors one should consider long-term recordings and several features and algorithms integrated in one software tool. A computational environment, based on Matlab (®), is presented, aiming to be an innovative tool for seizure prediction. It results from the need of a powerful and flexible tool for long-term EEG/ECG analysis by multiple features and algorithms. After being extracted, features can be subjected to several reduction and selection methods, and then used for prediction. The predictions can be conducted based on optimized thresholds or by applying computational intelligence methods. One important aspect is the integrated evaluation of the seizure prediction characteristic of the developed predictors.

  17. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy-preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure and hence protecting individual data records and their privacy. There are various privacy preservation techniques like k-anonymity, l-diversity and t-closeness and data perturbation. In this paper k-anonymity privacy protection technique is applied to high dimensional datasets like adult and census. since, both the data sets are high dimensional, feature subset selection method like Gain Ratio is applied and the attributes of the datasets are ranked and low ranking attributes are filtered to form new reduced data subsets. K-anonymization privacy preservation technique is then applied on reduced datasets. The accuracy of the privacy preserved reduced datasets and the original datasets are compared for their accuracy on the two functionalities of data mining namely classification and clustering using naïve Bayesian and k-means algorithm respectively. Experimental results show that classification and clustering accuracy are comparatively the same for reduced k-anonym zed datasets and the original data sets.

  18. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    Full Text Available High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  19. Raman spectral feature selection using ant colony optimization for breast cancer diagnosis.

    Science.gov (United States)

    Fallahzadeh, Omid; Dehghani-Bidgoli, Zohreh; Assarian, Mohammad

    2018-06-04

    Pathology as a common diagnostic test of cancer is an invasive, time-consuming, and partially subjective method. Therefore, optical techniques, especially Raman spectroscopy, have attracted the attention of cancer diagnosis researchers. However, as Raman spectra contain numerous peaks involved in molecular bounds of the sample, finding the best features related to cancerous changes can improve the accuracy of diagnosis in this method. The present research attempted to improve the power of Raman-based cancer diagnosis by finding the best Raman features using the ACO algorithm. In the present research, 49 spectra were measured from normal, benign, and cancerous breast tissue samples using a 785-nm micro-Raman system. After preprocessing for removal of noise and background fluorescence, the intensity of 12 important Raman bands of the biological samples was extracted as features of each spectrum. Then, the ACO algorithm was applied to find the optimum features for diagnosis. As the results demonstrated, by selecting five features, the classification accuracy of the normal, benign, and cancerous groups increased by 14% and reached 87.7%. ACO feature selection can improve the diagnostic accuracy of Raman-based diagnostic models. In the present study, features corresponding to ν(C-C) αhelix proline, valine (910-940), νs(C-C) skeletal lipids (1110-1130), and δ(CH2)/δ(CH3) proteins (1445-1460) were selected as the best features in cancer diagnosis.

  20. A hybrid approach to select features and classify diseases based on medical data

    Science.gov (United States)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms

  1. Genetic Fuzzy System (GFS based wavelet co-occurrence feature selection in mammogram classification for breast cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Meenakshi M. Pawar

    2016-09-01

    Full Text Available Breast cancer is significant health problem diagnosed mostly in women worldwide. Therefore, early detection of breast cancer is performed with the help of digital mammography, which can reduce mortality rate. This paper presents wrapper based feature selection approach for wavelet co-occurrence feature (WCF using Genetic Fuzzy System (GFS in mammogram classification problem. The performance of GFS algorithm is explained using mini-MIAS database. WCF features are obtained from detail wavelet coefficients at each level of decomposition of mammogram image. At first level of decomposition, 18 features are applied to GFS algorithm, which selects 5 features with an average classification success rate of 39.64%. Subsequently, at second level it selects 9 features from 36 features and the classification success rate is improved to 56.75%. For third level, 16 features are selected from 54 features and average success rate is improved to 64.98%. Lastly, at fourth level 72 features are applied to GFS, which selects 16 features and thereby increasing average success rate to 89.47%. Hence, GFS algorithm is the effective way of obtaining optimal set of feature in breast cancer diagnosis.

  2. RSA Algorithm. Features of the C # Object Programming Implementation

    Directory of Open Access Journals (Sweden)

    Elena V. Staver

    2012-08-01

    Full Text Available Public-key algorithms depend on the encryption key and the decoding key, connected with the first one. For data public key encryption, the text is divided into blocks, each of which is represented as a number. To decrypt the message a secret key is used.

  3. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.

    Science.gov (United States)

    Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai

    2017-12-26

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

  4. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Xiaohui Lin

    2017-12-01

    Full Text Available Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

  5. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  6. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    Science.gov (United States)

    Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    Web proxy cache technique reduces response time by storing a copy of pages between client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and excessive cost of cache compared to the other storages, cache replacement algorithm is used to determine evict page when the cache is full. On the other hand, the conventional algorithms for replacement such as Least Recently Use (LRU), First in First Out (FIFO), Least Frequently Use (LFU), Randomized Policy etc. may discard important pages just before use. Furthermore, using conventional algorithm cannot be well optimized since it requires some decision to intelligently evict a page before replacement. Hence, most researchers propose an integration among intelligent classifiers and replacement algorithm to improves replacement algorithms performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence classifiers prediction accuracy. The result present that using wrapper feature selection methods namely: Best First (BFS), Incremental Wrapper subset selection(IWSS)embedded NB and particle swarm optimization(PSO)reduce number of features and have a good impact on reducing computation time. Using PSO enhance NB classifier accuracy by 1.1%, 0.43% and 0.22% over using NB with all features, using BFS and using IWSS embedded NB respectively. PSO rises J48 accuracy by 0.03%, 1.91 and 0.04% over using J48 classifier with all features, using IWSS-embedded NB and using BFS respectively. While using IWSS embedded NB fastest NB and J48 classifiers much more than BFS and PSO. However, it reduces computation time of NB by 0.1383 and reduce computation time of J48 by 2.998.

  7. Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Benjamin A Logsdon

    Full Text Available Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL, which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1, and genes involved in endocytosis (RCY1, the spindle checkpoint (BUB2, sulfonate catabolism (JLP1, and cell-cell communication (PRM7. Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data.

  8. Feature Selection Using Adaboost for Face Expression Recognition

    National Research Council Canada - National Science Library

    Silapachote, Piyanuch; Karuppiah, Deepak R; Hanson, Allen R

    2005-01-01

    We propose a classification technique for face expression recognition using AdaBoost that learns by selecting the relevant global and local appearance features with the most discriminating information...

  9. A Modified Image Comparison Algorithm Using Histogram Features

    OpenAIRE

    Al-Oraiqat, Anas M.; Kostyukova, Natalya S.

    2018-01-01

    This article discuss the problem of color image content comparison. Particularly, methods of image content comparison are analyzed, restrictions of color histogram are described and a modified method of images content comparison is proposed. This method uses the color histograms and considers color locations. Testing and analyzing of based and modified algorithms are performed. The modified method shows 97% average precision for a collection containing about 700 images without loss of the adv...

  10. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  11. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, the analysis only at the acoustic level could be not enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm by performing the search of the relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard-decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data, a dramatic improvement was achieved after its combination with the acoustic information, improving the results achieved by this second modality on its own. The results achieved by the classifiers using the parameters merged at feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the scheme. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.

  12. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within Generalization model for Document Ranking. We propose an approach for Relevance Feedback using Expectation Maximization method and evaluate the algorithm on the TREC Collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on standard TREC-9 part of the OHSUMED collections, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as Markov Random Field model, Correlation Co-efficient and Count Difference method

  13. Hybrid feature selection for supporting lightweight intrusion detection systems

    Science.gov (United States)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach in combination with wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which the chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which the Random Forest(RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results are in higher detection accuracy as well as faster training and testing processes.

  14. Evaluation of feature detection algorithms for structure from motion

    CSIR Research Space (South Africa)

    Govender, N

    2009-11-01

    Full Text Available technique with an application to stereo vision,” in International Joint Conference on Artificial Intelligence, April 1981. [17] C.Tomasi and T.Kanade, “Detection and tracking of point fetaures,” Carnegie Mellon, Tech. Rep., April 1991. [18] P. Torr... Algorithms for Structure from Motion Natasha Govender Mobile Intelligent Autonomous Systems CSIR Pretoria Email: ngovender@csir.co.za Abstract—Structure from motion is a widely-used technique in computer vision to perform 3D reconstruction. The 3D...

  15. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features can not be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of document is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.

  16. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing computation burden and increasing predicting accuracy, which are especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy namely complete search, heuristic search and random search. This review mainly introduced the fundamental of each algorithm, illustrated its applications in hyperspectral data analysis in the food field, and discussed the advantages and disadvantages of these algorithms. It is hoped that this review should provide a guideline for feature selections and data processing in the future development of hyperspectral imaging technique in foods.

  17. Selective epidemic vaccination under the performant routing algorithms

    Science.gov (United States)

    Bamaarouf, O.; Alweimine, A. Ould Baba; Rachadi, A.; EZ-Zahraouy, H.

    2018-04-01

    Despite the extensive research on traffic dynamics and epidemic spreading, the effect of the routing algorithms strategies on the traffic-driven epidemic spreading has not received an adequate attention. It is well known that more performant routing algorithm strategies are used to overcome the congestion problem. However, our main result shows unexpectedly that these algorithms favor the virus spreading more than the case where the shortest path based algorithm is used. In this work, we studied the virus spreading in a complex network using the efficient path and the global dynamic routing algorithms as compared to shortest path strategy. Some previous studies have tried to modify the routing rules to limit the virus spreading, but at the expense of reducing the traffic transport efficiency. This work proposed a solution to overcome this drawback by using a selective vaccination procedure instead of a random vaccination used often in the literature. We found that the selective vaccination succeeded in eradicating the virus better than a pure random intervention for the performant routing algorithm strategies.

  18. Oculomotor selection underlies feature retention in visual working memory.

    Science.gov (United States)

    Hanning, Nina M; Jonikaitis, Donatas; Deubel, Heiner; Szinte, Martin

    2016-02-01

    Oculomotor selection, spatial task relevance, and visual working memory (WM) are described as three processes highly intertwined and sustained by similar cortical structures. However, because task-relevant locations always constitute potential saccade targets, no study so far has been able to distinguish between oculomotor selection and spatial task relevance. We designed an experiment that allowed us to dissociate in humans the contribution of task relevance, oculomotor selection, and oculomotor execution to the retention of feature representations in WM. We report that task relevance and oculomotor selection lead to dissociable effects on feature WM maintenance. In a first task, in which an object's location was encoded as a saccade target, its feature representations were successfully maintained in WM, whereas they declined at nonsaccade target locations. Likewise, we observed a similar WM benefit at the target of saccades that were prepared but never executed. In a second task, when an object's location was marked as task relevant but constituted a nonsaccade target (a location to avoid), feature representations maintained at that location did not benefit. Combined, our results demonstrate that oculomotor selection is consistently associated with WM, whereas task relevance is not. This provides evidence for an overlapping circuitry serving saccade target selection and feature-based WM that can be dissociated from processes encoding task-relevant locations. Copyright © 2016 the American Physiological Society.

  19. Algorithms for selecting informative marker panels for population assignment.

    Science.gov (United States)

    Rosenberg, Noah A

    2005-11-01

    Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species--carp, cat, chicken, dog, fly, grayling, human, and maize--this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species--dog--producing this level of accuracy with only three markers, and the eighth species--human--requiring approximately 13-16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.

  20. Genetic algorithms for thyroid gland ultrasound image feature reduction

    Czech Academy of Sciences Publication Activity Database

    Tesař, Ludvík; Smutek, D.; Jiskra, J.

    2005-01-01

    Roč. 3612, č. - (2005), s. 841-844 ISSN 0302-9743. [International Conference ICNC 2005 /1./. Changsha, 27.08.2005-29.08.2005] R&D Projects: GA AV ČR 1ET101050403 Institutional research plan: CEZ:AV0Z10750506 Keywords : medical imaging * classification * Bayes classifier * Huzzolini feature * pattern recognition Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/prace/20050229.pdf

  1. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.

  2. Regular Network Class Features Enhancement Using an Evolutionary Synthesis Algorithm

    Directory of Open Access Journals (Sweden)

    O. G. Monahov

    2014-01-01

    Full Text Available This paper investigates a solution of the optimization problem concerning the construction of diameter-optimal regular networks (graphs. Regular networks are of practical interest as the graph-theoretical models of reliable communication networks of parallel supercomputer systems, as a basis of the structure in a model of small world in optical and neural networks. It presents a new class of parametrically described regular networks - hypercirculant networks (graphs. An approach that uses evolutionary algorithms for the automatic generation of parametric descriptions of optimal hypercirculant networks is developed. Synthesis of optimal hypercirculant networks is based on the optimal circulant networks with smaller degree of nodes. To construct optimal hypercirculant networks is used a template of circulant network from the known optimal families of circulant networks with desired number of nodes and with smaller degree of nodes. Thus, a generating set of the circulant network is used as a generating subset of the hypercirculant network, and the missing generators are synthesized by means of the evolutionary algorithm, which is carrying out minimization of diameter (average diameter of networks. A comparative analysis of the structural characteristics of hypercirculant, toroidal, and circulant networks is conducted. The advantage hypercirculant networks under such structural characteristics, as diameter, average diameter, and the width of bisection, with comparable costs of the number of nodes and the number of connections is demonstrated. It should be noted the advantage of hypercirculant networks of dimension three over four higher-dimensional tori. Thus, the optimization of hypercirculant networks of dimension three is more efficient than the introduction of an additional dimension for the corresponding toroidal structures. The paper also notes the best structural parameters of hypercirculant networks in comparison with iBT-networks previously

  3. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    Science.gov (United States)

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-01-01

    Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers –based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations. PMID:26286630

  4. A triangle voting algorithm based on double feature constraints for star sensors

    Science.gov (United States)

    Fan, Qiaoyun; Zhong, Xuyang

    2018-02-01

    A novel autonomous star identification algorithm is presented in this study. In the proposed algorithm, each sensor star constructs multi-triangle with its bright neighbor stars and obtains its candidates by triangle voting process, in which the triangle is considered as the basic voting element. In order to accelerate the speed of this algorithm and reduce the required memory for star database, feature extraction is carried out to reduce the dimension of triangles and each triangle is described by its base and height. During the identification period, the voting scheme based on double feature constraints is proposed to implement triangle voting. This scheme guarantees that only the catalog star satisfying two features can vote for the sensor star, which improves the robustness towards false stars. The simulation and real star image test demonstrate that compared with the other two algorithms, the proposed algorithm is more robust towards position noise, magnitude noise and false stars.

  5. Signal filtering algorithm for depth-selective diffuse optical topography

    International Nuclear Information System (INIS)

    Fujii, M; Nakayama, K

    2009-01-01

    A compact filtered backprojection algorithm that suppresses the undesirable effects of skin circulation for near-infrared diffuse optical topography is proposed. Our approach centers around a depth-selective filtering algorithm that uses an inverse problem technique and extracts target signals from observation data contaminated by noise from a shallow region. The filtering algorithm is reduced to a compact matrix and is therefore easily incorporated into a real-time system. To demonstrate the validity of this method, we developed a demonstration prototype for depth-selective diffuse optical topography and performed both computer simulations and phantom experiments. The results show that the proposed method significantly suppresses the noise from the shallow region with a minimal degradation of the target signal.

  6. Doubly sparse factor models for unifying feature transformation and feature selection

    International Nuclear Information System (INIS)

    Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato; Matsumoto, Narihisa; Sugase-Miyamoto, Yasuko

    2010-01-01

    A number of unsupervised learning methods for high-dimensional data are largely divided into two groups based on their procedures, i.e., (1) feature selection, which discards irrelevant dimensions of the data, and (2) feature transformation, which constructs new variables by transforming and mixing over all dimensions. We propose a method that both selects and transforms features in a common Bayesian inference procedure. Our method imposes a doubly automatic relevance determination (ARD) prior on the factor loading matrix. We propose a variational Bayesian inference for our model and demonstrate the performance of our method on both synthetic and real data.

  7. Doubly sparse factor models for unifying feature transformation and feature selection

    Energy Technology Data Exchange (ETDEWEB)

    Katahira, Kentaro; Okanoya, Kazuo; Okada, Masato [ERATO, Okanoya Emotional Information Project, Japan Science Technology Agency, Saitama (Japan); Matsumoto, Narihisa; Sugase-Miyamoto, Yasuko, E-mail: okada@k.u-tokyo.ac.j [Human Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Ibaraki (Japan)

    2010-06-01

    A number of unsupervised learning methods for high-dimensional data are largely divided into two groups based on their procedures, i.e., (1) feature selection, which discards irrelevant dimensions of the data, and (2) feature transformation, which constructs new variables by transforming and mixing over all dimensions. We propose a method that both selects and transforms features in a common Bayesian inference procedure. Our method imposes a doubly automatic relevance determination (ARD) prior on the factor loading matrix. We propose a variational Bayesian inference for our model and demonstrate the performance of our method on both synthetic and real data.

  8. Development of Base Transceiver Station Selection Algorithm for ...

    African Journals Online (AJOL)

    TEMS) equipment was carried out on the existing BTSs, and a linear algorithm optimization program based on the spectral link efficiency of each BTS was developed, the output of this site optimization gives the selected number of base station sites ...

  9. Discriminative region extraction and feature selection based on the combination of SURF and saliency

    Science.gov (United States)

    Deng, Li; Wang, Chunhong; Rao, Changhui

    2011-08-01

    The objective of this paper is to provide a possible optimization on salient region algorithm, which is extensively used in recognizing and learning object categories. Salient region algorithm owns the superiority of intra-class tolerance, global score of features and automatically prominent scale selection under certain range. However, the major limitation behaves on performance, and that is what we attempt to improve. By reducing the number of pixels involved in saliency calculation, it can be accelerated. We use interest points detected by fast-Hessian, the detector of SURF, as the candidate feature for saliency operation, rather than the whole set in image. This implementation is thereby called Saliency based Optimization over SURF (SOSU for short). Experiment shows that bringing in of such a fast detector significantly speeds up the algorithm. Meanwhile, Robustness of intra-class diversity ensures object recognition accuracy.

  10. A numeric comparison of variable selection algorithms for supervised learning

    International Nuclear Information System (INIS)

    Palombo, G.; Narsky, I.

    2009-01-01

    Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents information about data is therefore an important task in analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets such as, for instance, imaging gamma-ray Cherenkov telescope (MAGIC) data found at the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community ( (http://sourceforge.net/projects/statpatrec/)). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ('Add N Remove R') implemented in SPR.

  11. An algorithm for 3D target scatterer feature estimation from sparse SAR apertures

    Science.gov (United States)

    Jackson, Julie Ann; Moses, Randolph L.

    2009-05-01

    We present an algorithm for extracting 3D canonical scattering features from complex targets observed over sparse 3D SAR apertures. The algorithm begins with complex phase history data and ends with a set of geometrical features describing the scene. The algorithm provides a pragmatic approach to initialization of a nonlinear feature estimation scheme, using regularization methods to deconvolve the point spread function and obtain sparse 3D images. Regions of high energy are detected in the sparse images, providing location initializations for scattering center estimates. A single canonical scattering feature, corresponding to a geometric shape primitive, is fit to each region via nonlinear optimization of fit error between the regularized data and parametric canonical scattering models. Results of the algorithm are presented using 3D scattering prediction data of a simple scene for both a densely-sampled and a sparsely-sampled SAR measurement aperture.

  12. Tumor recognition in wireless capsule endoscopy images using textural features and SVM-based feature selection.

    Science.gov (United States)

    Li, Baopu; Meng, Max Q-H

    2012-05-01

    Tumor in digestive tract is a common disease and wireless capsule endoscopy (WCE) is a relatively new technology to examine diseases for digestive tract especially for small intestine. This paper addresses the problem of automatic recognition of tumor for WCE images. Candidate color texture feature that integrates uniform local binary pattern and wavelet is proposed to characterize WCE images. The proposed features are invariant to illumination change and describe multiresolution characteristics of WCE images. Two feature selection approaches based on support vector machine, sequential forward floating selection and recursive feature elimination, are further employed to refine the proposed features for improving the detection accuracy. Extensive experiments validate that the proposed computer-aided diagnosis system achieves a promising tumor recognition accuracy of 92.4% in WCE images on our collected data.

  13. BLINCK?A diagnostic algorithm for skin cancer diagnosis combining clinical features with dermatoscopy findings

    OpenAIRE

    Bourne, Peter; Rosendahl, Cliff; Keir, Jeff; Cameron, Alan

    2012-01-01

    Background: Deciding whether a skin lesion requires biopsy to exclude skin cancer is often challenging for primary care clinicians in Australia. There are several published algorithms designed to assist with the diagnosis of skin cancer but apart from the clinical ABCD rule, these algorithms only evaluate the dermatoscopic features of a lesion. Objectives: The BLINCK algorithm explores the effect of combining clinical history and examination with fundamental dermatoscopic assessment in primar...

  14. Speed Bump Detection Using Accelerometric Features: A Genetic Algorithm Approach.

    Science.gov (United States)

    Celaya-Padilla, Jose M; Galván-Tejada, Carlos E; López-Monteagudo, F E; Alonso-González, O; Moreno-Báez, Arturo; Martínez-Torteya, Antonio; Galván-Tejada, Jorge I; Arceo-Olague, Jose G; Luna-García, Huizilopoztli; Gamboa-Rosales, Hamurabi

    2018-02-03

    Among the current challenges of the Smart City, traffic management and maintenance are of utmost importance. Road surface monitoring is currently performed by humans, but the road surface condition is one of the main indicators of road quality, and it may drastically affect fuel consumption and the safety of both drivers and pedestrians. Abnormalities in the road, such as manholes and potholes, can cause accidents when not identified by the drivers. Furthermore, human-induced abnormalities, such as speed bumps, could also cause accidents. In addition, while said obstacles ought to be signalized according to specific road regulation, they are not always correctly labeled. Therefore, we developed a novel method for the detection of road abnormalities (i.e., speed bumps). This method makes use of a gyro, an accelerometer, and a GPS sensor mounted in a car. After having the vehicle cruise through several streets, data is retrieved from the sensors. Then, using a cross-validation strategy, a genetic algorithm is used to find a logistic model that accurately detects road abnormalities. The proposed model had an accuracy of 0.9714 in a blind evaluation, with a false positive rate smaller than 0.018, and an area under the receiver operating characteristic curve of 0.9784. This methodology has the potential to detect speed bumps in quasi real-time conditions, and can be used to construct a real-time surface monitoring system.

  15. Speed Bump Detection Using Accelerometric Features: A Genetic Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Jose M. Celaya-Padilla

    2018-02-01

    Full Text Available Among the current challenges of the Smart City, traffic management and maintenance are of utmost importance. Road surface monitoring is currently performed by humans, but the road surface condition is one of the main indicators of road quality, and it may drastically affect fuel consumption and the safety of both drivers and pedestrians. Abnormalities in the road, such as manholes and potholes, can cause accidents when not identified by the drivers. Furthermore, human-induced abnormalities, such as speed bumps, could also cause accidents. In addition, while said obstacles ought to be signalized according to specific road regulation, they are not always correctly labeled. Therefore, we developed a novel method for the detection of road abnormalities (i.e., speed bumps. This method makes use of a gyro, an accelerometer, and a GPS sensor mounted in a car. After having the vehicle cruise through several streets, data is retrieved from the sensors. Then, using a cross-validation strategy, a genetic algorithm is used to find a logistic model that accurately detects road abnormalities. The proposed model had an accuracy of 0.9714 in a blind evaluation, with a false positive rate smaller than 0.018, and an area under the receiver operating characteristic curve of 0.9784. This methodology has the potential to detect speed bumps in quasi real-time conditions, and can be used to construct a real-time surface monitoring system.

  16. Towards Feature Selection in Actor-Critic Algorithms

    National Research Council Canada - National Science Library

    Rohanimanesh, Khashayar; Roy, Nicholas; Tedrake, Russ

    2007-01-01

    .... They demonstrate that two popular representations for value methods -- the barycentric interpolators and the graph Laplacian proto-value functions -- can be used to represent the actor so as to satisfy these conditions...

  17. A harmony search algorithm for clustering with feature selection

    Directory of Open Access Journals (Sweden)

    Carlos Cobos

    2010-01-01

    Full Text Available En este artículo se presenta un nuevo algoritmo de clustering denominado IHSK, con la capacidad de seleccionar características en un orden de complejidad lineal. El algoritmo es inspirado en la combinación de los algoritmos de búsqueda armónica y K-means. Para la selección de las características se usó el concepto de variabilidad y un método heurístico que penaliza la presencia de dimensiones con baja probabilidad de aportar en la solución actual. El algoritmo fue probado con conjuntos de datos sintéticos y reales, obteniendo resultados prometedores.

  18. A Signature Comparing Android Mobile Application Utilizing Feature Extracting Algorithms

    Directory of Open Access Journals (Sweden)

    Paul Grafilon

    2017-08-01

    Full Text Available The paper presented one of the application that can be done using smartphones camera. Nowadays forgery is one of the most undetected crimes. With the forensic technology used today it is still difficult for authorities to compare and define what a real signature is and what a forged signature is. A signature is a legal representation of a person. All transactions are based on a signature. Forgers may use a signature to sign illegal contracts and withdraw from bank accounts undetected. A signature can also be forged during election periods for repeated voting. Addressing the issues a signature should always be secure. Signature verification is a reduced problem that still poses a real challenge for researchers. The literature on signature verification is quite extensive and shows two main areas of research off-line and on-line systems. Off-line systems deal with a static image of the signature i.e. the result of the action of signing while on-line systems work on the dynamic process of generating the signature i.e. the action of signing itself. The researchers have found a way to resolve the concerns. A mobile application that integrates the camera to take a picture of a signature analyzes it and compares it to other signatures for verification. It will exist to help citizens to be more cautious and aware with issues regarding the signatures. This might also be relevant to help organizations and institutions such as banks and insurance companies in verifying signatures that may avoid unwanted transactions and identity theft. Furthermore this might help the authorities in the never ending battle against crime especially against forgers and thieves. The project aimed to design and develop a mobile application that integrates the smartphone camera for verifying and comparing signatures for security using the best algorithm possible. As the result of the development the said smartphone camera application is functional and reliable.

  19. Hybrid Feature Selection Approach Based on GRASP for Cancer Microarray Data

    Directory of Open Access Journals (Sweden)

    Arpita Nagpal

    2017-01-01

    Full Text Available Microarray data usually contain a large number of genes, but a small number of samples. Feature subset selection for microarray data aims at reducing the number of genes so that useful information can be extracted from the samples. Reducing the dimension of data sets further helps in improving the computational efficiency of the learning model. In this paper, we propose a modified algorithm based on the tabu search as local search procedures to a Greedy Randomized Adaptive Search Procedure (GRASP for high dimensional microarray data sets. The proposed Tabu based Greedy Randomized Adaptive Search Procedure algorithm is named as TGRASP. In TGRASP, a new parameter has been introduced named as Tabu Tenure and the existing parameters, NumIter and size have been modified. We observed that different parameter settings affect the quality of the optimum. The second proposed algorithm known as FFGRASP (Firefly Greedy Randomized Adaptive Search Procedure uses a firefly optimization algorithm in the local search optimzation phase of the greedy randomized adaptive search procedure (GRASP. Firefly algorithm is one of the powerful algorithms for optimization of multimodal applications. Experimental results show that the proposed TGRASP and FFGRASP algorithms are much better than existing algorithm with respect to three performance parameters viz. accuracy, run time, number of a selected subset of features. We have also compared both the approaches with a unified metric (Extended Adjusted Ratio of Ratios which has shown that TGRASP approach outperforms existing approach for six out of nine cancer microarray datasets and FFGRASP performs better on seven out of nine datasets.

  20. A proposed framework on hybrid feature selection techniques for handling high dimensional educational data

    Science.gov (United States)

    Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd

    2017-10-01

    Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.

  1. Effective Feature Selection for Classification of Promoter Sequences.

    Directory of Open Access Journals (Sweden)

    Kouser K

    Full Text Available Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine, KNN (K Nearest Neighbor and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  2. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available Core business is the most important business to the enterprise in diversified business. In this paper, we first introduce the definition and characteristics of the core business and then descript the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant colony clustering algorithm. Thus, the results indicate that the proposed method is an effective way to determine the core business for company.

  3. Naturally selecting solutions: the use of genetic algorithms in bioinformatics.

    Science.gov (United States)

    Manning, Timmy; Sleator, Roy D; Walsh, Paul

    2013-01-01

    For decades, computer scientists have looked to nature for biologically inspired solutions to computational problems; ranging from robotic control to scheduling optimization. Paradoxically, as we move deeper into the post-genomics era, the reverse is occurring, as biologists and bioinformaticians look to computational techniques, to solve a variety of biological problems. One of the most common biologically inspired techniques are genetic algorithms (GAs), which take the Darwinian concept of natural selection as the driving force behind systems for solving real world problems, including those in the bioinformatics domain. Herein, we provide an overview of genetic algorithms and survey some of the most recent applications of this approach to bioinformatics based problems.

  4. Pairwise Constraint-Guided Sparse Learning for Feature Selection.

    Science.gov (United States)

    Liu, Mingxia; Zhang, Daoqiang

    2016-01-01

    Feature selection aims to identify the most informative features for a compact and accurate data representation. As typical supervised feature selection methods, Lasso and its variants using L1-norm-based regularization terms have received much attention in recent studies, most of which use class labels as supervised information. Besides class labels, there are other types of supervised information, e.g., pairwise constraints that specify whether a pair of data samples belong to the same class (must-link constraint) or different classes (cannot-link constraint). However, most of existing L1-norm-based sparse learning methods do not take advantage of the pairwise constraints that provide us weak and more general supervised information. For addressing that problem, we propose a pairwise constraint-guided sparse (CGS) learning method for feature selection, where the must-link and the cannot-link constraints are used as discriminative regularization terms that directly concentrate on the local discriminative structure of data. Furthermore, we develop two variants of CGS, including: 1) semi-supervised CGS that utilizes labeled data, pairwise constraints, and unlabeled data and 2) ensemble CGS that uses the ensemble of pairwise constraint sets. We conduct a series of experiments on a number of data sets from University of California-Irvine machine learning repository, a gene expression data set, two real-world neuroimaging-based classification tasks, and two large-scale attribute classification tasks. Experimental results demonstrate the efficacy of our proposed methods, compared with several established feature selection methods.

  5. The selection and implementation of hidden line algorithms

    International Nuclear Information System (INIS)

    Schneider, A.

    1983-06-01

    One of the most challenging problems in the field of computer graphics is the elimination of hidden lines in images of nontransparent bodies. In the real world the nontransparent material hinders the light ray coming from hidden regions to the observer. In the computer based image formation process there is no automatic visibility regulation of this kind. So many lines are created which result in a poor quality of the spacial representation. Therefore a three-dimensional representation on the screen is only meaningfull if the hidden lines are eliminated. For this process many algorithms have been developed in the past. A common feature of these codes is the large amount of computer time needed. In the first generation of algorithms, which are commonly used today, the bodies are modeled by plane polygons. More recently, however, also algorithms are in use, which are able to treat curved surfaces without discretisation by plane surfaces. In this paper the first group of algorithms is reviewed, and the most important codes are described. The experience obtained during the implementation of two algorithms is presented. (orig.) [de

  6. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao etal. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  7. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    Science.gov (United States)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, the speech data is prone to be attacked by the noise or maliciously tampered. In order to make the speech perception hash algorithm has strong robustness and high efficiency, this paper put forward a speech perception hash algorithm based on the tensor decomposition and multi features is proposed. This algorithm analyses the speech perception feature acquires each speech component wavelet packet decomposition. LPCC, LSP and ISP feature of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating the hash values through feature matrix quantification which use mid-value. Experimental results showing that the proposed algorithm is robust for content to maintain operations compared with similar algorithms. It is able to resist the attack of the common background noise. Also, the algorithm is highly efficiency in terms of arithmetic, and is able to meet the real-time requirements of speech communication and complete the speech authentication quickly.

  9. An Improved Iris Recognition Algorithm Based on Hybrid Feature and ELM

    Science.gov (United States)

    Wang, Juan

    2018-03-01

    The iris image is easily polluted by noise and uneven light. This paper proposed an improved extreme learning machine (ELM) based iris recognition algorithm with hybrid feature. 2D-Gabor filters and GLCM is employed to generate a multi-granularity hybrid feature vector. 2D-Gabor filter and GLCM feature work for capturing low-intermediate frequency and high frequency texture information, respectively. Finally, we utilize extreme learning machine for iris recognition. Experimental results reveal our proposed ELM based multi-granularity iris recognition algorithm (ELM-MGIR) has higher accuracy of 99.86%, and lower EER of 0.12% under the premise of real-time performance. The proposed ELM-MGIR algorithm outperforms other mainstream iris recognition algorithms.

  10. A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure.

    Science.gov (United States)

    Naghibi, Tofigh; Hoffmann, Sarah; Pfister, Beat

    2015-08-01

    Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem there are two main issues that need to be addressed: (i) Finding an appropriate measure function than can be fairly fast and robustly computed for high-dimensional data. (ii) A search strategy to optimize the measure over the subset space in a reasonable amount of time. In this article mutual information between features and class labels is considered to be the measure function. Two series expansions for mutual information are proposed, and it is shown that most heuristic criteria suggested in the literature are truncated approximations of these expansions. It is well-known that searching the whole subset space is an NP-hard problem. Here, instead of the conventional sequential search algorithms, we suggest a parallel search strategy based on semidefinite programming (SDP) that can search through the subset space in polynomial time. By exploiting the similarities between the proposed algorithm and an instance of the maximum-cut problem in graph theory, the approximation ratio of this algorithm is derived and is compared with the approximation ratio of the backward elimination method. The experiments show that it can be misleading to judge the quality of a measure solely based on the classification accuracy, without taking the effect of the non-optimum search strategy into account.

  11. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.

  12. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Maolong Xi

    2016-01-01

    Full Text Available This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO for cancer feature gene selection, coupling support vector machine (SVM for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV. Finally, the BQPSO coupling SVM (BQPSO/SVM, binary PSO coupling SVM (BPSO/SVM, and genetic algorithm coupling SVM (GA/SVM are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  13. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Science.gov (United States)

    Sun, Jun; Liu, Li; Fan, Fangyun; Wu, Xiaojun

    2016-01-01

    This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms. PMID:27642363

  14. An improved feature extraction algorithm based on KAZE for multi-spectral image

    Science.gov (United States)

    Yang, Jianping; Li, Jun

    2018-02-01

    Multi-spectral image contains abundant spectral information, which is widely used in all fields like resource exploration, meteorological observation and modern military. Image preprocessing, such as image feature extraction and matching, is indispensable while dealing with multi-spectral remote sensing image. Although the feature matching algorithm based on linear scale such as SIFT and SURF performs strong on robustness, the local accuracy cannot be guaranteed. Therefore, this paper proposes an improved KAZE algorithm, which is based on nonlinear scale, to raise the number of feature and to enhance the matching rate by using the adjusted-cosine vector. The experiment result shows that the number of feature and the matching rate of the improved KAZE are remarkably than the original KAZE algorithm.

  15. Chinese License Plates Recognition Method Based on A Robust and Efficient Feature Extraction and BPNN Algorithm

    Science.gov (United States)

    Zhang, Ming; Xie, Fei; Zhao, Jing; Sun, Rui; Zhang, Lei; Zhang, Yue

    2018-04-01

    The prosperity of license plate recognition technology has made great contribution to the development of Intelligent Transport System (ITS). In this paper, a robust and efficient license plate recognition method is proposed which is based on a combined feature extraction model and BPNN (Back Propagation Neural Network) algorithm. Firstly, the candidate region of the license plate detection and segmentation method is developed. Secondly, a new feature extraction model is designed considering three sets of features combination. Thirdly, the license plates classification and recognition method using the combined feature model and BPNN algorithm is presented. Finally, the experimental results indicate that the license plate segmentation and recognition both can be achieved effectively by the proposed algorithm. Compared with three traditional methods, the recognition accuracy of the proposed method has increased to 95.7% and the consuming time has decreased to 51.4ms.

  16. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper propose a method of TS-MLP about emotion recognition of physiological signal.It can recognize emotion successfully by Tabu search which selects features of emotion’s physiological signals and multilayer perceptron that is used to classify emotion.Simulation shows that it has achieved good emotion classification performance.

  17. Orthogonal feature selection method. [For preprocessing of man spectral data

    Energy Technology Data Exchange (ETDEWEB)

    Kowalski, B R [Univ. of Washington, Seattle; Bender, C F

    1976-01-01

    A new method of preprocessing spectral data for extraction of molecular structural information is desired. This SELECT method generates orthogonal features that are important for classification purposes and that also retain their identity to the original measurements. A brief introduction to chemical pattern recognition is presented. A brief description of the method and an application to mass spectral data analysis follow. (BLM)

  18. Feature extraction and sensor selection for NPP initiating event identification

    International Nuclear Information System (INIS)

    Lin, Ting-Han; Wu, Shun-Chi; Chen, Kuang-You; Chou, Hwai-Pwu

    2017-01-01

    Highlights: • A two-stage feature extraction scheme for NPP initiating event identification. • With stBP, interrelations among the sensors can be retained for identification. • With dSFS, sensors that are crucial for identification can be efficiently selected. • Efficacy of the scheme is illustrated with data from the Maanshan NPP simulator. - Abstract: Initiating event identification is essential in managing nuclear power plant (NPP) severe accidents. In this paper, a novel two-stage feature extraction scheme that incorporates the proposed sensor type-wise block projection (stBP) and deflatable sequential forward selection (dSFS) is used to elicit the discriminant information in the data obtained from various NPP sensors to facilitate event identification. With the stBP, the primal features can be extracted without eliminating the interrelations among the sensors of the same type. The extracted features are then subjected to a further dimensionality reduction by selecting the sensors that are most relevant to the events under consideration. This selection is not easy, and a combinatorial optimization technique is normally required. With the dSFS, an optimal sensor set can be found with less computational load. Moreover, its sensor deflation stage allows sensors in the preselected set to be iteratively refined to avoid being trapped into a local optimum. Results from detailed experiments containing data of 12 event categories and a total of 112 events generated with a Taiwan’s Maanshan NPP simulator are presented to illustrate the efficacy of the proposed scheme.

  19. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  20. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles; Schwartz, Scott; Chapkin, Robert S.; Carroll, Raymond J.; Ivanov, Ivan

    2012-01-01

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods: thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate utility of the thresholding and SVD feature selection methods to with respect to a recent infant intestinal gene expression and metagenomics dataset.

  1. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods: thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate utility of the thresholding and SVD feature selection methods to with respect to a recent infant intestinal gene expression and metagenomics dataset.

  2. Local anesthesia selection algorithm in patients with concomitant somatic diseases.

    Science.gov (United States)

    Anisimova, E N; Sokhov, S T; Letunova, N Y; Orekhova, I V; Gromovik, M V; Erilin, E A; Ryazantsev, N A

    2016-01-01

    The paper presents basic principles of local anesthesia selection in patients with concomitant somatic diseases. These principles are history taking; analysis of drugs interaction with local anesthetic and sedation agents; determination of the functional status of the patient; patient anxiety correction; dental care with monitoring of hemodynamics parameters. It was found that adhering to this algorithm promotes prevention of urgent conditions in patients in outpatient dentistry.

  3. Comparison of feature selection and classification for MALDI-MS data

    Directory of Open Access Journals (Sweden)

    Yang Mary

    2009-07-01

    Full Text Available Abstract Introduction In the classification of Mass Spectrometry (MS proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. Results We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE, and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS that effectively performs microarray data analysis. We also compared several learning classifiers including K-Nearest Neighbor Classifier (KNNC, Naïve Bayes Classifier (NBC, Nearest Mean Scaled Classifier (NMSC, uncorrelated normal based quadratic Bayes Classifier recorded as UDC, Support Vector Machines, and a distance metric learning for Large Margin Nearest Neighbor classifier (LMNN based on Mahanalobis distance. To compare, we conducted a comprehensive experimental study using three types of MALDI-MS data. Conclusion Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing

  4. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data

    Directory of Open Access Journals (Sweden)

    Harris Lyndsay N

    2006-04-01

    Full Text Available Abstract Background Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. Results We developed a recursive support vector machine (R-SVM algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination or SVM-RFE, paying special attention to the ability of recovering the true informative genes/biomarkers and the robustness to outliers in the data. Simulation experiments show that a 5 %-~20 % improvement over SVM-RFE can be achieved regard to these properties. The SVM-based methods are also compared with a conventional univariate method and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments. Conclusion The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data and it outperforms SVM-RFE in the robustness to noise and in the ability to recover informative features. The multivariate SVM-based method outperforms the univariate method in the classification performance, but univariate methods can reveal more of the differentially expressed features especially when there are correlations between the features.

  5. Optimal Parameter Selection of Power System Stabilizer using Genetic Algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Chung, Hyeng Hwan; Chung, Dong Il; Chung, Mun Kyu [Dong-AUniversity (Korea); Wang, Yong Peel [Canterbury Univeristy (New Zealand)

    1999-06-01

    In this paper, it is suggested that the selection method of optimal parameter of power system stabilizer (PSS) with robustness in low frequency oscillation for power system using real variable elitism genetic algorithm (RVEGA). The optimal parameters were selected in the case of power system stabilizer with one lead compensator, and two lead compensator. Also, the frequency responses characteristics of PSS, the system eigenvalues criterion and the dynamic characteristics were considered in the normal load and the heavy load, which proved usefulness of RVEGA compare with Yu's compensator design theory. (author). 20 refs., 15 figs., 8 tabs.

  6. Improved binary dragonfly optimization algorithm and wavelet packet based non-linear features for infant cry classification.

    Science.gov (United States)

    Hariharan, M; Sindhu, R; Vijean, Vikneswaran; Yazid, Haniza; Nadarajaw, Thiyagar; Yaacob, Sazali; Polat, Kemal

    2018-03-01

    Infant cry signal carries several levels of information about the reason for crying (hunger, pain, sleepiness and discomfort) or the pathological status (asphyxia, deaf, jaundice, premature condition and autism, etc.) of an infant and therefore suited for early diagnosis. In this work, combination of wavelet packet based features and Improved Binary Dragonfly Optimization based feature selection method was proposed to classify the different types of infant cry signals. Cry signals from 2 different databases were utilized. First database contains 507 cry samples of normal (N), 340 cry samples of asphyxia (A), 879 cry samples of deaf (D), 350 cry samples of hungry (H) and 192 cry samples of pain (P). Second database contains 513 cry samples of jaundice (J), 531 samples of premature (Prem) and 45 samples of normal (N). Wavelet packet transform based energy and non-linear entropies (496 features), Linear Predictive Coding (LPC) based cepstral features (56 features), Mel-frequency Cepstral Coefficients (MFCCs) were extracted (16 features). The combined feature set consists of 568 features. To overcome the curse of dimensionality issue, improved binary dragonfly optimization algorithm (IBDFO) was proposed to select the most salient attributes or features. Finally, Extreme Learning Machine (ELM) kernel classifier was used to classify the different types of infant cry signals using all the features and highly informative features as well. Several experiments of two-class and multi-class classification of cry signals were conducted. In binary or two-class experiments, maximum accuracy of 90.18% for H Vs P, 100% for A Vs N, 100% for D Vs N and 97.61% J Vs Prem was achieved using the features selected (only 204 features out of 568) by IBDFO. For the classification of multiple cry signals (multi-class problem), the selected features could differentiate between three classes (N, A & D) with the accuracy of 100% and seven classes with the accuracy of 97.62%. The experimental

  7. Combinatorial Optimization in Project Selection Using Genetic Algorithm

    Science.gov (United States)

    Dewi, Sari; Sawaluddin

    2018-01-01

    This paper discusses the problem of project selection in the presence of two objective functions that maximize profit and minimize cost and the existence of some limitations is limited resources availability and time available so that there is need allocation of resources in each project. These resources are human resources, machine resources, raw material resources. This is treated as a consideration to not exceed the budget that has been determined. So that can be formulated mathematics for objective function (multi-objective) with boundaries that fulfilled. To assist the project selection process, a multi-objective combinatorial optimization approach is used to obtain an optimal solution for the selection of the right project. It then described a multi-objective method of genetic algorithm as one method of multi-objective combinatorial optimization approach to simplify the project selection process in a large scope.

  8. A review of feature detection and match algorithms for localization and mapping

    Science.gov (United States)

    Li, Shimiao

    2017-09-01

    Localization and mapping is an essential ability of a robot to keep track of its own location in an unknown environment. Among existing methods for this purpose, vision-based methods are more effective solutions for being accurate, inexpensive and versatile. Vision-based methods can generally be categorized as feature-based approaches and appearance-based approaches. The feature-based approaches prove higher performance in textured scenarios. However, their performance depend highly on the applied feature-detection algorithms. In this paper, we surveyed algorithms for feature detection, which is an essential step in achieving vision-based localization and mapping. In this pater, we present mathematical models of the algorithms one after another. To compare the performances of the algorithms, we conducted a series of experiments on their accuracy, speed, scale invariance and rotation invariance. The results of the experiments showed that ORB is the fastest algorithm in detecting and matching features, the speed of which is more than 10 times that of SURF and approximately 40 times that of SIFT. And SIFT, although with no advantage in terms of speed, shows the most correct matching pairs and proves its accuracy.

  9. GMDH-Based Semi-Supervised Feature Selection for Electricity Load Classification Forecasting

    Directory of Open Access Journals (Sweden)

    Lintao Yang

    2018-01-01

    Full Text Available With the development of smart power grids, communication network technology and sensor technology, there has been an exponential growth in complex electricity load data. Irregular electricity load fluctuations caused by the weather and holiday factors disrupt the daily operation of the power companies. To deal with these challenges, this paper investigates a day-ahead electricity peak load interval forecasting problem. It transforms the conventional continuous forecasting problem into a novel interval forecasting problem, and then further converts the interval forecasting problem into the classification forecasting problem. In addition, an indicator system influencing the electricity load is established from three dimensions, namely the load series, calendar data, and weather data. A semi-supervised feature selection algorithm is proposed to address an electricity load classification forecasting issue based on the group method of data handling (GMDH technology. The proposed algorithm consists of three main stages: (1 training the basic classifier; (2 selectively marking the most suitable samples from the unclassified label data, and adding them to an initial training set; and (3 training the classification models on the final training set and classifying the test samples. An empirical analysis of electricity load dataset from four Chinese cities is conducted. Results show that the proposed model can address the electricity load classification forecasting problem more efficiently and effectively than the FW-Semi FS (forward semi-supervised feature selection and GMDH-U (GMDH-based semi-supervised feature selection for customer classification models.

  10. A graph-Laplacian-based feature extraction algorithm for neural spike sorting.

    Science.gov (United States)

    Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos

    2009-01-01

    Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.

  11. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature...... and considering all CV groups, the methods selected 36 % of the original features available. The diagnosis evaluation reached a generalization area-under-the-ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis....

  12. A fully automated algorithm of baseline correction based on wavelet feature points and segment interpolation

    Science.gov (United States)

    Qian, Fang; Wu, Yihui; Hao, Peng

    2017-11-01

    Baseline correction is a very important part of pre-processing. Baseline in the spectrum signal can induce uneven amplitude shifts across different wavenumbers and lead to bad results. Therefore, these amplitude shifts should be compensated before further analysis. Many algorithms are used to remove baseline, however fully automated baseline correction is convenient in practical application. A fully automated algorithm based on wavelet feature points and segment interpolation (AWFPSI) is proposed. This algorithm finds feature points through continuous wavelet transformation and estimates baseline through segment interpolation. AWFPSI is compared with three commonly introduced fully automated and semi-automated algorithms, using simulated spectrum signal, visible spectrum signal and Raman spectrum signal. The results show that AWFPSI gives better accuracy and has the advantage of easy use.

  13. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis.

    Science.gov (United States)

    Sun, Wenqing; Zheng, Bin; Qian, Wei

    2017-10-01

    This study aimed to analyze the ability of extracting automatically generated features using deep structured algorithms in lung nodule CT image diagnosis, and compare its performance with traditional computer aided diagnosis (CADx) systems using hand-crafted features. All of the 1018 cases were acquired from Lung Image Database Consortium (LIDC) public lung cancer database. The nodules were segmented according to four radiologists' markings, and 13,668 samples were generated by rotating every slice of nodule images. Three multichannel ROI based deep structured algorithms were designed and implemented in this study: convolutional neural network (CNN), deep belief network (DBN), and stacked denoising autoencoder (SDAE). For the comparison purpose, we also implemented a CADx system using hand-crafted features including density features, texture features and morphological features. The performance of every scheme was evaluated by using a 10-fold cross-validation method and an assessment index of the area under the receiver operating characteristic curve (AUC). The observed highest area under the curve (AUC) was 0.899±0.018 achieved by CNN, which was significantly higher than traditional CADx with the AUC=0.848±0.026. The results from DBN was also slightly higher than CADx, while SDAE was slightly lower. By visualizing the automatic generated features, we found some meaningful detectors like curvy stroke detectors from deep structured schemes. The study results showed the deep structured algorithms with automatically generated features can achieve desirable performance in lung nodule diagnosis. With well-tuned parameters and large enough dataset, the deep learning algorithms can have better performance than current popular CADx. We believe the deep learning algorithms with similar data preprocessing procedure can be used in other medical image analysis areas as well. Copyright © 2017. Published by Elsevier Ltd.

  14. An Improved Nested Sampling Algorithm for Model Selection and Assessment

    Science.gov (United States)

    Zeng, X.; Ye, M.; Wu, J.; WANG, D.

    2017-12-01

    Multimodel strategy is a general approach for treating model structure uncertainty in recent researches. The unknown groundwater system is represented by several plausible conceptual models. Each alternative conceptual model is attached with a weight which represents the possibility of this model. In Bayesian framework, the posterior model weight is computed as the product of model prior weight and marginal likelihood (or termed as model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. Nested sampling estimator (NSE) is a new proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood function. For improving the performance of NSE, it could be feasible to integrate more efficient and elaborated sampling algorithm - DREAMzs into the local sampling. In addition, in order to overcome the computation burden problem of large quantity of repeating model executions in marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build the surrogates for original groundwater model.

  15. Parameter Selection for Ant Colony Algorithm Based on Bacterial Foraging Algorithm

    Directory of Open Access Journals (Sweden)

    Peng Li

    2016-01-01

    Full Text Available The optimal performance of the ant colony algorithm (ACA mainly depends on suitable parameters; therefore, parameter selection for ACA is important. We propose a parameter selection method for ACA based on the bacterial foraging algorithm (BFA, considering the effects of coupling between different parameters. Firstly, parameters for ACA are mapped into a multidimensional space, using a chemotactic operator to ensure that each parameter group approaches the optimal value, speeding up the convergence for each parameter set. Secondly, the operation speed for optimizing the entire parameter set is accelerated using a reproduction operator. Finally, the elimination-dispersal operator is used to strengthen the global optimization of the parameters, which avoids falling into a local optimal solution. In order to validate the effectiveness of this method, the results were compared with those using a genetic algorithm (GA and a particle swarm optimization (PSO, and simulations were conducted using different grid maps for robot path planning. The results indicated that parameter selection for ACA based on BFA was the superior method, able to determine the best parameter combination rapidly, accurately, and effectively.

  16. An Algorithm Based on the Self-Organized Maps for the Classification of Facial Features

    Directory of Open Access Journals (Sweden)

    Gheorghe Gîlcă

    2015-12-01

    Full Text Available This paper deals with an algorithm based on Self Organized Maps networks which classifies facial features. The proposed algorithm can categorize the facial features defined by the input variables: eyebrow, mouth, eyelids into a map of their grouping. The groups map is based on calculating the distance between each input vector and each output neuron layer , the neuron with the minimum distance being declared winner neuron. The network structure consists of two levels: the first level contains three input vectors, each having forty-one values, while the second level contains the SOM competitive network which consists of 100 neurons. The proposed system can classify facial features quickly and easily using the proposed algorithm based on SOMs.

  17. An improved algorithm for information hiding based on features of Arabic text: A Unicode approach

    Directory of Open Access Journals (Sweden)

    A.A. Mohamed

    2014-07-01

    Full Text Available Steganography means how to hide secret information in a cover media, so that other individuals fail to realize their existence. Due to the lack of data redundancy in the text file in comparison with other carrier files, text steganography is a difficult problem to solve. In this paper, we proposed a new promised steganographic algorithm for Arabic text based on features of Arabic text. The focus is on more secure algorithm and high capacity of the carrier. Our extensive experiments using the proposed algorithm resulted in a high capacity of the carrier media. The embedding capacity rate ratio of the proposed algorithm is high. In addition, our algorithm can resist traditional attacking methods since it makes the changes in carrier text as minimum as possible.

  18. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-01-01

    Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in different feature selection methods is investigated and the most frequent features selected in each fold among all methods for both datasets are evaluated.

  19. A Novel Algorithm for Feature Level Fusion Using SVM Classifier for Multibiometrics-Based Person Identification

    Directory of Open Access Journals (Sweden)

    Ujwalla Gawande

    2013-01-01

    Full Text Available Recent times witnessed many advancements in the field of biometric and ultimodal biometric fields. This is typically observed in the area, of security, privacy, and forensics. Even for the best of unimodal biometric systems, it is often not possible to achieve a higher recognition rate. Multimodal biometric systems overcome various limitations of unimodal biometric systems, such as nonuniversality, lower false acceptance, and higher genuine acceptance rates. More reliable recognition performance is achievable as multiple pieces of evidence of the same identity are available. The work presented in this paper is focused on multimodal biometric system using fingerprint and iris. Distinct textual features of the iris and fingerprint are extracted using the Haar wavelet-based technique. A novel feature level fusion algorithm is developed to combine these unimodal features using the Mahalanobis distance technique. A support-vector-machine-based learning algorithm is used to train the system using the feature extracted. The performance of the proposed algorithms is validated and compared with other algorithms using the CASIA iris database and real fingerprint database. From the simulation results, it is evident that our algorithm has higher recognition rate and very less false rejection rate compared to existing approaches.

  20. Impact of Reconstruction Algorithms on CT Radiomic Features of Pulmonary Tumors: Analysis of Intra- and Inter-Reader Variability and Inter-Reconstruction Algorithm Variability.

    Science.gov (United States)

    Kim, Hyungjin; Park, Chang Min; Lee, Myunghee; Park, Sang Joon; Song, Yong Sub; Lee, Jong Hyuk; Hwang, Eui Jin; Goo, Jin Mo

    2016-01-01

    To identify the impact of reconstruction algorithms on CT radiomic features of pulmonary tumors and to reveal and compare the intra- and inter-reader and inter-reconstruction algorithm variability of each feature. Forty-two patients (M:F = 19:23; mean age, 60.43±10.56 years) with 42 pulmonary tumors (22.56±8.51mm) underwent contrast-enhanced CT scans, which were reconstructed with filtered back projection and commercial iterative reconstruction algorithm (level 3 and 5). Two readers independently segmented the whole tumor volume. Fifteen radiomic features were extracted and compared among reconstruction algorithms. Intra- and inter-reader variability and inter-reconstruction algorithm variability were calculated using coefficients of variation (CVs) and then compared. Among the 15 features, 5 first-order tumor intensity features and 4 gray level co-occurrence matrix (GLCM)-based features showed significant differences (palgorithms. As for the variability, effective diameter, sphericity, entropy, and GLCM entropy were the most robust features (CV≤5%). Inter-reader variability was larger than intra-reader or inter-reconstruction algorithm variability in 9 features. However, for entropy, homogeneity, and 4 GLCM-based features, inter-reconstruction algorithm variability was significantly greater than inter-reader variability (palgorithms. Inter-reconstruction algorithm variability was greater than inter-reader variability for entropy, homogeneity, and GLCM-based features.

  1. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.; Wheeler, Mary Fanett; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using

  2. NetProt: Complex-based Feature Selection.

    Science.gov (United States)

    Goh, Wilson Wen Bin; Wong, Limsoon

    2017-08-04

    Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/ , and online documentation is available at http://rpubs.com/gohwils/204259 .

  3. An artificial bee colony algorithm for uncertain portfolio selection.

    Science.gov (United States)

    Chen, Wei

    2014-01-01

    Portfolio selection is an important issue for researchers and practitioners. In this paper, under the assumption that security returns are given by experts' evaluations rather than historical data, we discuss the portfolio adjusting problem which takes transaction costs and diversification degree of portfolio into consideration. Uncertain variables are employed to describe the security returns. In the proposed mean-variance-entropy model, the uncertain mean value of the return is used to measure investment return, the uncertain variance of the return is used to measure investment risk, and the entropy is used to measure diversification degree of portfolio. In order to solve the proposed model, a modified artificial bee colony (ABC) algorithm is designed. Finally, a numerical example is given to illustrate the modelling idea and the effectiveness of the proposed algorithm.

  4. Medical Image Fusion Algorithm Based on Nonlinear Approximation of Contourlet Transform and Regional Features

    Directory of Open Access Journals (Sweden)

    Hui Huang

    2017-01-01

    Full Text Available According to the pros and cons of contourlet transform and multimodality medical imaging, here we propose a novel image fusion algorithm that combines nonlinear approximation of contourlet transform with image regional features. The most important coefficient bands of the contourlet sparse matrix are retained by nonlinear approximation. Low-frequency and high-frequency regional features are also elaborated to fuse medical images. The results strongly suggested that the proposed algorithm could improve the visual effects of medical image fusion and image quality, image denoising, and enhancement.

  5. Algorithms of control parameters selection for automation of FDM 3D printing process

    Directory of Open Access Journals (Sweden)

    Kogut Paweł

    2017-01-01

    Full Text Available The paper presents algorithms of control parameters selection of the Fused Deposition Modelling (FDM technology in case of an open printing solutions environment and 3DGence ONE printer. The following parameters were distinguished: model mesh density, material flow speed, cooling performance, retraction and printing speeds. These parameters are independent in principle printing system, but in fact to a certain degree that results from the selected printing equipment features. This is the first step for automation of the 3D printing process in FDM technology.

  6. Benefit salience and consumers' selective attention to product features

    OpenAIRE

    Ratneshwar, S; Warlop, Luk; Mick, DG; Seeger, G

    1997-01-01

    Although attention is a key construct in models of marketing communication and consumer choice, its selective nature has rarely been examined in common time-pressured conditions. We focus on the role of benefit salience, that is, the readiness with which particular benefits are brought to mind by consumers in relation to a given product category. Study I demonstrated that when product feature information was presented rapidly, individuals for whom the benefit of personalised customer service ...

  7. Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

    KAUST Repository

    Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin

    2013-01-01

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert-knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013

  8. Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

    KAUST Repository

    Abbas, Ahmed

    2013-01-07

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert-knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013

  9. A novel feature ranking algorithm for biometric recognition with PPG signals.

    Science.gov (United States)

    Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M

    2014-06-01

    This study is intended for describing the application of the Photoplethysmography (PPG) signal and the time domain features acquired from its first and second derivatives for biometric identification. For this purpose, a sum of 40 features has been extracted and a feature-ranking algorithm is proposed. This proposed algorithm calculates the contribution of each feature to biometric recognition and collocates the features, the contribution of which is from great to small. While identifying the contribution of the features, the Euclidean distance and absolute distance formulas are used. The efficiency of the proposed algorithms is demonstrated by the results of the k-NN (k-nearest neighbor) classifier applications of the features. During application, each 15-period-PPG signal belonging to two different durations from each of the thirty healthy subjects were used with a PPG data acquisition card. The first PPG signals recorded from the subjects were evaluated as the 1st configuration; the PPG signals recorded later at a different time as the 2nd configuration and the combination of both were evaluated as the 3rd configuration. When the results were evaluated for the k-NN classifier model created along with the proposed algorithm, an identification of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration has successfully been attained. The obtained results showed that both the proposed algorithm and the biometric identification model based on this developed PPG signal are very promising for contactless recognizing the people with the proposed method. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.We present an algorithmic framework (EFFECT for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  11. A 3D Printing Model Watermarking Algorithm Based on 3D Slicing and Feature Points

    Directory of Open Access Journals (Sweden)

    Giao N. Pham

    2018-02-01

    Full Text Available With the increase of three-dimensional (3D printing applications in many areas of life, a large amount of 3D printing data is copied, shared, and used several times without any permission from the original providers. Therefore, copyright protection and ownership identification for 3D printing data in communications or commercial transactions are practical issues. This paper presents a novel watermarking algorithm for 3D printing models based on embedding watermark data into the feature points of a 3D printing model. Feature points are determined and computed by the 3D slicing process along the Z axis of a 3D printing model. The watermark data is embedded into a feature point of a 3D printing model by changing the vector length of the feature point in OXY space based on the reference length. The x and y coordinates of the feature point will be then changed according to the changed vector length that has been embedded with a watermark. Experimental results verified that the proposed algorithm is invisible and robust to geometric attacks, such as rotation, scaling, and translation. The proposed algorithm provides a better method than the conventional works, and the accuracy of the proposed algorithm is much higher than previous methods.

  12. An enhanced PSO-DEFS based feature selection with biometric authentication for identification of diabetic retinopathy

    Directory of Open Access Journals (Sweden)

    Umarani Balakrishnan

    2016-11-01

    Full Text Available Recently, automatic diagnosis of diabetic retinopathy (DR from the retinal image is the most significant research topic in the medical applications. Diabetic macular edema (DME is the major reason for the loss of vision in patients suffering from DR. Early identification of the DR enables to prevent the vision loss and encourage diabetic control activities. Many techniques are developed to diagnose the DR. The major drawbacks of the existing techniques are low accuracy and high time complexity. To overcome these issues, this paper proposes an enhanced particle swarm optimization-differential evolution feature selection (PSO-DEFS based feature selection approach with biometric authentication for the identification of DR. Initially, a hybrid median filter (HMF is used for pre-processing the input images. Then, the pre-processed images are embedded with each other by using least significant bit (LSB for authentication purpose. Simultaneously, the image features are extracted using convoluted local tetra pattern (CLTrP and Tamura features. Feature selection is performed using PSO-DEFS and PSO-gravitational search algorithm (PSO-GSA to reduce time complexity. Based on some performance metrics, the PSO-DEFS is chosen as a better choice for feature selection. The feature selection is performed based on the fitness value. A multi-relevance vector machine (M-RVM is introduced to classify the 13 normal and 62 abnormal images among 75 images from 60 patients. Finally, the DR patients are further classified by M-RVM. The experimental results exhibit that the proposed approach achieves better accuracy, sensitivity, and specificity than the existing techniques.

  13. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    Directory of Open Access Journals (Sweden)

    Hala Alshamlan

    2015-01-01

    Full Text Available An artificial bee colony (ABC is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR, and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO. The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  14. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    Science.gov (United States)

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  15. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    Science.gov (United States)

    Gaspar-Cunha, A.; Recio, G.; Costa, L.; Estébanez, C.

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in the relevance for creditors and investors in evaluating the likelihood of getting into bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, making an estimation of the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have shown to be an excellent tool to deal with complex problems in finances and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimise the number of features and maximise the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier. PMID:24707201

  16. Feature Selection for Nonstationary Data: Application to Human Recognition Using Medical Biometrics.

    Science.gov (United States)

    Komeili, Majid; Louis, Wael; Armanfard, Narges; Hatzinakos, Dimitrios

    2018-05-01

    Electrocardiogram (ECG) and transient evoked otoacoustic emission (TEOAE) are among the physiological signals that have attracted significant interest in biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals and this makes them hard to deal with in across-session human recognition scenario where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions where it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are done. Performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than other algorithms.

  17. HEART RATE VARIABILITY CLASSIFICATION USING SADE-ELM CLASSIFIER WITH BAT FEATURE SELECTION

    Directory of Open Access Journals (Sweden)

    R Kavitha

    2017-07-01

    Full Text Available The electrical activity of the human heart is measured by the vital bio medical signal called ECG. This electrocardiogram is employed as a crucial source to gather the diagnostic information of a patient’s cardiopathy. The monitoring function of cardiac disease is diagnosed by documenting and handling the electrocardiogram (ECG impulses. In the recent years many research has been done and developing an enhanced method to identify the risk in the patient’s body condition by processing and analysing the ECG signal. This analysis of the signal helps to find the cardiac abnormalities, arrhythmias, and many other heart problems. ECG signal is processed to detect the variability in heart rhythm; heart rate variability is calculated based on the time interval between heart beats. Heart Rate Variability HRV is measured by the variation in the beat to beat interval. The Heart rate Variability (HRV is an essential aspect to diagnose the properties of the heart. Recent development enhances the potential with the aid of non-linear metrics in reference point with feature selection. In this paper, the fundamental elements are taken from the ECG signal for feature selection process where Bat algorithm is employed for feature selection to predict the best feature and presented to the classifier for accurate classification. The popular machine learning algorithm ELM is taken for classification, integrated with evolutionary algorithm named Self- Adaptive Differential Evolution Extreme Learning Machine SADEELM to improve the reliability of classification. It combines Effective Fuzzy Kohonen clustering network (EFKCN to be able to increase the accuracy of the effect for HRV transmission classification. Hence, it is observed that the experiment carried out unveils that the precision is improved by the SADE-ELM method and concurrently optimizes the computation time.

  18. Adaptive Equalizer Using Selective Partial Update Algorithm and Selective Regressor Affine Projection Algorithm over Shallow Water Acoustic Channels

    Directory of Open Access Journals (Sweden)

    Masoumeh Soflaei

    2014-01-01

    Full Text Available One of the most important problems of reliable communications in shallow water channels is intersymbol interference (ISI which is due to scattering from surface and reflecting from bottom. Using adaptive equalizers in receiver is one of the best suggested ways for overcoming this problem. In this paper, we apply the family of selective regressor affine projection algorithms (SR-APA and the family of selective partial update APA (SPU-APA which have low computational complexity that is one of the important factors that influences adaptive equalizer performance. We apply experimental data from Strait of Hormuz for examining the efficiency of the proposed methods over shallow water channel. We observe that the values of the steady-state mean square error (MSE of SR-APA and SPU-APA decrease by 5.8 (dB and 5.5 (dB, respectively, in comparison with least mean square (LMS algorithm. Also the families of SPU-APA and SR-APA have better convergence speed than LMS type algorithm.

  19. The production route selection algorithm in virtual manufacturing networks

    Science.gov (United States)

    Krenczyk, D.; Skolud, B.; Olender, M.

    2017-08-01

    The increasing requirements and competition in the global market are challenges for the companies profitability in production and supply chain management. This situation became the basis for construction of virtual organizations, which are created in response to temporary needs. The problem of the production flow planning in virtual manufacturing networks is considered. In the paper the algorithm of the production route selection from the set of admissible routes, which meets the technology and resource requirements and in the context of the criterion of minimum cost is proposed.

  20. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  1. An adaptive clustering algorithm for image matching based on corner feature

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-04-01

    The traditional image matching algorithm always can not balance the real-time and accuracy better, to solve the problem, an adaptive clustering algorithm for image matching based on corner feature is proposed in this paper. The method is based on the similarity of the matching pairs of vector pairs, and the adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first, the feature points of the reference image and the perceived image are extracted, and the feature points of the two images are first matched by Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce the ineffective operation and improve the matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is used to match the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate the most wrong matching points while the correct matching points are retained, and improve the accuracy of RANSAC matching, reduce the computation load of whole matching process at the same time.

  2. EEG-based recognition of video-induced emotions: selecting subject-independent feature set.

    Science.gov (United States)

    Kortelainen, Jukka; Seppänen, Tapio

    2013-01-01

    Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions into the human-computer interaction (HCI) could be seen as a significant step forward offering a great potential for developing advanced future technologies. While the electrical activity of the brain is affected by emotions, offers electroencephalogram (EEG) an interesting channel to improve the HCI. In this paper, the selection of subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future, further analysis of the video-induced EEG changes including the topographical differences in the spectral features is needed.

  3. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    Directory of Open Access Journals (Sweden)

    Kursat Zuhtuogullari

    2013-01-01

    Full Text Available The systems consisting high input spaces require high processing times and memory usage. Most of the attribute selection algorithms have the problems of input dimensions limits and information storage problems. These problems are eliminated by means of developed feature reduction software using new modified selection mechanism with middle region solution candidates adding. The hybrid system software is constructed for reducing the input attributes of the systems with large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In the genetic algorithm based soft computing methods, locking to the local solutions is also a problem which is eliminated by using developed software. Faster and effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to the reducts (reduced input attributes with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has the advantages in the fields of memory allocation, execution time, classification accuracy, sensitivity, and specificity values when compared with the other reduction algorithms by using the urological test data.

  4. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    The systems consisting high input spaces require high processing times and memory usage. Most of the attribute selection algorithms have the problems of input dimensions limits and information storage problems. These problems are eliminated by means of developed feature reduction software using new modified selection mechanism with middle region solution candidates adding. The hybrid system software is constructed for reducing the input attributes of the systems with large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In the genetic algorithm based soft computing methods, locking to the local solutions is also a problem which is eliminated by using developed software. Faster and effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to the reducts (reduced input attributes) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has the advantages in the fields of memory allocation, execution time, classification accuracy, sensitivity, and specificity values when compared with the other reduction algorithms by using the urological test data.

  5. Inference for feature selection using the Lasso with high-dimensional data

    DEFF Research Database (Denmark)

    Brink-Jensen, Kasper; Ekstrøm, Claus Thorn

    2014-01-01

    Penalized regression models such as the Lasso have proved useful for variable selection in many fields - especially for situations with high-dimensional data where the numbers of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do...... not generally provide any inference of the selected variables. Thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen...... by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute $p$-values of the expected magnitude with simulated data using a multitude of scenarios...

  6. An efficient fractal image coding algorithm using unified feature and DCT

    International Nuclear Information System (INIS)

    Zhou Yiming; Zhang Chao; Zhang Zengke

    2009-01-01

    Fractal image compression is a promising technique to improve the efficiency of image storage and image transmission with high compression ratio, however, the huge time consumption for the fractal image coding is a great obstacle to the practical applications. In order to improve the fractal image coding, efficient fractal image coding algorithms using a special unified feature and a DCT coder are proposed in this paper. Firstly, based on a necessary condition to the best matching search rule during fractal image coding, the fast algorithm using a special unified feature (UFC) is addressed, and it can reduce the search space obviously and exclude most inappropriate matching subblocks before the best matching search. Secondly, on the basis of UFC algorithm, in order to improve the quality of the reconstructed image, a DCT coder is combined to construct a hybrid fractal image algorithm (DUFC). Experimental results show that the proposed algorithms can obtain good quality of the reconstructed images and need much less time than the baseline fractal coding algorithm.

  7. Object tracking system using a VSW algorithm based on color and point features

    Directory of Open Access Journals (Sweden)

    Lim Hye-Youn

    2011-01-01

    Full Text Available Abstract An object tracking system using a variable search window (VSW algorithm based on color and feature points is proposed. A meanshift algorithm is an object tracking technique that works according to color probability distributions. An advantage of this algorithm based on color is that it is robust to specific color objects; however, a disadvantage is that it is sensitive to non-specific color objects due to illumination and noise. Therefore, to offset this weakness, it presents the VSW algorithm based on robust feature points for the accurate tracking of moving objects. The proposed method extracts the feature points of a detected object which is the region of interest (ROI, and generates a VSW using the given information which is the positions of extracted feature points. The goal of this paper is to achieve an efficient and effective object tracking system that meets the accurate tracking of moving objects. Through experiments, the object tracking system is implemented that it performs more precisely than existing techniques.

  8. VHDL Implementation of Feature-Extraction Algorithm for the PANDA Electromagnetic Calorimeter

    NARCIS (Netherlands)

    Kavatsyuk, M.; Guliyev, E.; Lemmens, P. J. J.; Löhner, H.; Tambave, G.

    2010-01-01

    The feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA detector at the future FAIR facility, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The use of modified firmware with the running on-line

  9. VHDL implementation of feature-extraction algorithm for the PANDA electromagnetic calorimeter

    NARCIS (Netherlands)

    Guliyev, E.; Kavatsyuk, M.; Lemmens, P. J. J.; Tambave, G.; Löhner, H.

    2012-01-01

    A simple, efficient, and robust feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA spectrometer at FAIR, Darmstadt, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The source-code is available as an

  10. Classification of underground pipe scanned images using feature extraction and neuro-fuzzy algorithm.

    Science.gov (United States)

    Sinha, S K; Karray, F

    2002-01-01

    Pipeline surface defects such as holes and cracks cause major problems for utility managers, particularly when the pipeline is buried under the ground. Manual inspection for surface defects in the pipeline has a number of drawbacks, including subjectivity, varying standards, and high costs. Automatic inspection system using image processing and artificial intelligence techniques can overcome many of these disadvantages and offer utility managers an opportunity to significantly improve quality and reduce costs. A recognition and classification of pipe cracks using images analysis and neuro-fuzzy algorithm is proposed. In the preprocessing step the scanned images of pipe are analyzed and crack features are extracted. In the classification step the neuro-fuzzy algorithm is developed that employs a fuzzy membership function and error backpropagation algorithm. The idea behind the proposed approach is that the fuzzy membership function will absorb variation of feature values and the backpropagation network, with its learning ability, will show good classification efficiency.

  11. Natural selection and algorithmic design of mRNA.

    Science.gov (United States)

    Cohen, Barry; Skiena, Steven

    2003-01-01

    Messenger RNA (mRNA) sequences serve as templates for proteins according to the triplet code, in which each of the 4(3) = 64 different codons (sequences of three consecutive nucleotide bases) in RNA either terminate transcription or map to one of the 20 different amino acids (or residues) which build up proteins. Because there are more codons than residues, there is inherent redundancy in the coding. Certain residues (e.g., tryptophan) have only a single corresponding codon, while other residues (e.g., arginine) have as many as six corresponding codons. This freedom implies that the number of possible RNA sequences coding for a given protein grows exponentially in the length of the protein. Thus nature has wide latitude to select among mRNA sequences which are informationally equivalent, but structurally and energetically divergent. In this paper, we explore how nature takes advantage of this freedom and how to algorithmically design structures more energetically favorable than have been built through natural selection. In particular: (1) Natural Selection--we perform the first large-scale computational experiment comparing the stability of mRNA sequences from a variety of organisms to random synonymous sequences which respect the codon preferences of the organism. This experiment was conducted on over 27,000 sequences from 34 microbial species with 36 genomic structures. We provide evidence that in all genomic structures highly stable sequences are disproportionately abundant, and in 19 of 36 cases highly unstable sequences are disproportionately abundant. This suggests that the stability of mRNA sequences is subject to natural selection. (2) Artificial Selection--motivated by these biological results, we examine the algorithmic problem of designing the most stable and unstable mRNA sequences which code for a target protein. We give a polynomial-time dynamic programming solution to the most stable sequence problem (MSSP), which is asymptotically no more complex

  12. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    Science.gov (United States)

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  13. Mining potential biomarkers associated with space flight in Caenorhabditis elegans experienced Shenzhou-8 mission with multiple feature selection techniques

    International Nuclear Information System (INIS)

    Zhao, Lei; Gao, Ying; Mi, Dong; Sun, Yeqing

    2016-01-01

    Highlights: • A combined algorithm is proposed to mine biomarkers of spaceflight in C. elegans. • This algorithm makes the feature selection more reliable and robust. • Apply this algorithm to predict 17 positive biomarkers to space environment stress. • The strategy can be used as a general method to select important features. - Abstract: To identify the potential biomarkers associated with space flight, a combined algorithm, which integrates the feature selection techniques, was used to deal with the microarray datasets of Caenorhabditis elegans obtained in the Shenzhou-8 mission. Compared with the ground control treatment, a total of 86 differentially expressed (DE) genes in responses to space synthetic environment or space radiation environment were identified by two filter methods. And then the top 30 ranking genes were selected by the random forest algorithm. Gene Ontology annotation and functional enrichment analyses showed that these genes were mainly associated with metabolism process. Furthermore, clustering analysis showed that 17 genes among these are positive, including 9 for space synthetic environment and 8 for space radiation environment only. These genes could be used as the biomarkers to reflect the space environment stresses. In addition, we also found that microgravity is the main stress factor to change the expression patterns of biomarkers for the short-duration spaceflight.

  14. Mining potential biomarkers associated with space flight in Caenorhabditis elegans experienced Shenzhou-8 mission with multiple feature selection techniques

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Lei [Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian 116026 (China); Gao, Ying [Center of Medical Physics and Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Shushanhu Road 350, Hefei 230031 (China); Mi, Dong, E-mail: mid@dlmu.edu.cn [Department of Physics, Dalian Maritime University, Dalian 116026 (China); Sun, Yeqing, E-mail: yqsun@dlmu.edu.cn [Institute of Environmental Systems Biology, College of Environmental Science and Engineering, Dalian Maritime University, Dalian 116026 (China)

    2016-09-15

    Highlights: • A combined algorithm is proposed to mine biomarkers of spaceflight in C. elegans. • This algorithm makes the feature selection more reliable and robust. • Apply this algorithm to predict 17 positive biomarkers to space environment stress. • The strategy can be used as a general method to select important features. - Abstract: To identify the potential biomarkers associated with space flight, a combined algorithm, which integrates the feature selection techniques, was used to deal with the microarray datasets of Caenorhabditis elegans obtained in the Shenzhou-8 mission. Compared with the ground control treatment, a total of 86 differentially expressed (DE) genes in responses to space synthetic environment or space radiation environment were identified by two filter methods. And then the top 30 ranking genes were selected by the random forest algorithm. Gene Ontology annotation and functional enrichment analyses showed that these genes were mainly associated with metabolism process. Furthermore, clustering analysis showed that 17 genes among these are positive, including 9 for space synthetic environment and 8 for space radiation environment only. These genes could be used as the biomarkers to reflect the space environment stresses. In addition, we also found that microgravity is the main stress factor to change the expression patterns of biomarkers for the short-duration spaceflight.

  15. Improving mass candidate detection in mammograms via feature maxima propagation and local feature selection.

    Science.gov (United States)

    Melendez, Jaime; Sánchez, Clara I; van Ginneken, Bram; Karssemeijer, Nico

    2014-08-01

    Mass candidate detection is a crucial component of multistep computer-aided detection (CAD) systems. It is usually performed by combining several local features by means of a classifier. When these features are processed on a per-image-location basis (e.g., for each pixel), mismatching problems may arise while constructing feature vectors for classification, which is especially true when the behavior expected from the evaluated features is a peaked response due to the presence of a mass. In this study, two of these problems, consisting of maxima misalignment and differences of maxima spread, are identified and two solutions are proposed. The first proposed method, feature maxima propagation, reproduces feature maxima through their neighboring locations. The second method, local feature selection, combines different subsets of features for different feature vectors associated with image locations. Both methods are applied independently and together. The proposed methods are included in a mammogram-based CAD system intended for mass detection in screening. Experiments are carried out with a database of 382 digital cases. Sensitivity is assessed at two sets of operating points. The first one is the interval of 3.5-15 false positives per image (FPs/image), which is typical for mass candidate detection. The second one is 1 FP/image, which allows to estimate the quality of the mass candidate detector's output for use in subsequent steps of the CAD system. The best results are obtained when the proposed methods are applied together. In that case, the mean sensitivity in the interval of 3.5-15 FPs/image significantly increases from 0.926 to 0.958 (p < 0.0002). At the lower rate of 1 FP/image, the mean sensitivity improves from 0.628 to 0.734 (p < 0.0002). Given the improved detection performance, the authors believe that the strategies proposed in this paper can render mass candidate detection approaches based on image location classification more robust to feature

  16. Automatic Correction Algorithm of Hyfrology Feature Attribute in National Geographic Census

    Science.gov (United States)

    Li, C.; Guo, P.; Liu, X.

    2017-09-01

    A subset of the attributes of hydrologic features data in national geographic census are not clear, the current solution to this problem was through manual filling which is inefficient and liable to mistakes. So this paper proposes an automatic correction algorithm of hydrologic features attribute. Based on the analysis of the structure characteristics and topological relation, we put forward three basic principles of correction which include network proximity, structure robustness and topology ductility. Based on the WJ-III map workstation, we realize the automatic correction of hydrologic features. Finally, practical data is used to validate the method. The results show that our method is highly reasonable and efficient.

  17. Comparisons of feature extraction algorithm based on unmanned aerial vehicle image

    Directory of Open Access Journals (Sweden)

    Xi Wenfei

    2017-07-01

    Full Text Available Feature point extraction technology has become a research hotspot in the photogrammetry and computer vision. The commonly used point feature extraction operators are SIFT operator, Forstner operator, Harris operator and Moravec operator, etc. With the high spatial resolution characteristics, UAV image is different from the traditional aviation image. Based on these characteristics of the unmanned aerial vehicle (UAV, this paper uses several operators referred above to extract feature points from the building images, grassland images, shrubbery images, and vegetable greenhouses images. Through the practical case analysis, the performance, advantages, disadvantages and adaptability of each algorithm are compared and analyzed by considering their speed and accuracy. Finally, the suggestions of how to adapt different algorithms in diverse environment are proposed.

  18. The algorithm of fast image stitching based on multi-feature extraction

    Science.gov (United States)

    Yang, Chunde; Wu, Ge; Shi, Jing

    2018-05-01

    This paper proposed an improved image registration method combining Hu-based invariant moment contour information and feature points detection, aiming to solve the problems in traditional image stitching algorithm, such as time-consuming feature points extraction process, redundant invalid information overload and inefficiency. First, use the neighborhood of pixels to extract the contour information, employing the Hu invariant moment as similarity measure to extract SIFT feature points in those similar regions. Then replace the Euclidean distance with Hellinger kernel function to improve the initial matching efficiency and get less mismatching points, further, estimate affine transformation matrix between the images. Finally, local color mapping method is adopted to solve uneven exposure, using the improved multiresolution fusion algorithm to fuse the mosaic images and realize seamless stitching. Experimental results confirm high accuracy and efficiency of method proposed in this paper.

  19. License Application Design Selection Feature Report: Aging and Blending

    International Nuclear Information System (INIS)

    Coltoni, B.; Anderson, M.J.

    1999-01-01

    The purpose of this document is to evaluate the concepts of Aging and Blending for waste sent to the Monitored Geologic Repository (MGR). These design features are based on pre-emplacement treatment of the waste stream. The envelope of the analysis has been performed under the direction of the License Application Design Selection Team (LADST), which advocated utilizing the Viability Assessment (VA) repository design (DOE 1998c) as the basis. Therefore, this evaluation attempts to modify the VA design only to the extent that Aging and Blending can be accomplished. This modified VA design will be contrasted to the VA Design and the difference in design, costs, and performance will be presented

  20. Notes on the evolution of feature selection methodology

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Novovičová, Jana; Pudil, Pavel

    2007-01-01

    Roč. 43, č. 5 (2007), s. 713-730 ISSN 0023-5954 R&D Projects: GA ČR GA102/07/1594; GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * branch and bound * sequential search * mixture model Subject RIV: IN - Informatics, Computer Science Impact factor: 0.552, year: 2007

  1. Conditional Mutual Information Based Feature Selection for Classification Task

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Somol, Petr; Haindl, Michal; Pudil, Pavel

    2007-01-01

    Roč. 45, č. 4756 (2007), s. 417-426 ISSN 0302-9743 R&D Projects: GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Pattern classification * feature selection * conditional mutual information * text categorization Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005

  2. Structural features of subtype-selective EP receptor modulators.

    Science.gov (United States)

    Markovič, Tijana; Jakopin, Žiga; Dolenc, Marija Sollner; Mlinarič-Raščan, Irena

    2017-01-01

    Prostaglandin E2 is a potent endogenous molecule that binds to four different G-protein-coupled receptors: EP1-4. Each of these receptors is a valuable drug target, with distinct tissue localisation and signalling pathways. We review the structural features of EP modulators required for subtype-selective activity, as well as the structural requirements for improved pharmacokinetic parameters. Novel EP receptor subtype selective agonists and antagonists appear to be valuable drug candidates in the therapy of many pathophysiological states, including ulcerative colitis, glaucoma, bone healing, B cell lymphoma, neurological diseases, among others, which have been studied in vitro, in vivo and in early phase clinical trials. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  3. A feature matching and fusion-based positive obstacle detection algorithm for field autonomous land vehicles

    Directory of Open Access Journals (Sweden)

    Tao Wu

    2017-03-01

    Full Text Available Positive obstacles will cause damage to field robotics during traveling in field. Field autonomous land vehicle is a typical field robotic. This article presents a feature matching and fusion-based algorithm to detect obstacles using LiDARs for field autonomous land vehicles. There are three main contributions: (1 A novel setup method of compact LiDAR is introduced. This method improved the LiDAR data density and reduced the blind region of the LiDAR sensor. (2 A mathematical model is deduced under this new setup method. The ideal scan line is generated by using the deduced mathematical model. (3 Based on the proposed mathematical model, a feature matching and fusion (FMAF-based algorithm is presented in this article, which is employed to detect obstacles. Experimental results show that the performance of the proposed algorithm is robust and stable, and the computing time is reduced by an order of two magnitudes by comparing with other exited algorithms. This algorithm has been perfectly applied to our autonomous land vehicle, which has won the champion in the challenge of Chinese “Overcome Danger 2014” ground unmanned vehicle.

  4. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor as well as its parallel channels (inner factor. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3~5 pattern classes considering the trade-off between time consumption and classification rate.

  5. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis.

    Science.gov (United States)

    Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang

    2017-05-01

    In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method by embedding the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (such as feature-feature relation, response-response relation, and sample-sample relation), for preserving three kinds of the similarity, such as for the features, the response variables, and the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ 2,1 -norm regularization term, and further propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and also a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performances of both clinical scores prediction and disease status identification, compared to the state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Feature Selection and Kernel Learning for Local Learning-Based Clustering.

    Science.gov (United States)

    Zeng, Hong; Cheung, Yiu-ming

    2011-08-01

    The performance of the most clustering algorithms highly relies on the representation of data in the input space or the Hilbert space of kernel methods. This paper is to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) (Wu and Schölkopf 2006) method, which can outperform the global learning-based ones when dealing with the high-dimensional data lying on manifold. Specifically, we associate a weight to each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively in the clustering process. We show that the resulting weighted regularization with an additional constraint on the weights is equivalent to a known sparse-promoting penalty. Hence, the weights of those irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on the benchmark data sets.

  7. Optimized hyperspectral band selection using hybrid genetic algorithm and gravitational search algorithm

    Science.gov (United States)

    Zhang, Aizhu; Sun, Genyun; Wang, Zhenjie

    2015-12-01

    The serious information redundancy in hyperspectral images (HIs) cannot contribute to the data analysis accuracy, instead it require expensive computational resources. Consequently, to identify the most useful and valuable information from the HIs, thereby improve the accuracy of data analysis, this paper proposed a novel hyperspectral band selection method using the hybrid genetic algorithm and gravitational search algorithm (GA-GSA). In the proposed method, the GA-GSA is mapped to the binary space at first. Then, the accuracy of the support vector machine (SVM) classifier and the number of selected spectral bands are utilized to measure the discriminative capability of the band subset. Finally, the band subset with the smallest number of spectral bands as well as covers the most useful and valuable information is obtained. To verify the effectiveness of the proposed method, studies conducted on an AVIRIS image against two recently proposed state-of-the-art GSA variants are presented. The experimental results revealed the superiority of the proposed method and indicated that the method can indeed considerably reduce data storage costs and efficiently identify the band subset with stable and high classification precision.

  8. Features of mechanical snubbers and the method of selection

    International Nuclear Information System (INIS)

    Sunakoda, Katsuaki

    1978-01-01

    In the oil snubbers used in the high radiation environment of nuclear power stations, gas generation from oil and the deterioration of rubber material for sealing occur due to radiation damage, therefore periodical inspection and replacement are required during operation. The mechanical snubbers developed as aseismatic supporters in place of oil snubbers have entered the stage of practical use, and are made by two companies in USA and a company in Japan. Their features as compared with oil snubbers are as follows. The cost and time required for the maintenance were made as small as possible because the increase of the service life of mechanical components can be expected. The temperature dependence of mechanical snubbers is small. The matters demanding attention in the maintenance are the secular change of lubricating oil and the effect of radiation, and the rust prevention of ball screw bearings. These problems are being studied by Power Reactor and Nuclear Fuel Development Corp. for the fast prototype reactor Monju. The structural feature is to convert the thrust movement of equipments and pipings due to thermal expansion and contraction or earthquakes into rotating motion, using ball screws. The features and the construction of SMS type mechanical snubbers, the test and inspection prior to their shipping, the method of selection, and the method of handling them in actual places are explained. (Kako, I.)

  9. Relevance feature selection of modal frequency-ambient condition pattern recognition in structural health assessment for reinforced concrete buildings

    Directory of Open Access Journals (Sweden)

    He-Qing Mu

    2016-08-01

    Full Text Available Modal frequency is an important indicator for structural health assessment. Previous studies have shown that this indicator is substantially affected by the fluctuation of ambient conditions, such as temperature and humidity. Therefore, recognizing the pattern between modal frequency and ambient conditions is necessary for reliable long-term structural health assessment. In this article, a novel machine-learning algorithm is proposed to automatically select relevance features in modal frequency-ambient condition pattern recognition based on structural dynamic response and ambient condition measurement. In contrast to the traditional feature selection approaches by examining a large number of combinations of extracted features, the proposed algorithm conducts continuous relevance feature selection by introducing a sophisticated hyperparameterization on the weight parameter vector controlling the relevancy of different features in the prediction model. The proposed algorithm is then utilized for structural health assessment for a reinforced concrete building based on 1-year daily measurements. It turns out that the optimal model class including the relevance features for each vibrational mode is capable to capture the pattern between the corresponding modal frequency and the ambient conditions.

  10. Determination of Selection Method in Genetic Algorithm for Land Suitability

    Directory of Open Access Journals (Sweden)

    Irfianti Asti Dwi

    2016-01-01

    Full Text Available Genetic Algoirthm is one alternative solution in the field of modeling optimization, automatic programming and machine learning. The purpose of the study was to compare some type of selection methods in Genetic Algorithm for land suitability. Contribution of this research applies the best method to develop region based horticultural commodities. This testing is done by comparing the three methods on the method of selection, the Roulette Wheel, Tournament Selection and Stochastic Universal Sampling. Parameters of the locations used in the test scenarios include Temperature = 27°C, Rainfall = 1200 mm, hummidity = 30%, Cluster fruit = 4, Crossover Probabiitiy (Pc = 0.6, Mutation Probabilty (Pm = 0.2 and Epoch = 10. The second test epoch incluides location parameters consist of Temperature = 30°C, Rainfall = 2000 mm, Humidity = 35%, Cluster fruit = 5, Crossover Probability (Pc = 0.7, Mutation Probability (Pm = 0.3 and Epoch 10. The conclusion of this study shows that the Roulette Wheel is the best method because it produces more stable and fitness value than the other two methods.

  11. Dynamics of the evolution of learning algorithms by selection

    International Nuclear Information System (INIS)

    Neirotti, Juan Pablo; Caticha, Nestor

    2003-01-01

    We study the evolution of artificial learning systems by means of selection. Genetic programming is used to generate populations of programs that implement algorithms used by neural network classifiers to learn a rule in a supervised learning scenario. In contrast to concentrating on final results, which would be the natural aim while designing good learning algorithms, we study the evolution process. Phenotypic and genotypic entropies, which describe the distribution of fitness and of symbols, respectively, are used to monitor the dynamics. We identify significant functional structures responsible for the improvements in the learning process. In particular, some combinations of variables and operators are useful in assessing performance in rule extraction and can thus implement annealing of the learning schedule. We also find combinations that can signal surprise, measured on a single example, by the difference between predicted and correct classification. When such favorable structures appear, they are disseminated on very short time scales throughout the population. Due to such abruptness they can be thought of as dynamical transitions. But foremost, we find a strict temporal order of such discoveries. Structures that measure performance are never useful before those for measuring surprise. Invasions of the population by such structures in the reverse order were never observed. Asymptotically, the generalization ability approaches Bayesian results

  12. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, Tyrone

    2015-09-01

    Full Text Available data, which is rarely available in operational networks. It uses normalized cluster validity indices as an objective function that is optimized over the search space of candidate feature subsets via a genetic algorithm. Feature sets produced...

  13. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA wrapped around a support vector machine (SVM classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  14. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    Science.gov (United States)

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In the existing electroencephalogram (EEG) signals peak classification research, the existing models, such as Dumpala, Acir, Liu, and Dingle peak models, employ different set of features. However, all these models may not be able to offer good performance for various applications and it is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, namely as angle modulated simulated Kalman filter (AMSKF) will be employed as feature selector. Also, the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 samples of peak candidate are employed in this study for the validation purpose. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects; (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results have shown that the proposed AMSKF feature selector is able to find the best combination of features and performs at par with the existing related studies of epileptic EEG events classification.

  15. Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization.

    Science.gov (United States)

    Pashaei, Elnaz; Pashaei, Elham; Aydin, Nizamettin

    2018-04-14

    In cancer classification, gene selection is an important data preprocessing technique, but it is a difficult task due to the large search space. Accordingly, the objective of this study is to develop a hybrid meta-heuristic Binary Black Hole Algorithm (BBHA) and Binary Particle Swarm Optimization (BPSO) (4-2) model that emphasizes gene selection. In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make the BPSO (4-2) more effective and to facilitate the exploration and exploitation of the BPSO (4-2) algorithm to further improve the performance. This model has been associated with Random Forest Recursive Feature Elimination (RF-RFE) pre-filtering technique. The classifiers which are evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA); k-nearest neighbor and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarrays. The experimental results and statistical analysis confirm the better performance of the BPSO (4-2)-BBHA compared with the BBHA, the BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes from the clinical datasets. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    The feature extraction and feature selection are the important issues in pattern recognition. Based on the geometric algebra representation of vector, a new feature extraction method using blade coefficient of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to solve the elevated high dimension issue. The simple linear discriminant analysis was used as the classifier. The result of the 10-fold cross-validation (10 CV) classification of public breast cancer biomedical dataset was more than 96% and proved superior to that of the original features and traditional feature extraction method.

  17. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In automobile, brake system is an essential part responsible for control of the vehicle. Any failure in the brake system impacts the vehicle's motion. It will generate frequent catastrophic effects on the vehicle cum passenger's safety. Thus the brake system plays a vital role in an automobile and hence condition monitoring of the brake system is essential. Vibration based condition monitoring using machine learning techniques are gaining momentum. This study is one such attempt to perform the condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA for brake fault diagnosis has been reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of a brake system, the vibration signals were acquired using a piezoelectric transducer. The statistical parameters were extracted from the vibration signal. The best feature set was identified for classification using attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of such artificial intelligence technique has been compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96% for the fault diagnosis of a hydraulic brake system.

  18. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    Science.gov (United States)

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  19. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    Directory of Open Access Journals (Sweden)

    Xu Yu

    2018-01-01

    Full Text Available Cross-domain collaborative filtering (CDCF solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR. We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  20. An Extended HITS Algorithm on Bipartite Network for Features Extraction of Online Customer Reviews

    Directory of Open Access Journals (Sweden)

    Chen Liu

    2018-05-01

    Full Text Available How to acquire useful information intelligently in the age of information explosion has become an important issue. In this context, sentiment analysis emerges with the growth of the need of information extraction. One of the most important tasks of sentiment analysis is feature extraction of entities in consumer reviews. This paper first constitutes a directed bipartite feature-sentiment relation network with a set of candidate features-sentiment pairs that is extracted by dependency syntax analysis from consumer reviews. Then, a novel method called MHITS which combines PMI with weighted HITS algorithm is proposed to rank these candidate product features to find out real product features. Empirical experiments indicate the effectiveness of our approach across different kinds and various data sizes of product. In addition, the effect of the proposed algorithm is not the same for the corpus with different proportions of the word pair that includes the “bad”, “good”, “poor”, “pretty good”, “not bad” these general collocation words.

  1. Road network selection for small-scale maps using an improved centrality-based algorithm

    Directory of Open Access Journals (Sweden)

    Roy Weiss

    2014-12-01

    Full Text Available The road network is one of the key feature classes in topographic maps and databases. In the task of deriving road networks for products at smaller scales, road network selection forms a prerequisite for all other generalization operators, and is thus a fundamental operation in the overall process of topographic map and database production. The objective of this work was to develop an algorithm for automated road network selection from a large-scale (1:10,000 to a small-scale database (1:200,000. The project was pursued in collaboration with swisstopo, the national mapping agency of Switzerland, with generic mapping requirements in mind. Preliminary experiments suggested that a selection algorithm based on betweenness centrality performed best for this purpose, yet also exposed problems. The main contribution of this paper thus consists of four extensions that address deficiencies of the basic centrality-based algorithm and lead to a significant improvement of the results. The first two extensions improve the formation of strokes concatenating the road segments, which is crucial since strokes provide the foundation upon which the network centrality measure is computed. Thus, the first extension ensures that roundabouts are detected and collapsed, thus avoiding interruptions of strokes by roundabouts, while the second introduces additional semantics in the process of stroke formation, allowing longer and more plausible strokes to built. The third extension detects areas of high road density (i.e., urban areas using density-based clustering and then locally increases the threshold of the centrality measure used to select road segments, such that more thinning takes place in those areas. Finally, since the basic algorithm tends to create dead-ends—which however are not tolerated in small-scale maps—the fourth extension reconnects these dead-ends to the main network, searching for the best path in the main heading of the dead-end.

  2. Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data.

    Science.gov (United States)

    Shah, M; Marchand, M; Corbeil, J

    2012-01-01

    One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of the well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also having tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.

  3. Multi-source feature extraction and target recognition in wireless sensor networks based on adaptive distributed wavelet compression algorithms

    Science.gov (United States)

    Hortos, William S.

    2008-04-01

    participating nodes. Therefore, the feature-extraction method based on the Haar DWT is presented that employs a maximum-entropy measure to determine significant wavelet coefficients. Features are formed by calculating the energy of coefficients grouped around the competing clusters. A DWT-based feature extraction algorithm used for vehicle classification in WSNs can be enhanced by an added rule for selecting the optimal number of resolution levels to improve the correct classification rate and reduce energy consumption expended in local algorithm computations. Published field trial data for vehicular ground targets, measured with multiple sensor types, are used to evaluate the wavelet-assisted algorithms. Extracted features are used in established target recognition routines, e.g., the Bayesian minimum-error-rate classifier, to compare the effects on the classification performance of the wavelet compression. Simulations of feature sets and recognition routines at different resolution levels in target scenarios indicate the impact on classification rates, while formulas are provided to estimate reduction in resource use due to distributed compression.

  4. Effects of changing canopy directional reflectance on feature selection

    Science.gov (United States)

    Smith, J. A.; Oliver, R. E.; Kilpela, O. E.

    1973-01-01

    The use of a Monte Carlo model for generating sample directional reflectance data for two simplified target canopies at two different solar positions is reported. Successive iterations through the model permit the calculation of a mean vector and covariance matrix for canopy reflectance for varied sensor view angles. These data may then be used to calculate the divergence between the target distributions for various wavelength combinations and for these view angles. Results of a feature selection analysis indicate that different sets of wavelengths are optimum for target discrimination depending on sensor view angle and that the targets may be more easily discriminated for some scan angles than others. The time-varying behavior of these results is also pointed out.

  5. Comparisons and Selections of Features and Classifiers for Short Text Classification

    Science.gov (United States)

    Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

    2017-10-01

    Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.

  6. An Improved User Selection Algorithm in Multiuser MIMO Broadcast with Channel Prediction

    Science.gov (United States)

    Min, Zhi; Ohtsuki, Tomoaki

    In multiuser MIMO-BC (Multiple-Input Multiple-Output Broadcasting) systems, user selection is important to achieve multiuser diversity. The optimal user selection algorithm is to try all the combinations of users to find the user group that can achieve the multiuser diversity. Unfortunately, the high calculation cost of the optimal algorithm prevents its implementation. Thus, instead of the optimal algorithm, some suboptimal user selection algorithms were proposed based on semiorthogonality of user channel vectors. The purpose of this paper is to achieve multiuser diversity with a small amount of calculation. For this purpose, we propose a user selection algorithm that can improve the orthogonality of a selected user group. We also apply a channel prediction technique to a MIMO-BC system to get more accurate channel information at the transmitter. Simulation results show that the channel prediction can improve the accuracy of channel information for user selections, and the proposed user selection algorithm achieves higher sum rate capacity than the SUS (Semiorthogonal User Selection) algorithm. Also we discuss the setting of the algorithm threshold. As the result of a discussion on the calculation complexity, which uses the number of complex multiplications as the parameter, the proposed algorithm is shown to have a calculation complexity almost equal to that of the SUS algorithm, and they are much lower than that of the optimal user selection algorithm.

  7. Multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement

    Science.gov (United States)

    Yan, Dan; Bai, Lianfa; Zhang, Yi; Han, Jing

    2018-02-01

    For the problems of missing details and performance of the colorization based on sparse representation, we propose a conceptual model framework for colorizing gray-scale images, and then a multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement (CEMDC) is proposed based on this framework. The algorithm can achieve a natural colorized effect for a gray-scale image, and it is consistent with the human vision. First, the algorithm establishes a multi-sparse dictionary classification colorization model. Then, to improve the accuracy rate of the classification, the corresponding local constraint algorithm is proposed. Finally, we propose a detail enhancement based on Laplacian Pyramid, which is effective in solving the problem of missing details and improving the speed of image colorization. In addition, the algorithm not only realizes the colorization of the visual gray-scale image, but also can be applied to the other areas, such as color transfer between color images, colorizing gray fusion images, and infrared images.

  8. Parallel algorithm for determining motion vectors in ice floe images by matching edge features

    Science.gov (United States)

    Manohar, M.; Ramapriyan, H. K.; Strong, J. P.

    1988-01-01

    A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on-board the SEASAT spacecraft. Researchers describe a parallel algorithm which is implemented on the MPP for locating corresponding objects based on their translationally and rotationally invariant features. The algorithm first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed such that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from .5 to 0.7 seconds for 128 x 128 images.

  9. Selected event reconstruction algorithms for the CBM experiment at FAIR

    International Nuclear Information System (INIS)

    Lebedev, Semen; Höhne, Claudia; Lebedev, Andrey; Ososkov, Gennady

    2014-01-01

    Development of fast and efficient event reconstruction algorithms is an important and challenging task in the Compressed Baryonic Matter (CBM) experiment at the future FAIR facility. The event reconstruction algorithms have to process terabytes of input data produced in particle collisions. In this contribution, several event reconstruction algorithms are presented. Optimization of the algorithms in the following CBM detectors are discussed: Ring Imaging Cherenkov (RICH) detector, Transition Radiation Detectors (TRD) and Muon Chamber (MUCH). The ring reconstruction algorithm in the RICH is discussed. In TRD and MUCH track reconstruction algorithms are based on track following and Kalman Filter methods. All algorithms were significantly optimized to achieve maximum speed up and minimum memory consumption. Obtained results showed that a significant speed up factor for all algorithms was achieved and the reconstruction efficiency stays at high level.

  10. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2014-01-01

    Full Text Available Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH, to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  11. Ultra-fast fluence optimization for beam angle selection algorithms

    Science.gov (United States)

    Bangert, M.; Ziegenhein, P.; Oelfke, U.

    2014-03-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ~ 4 s. BAS runs including FOs for 1000 different beam ensembles take ~ 15-70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.

  12. Ultra-fast fluence optimization for beam angle selection algorithms

    International Nuclear Information System (INIS)

    Bangert, M; Ziegenhein, P; Oelfke, U

    2014-01-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ∼ 4 s. BAS runs including FOs for 1000 different beam ensembles take ∼ 15–70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.

  13. Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

    Directory of Open Access Journals (Sweden)

    Yeom, Ha-Neul

    2014-09-01

    Full Text Available In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on optimistic machine-learning and feature set selection to classify collected tweets. We build the classifier model using Naive Bayes & Naive Bayes Multinomial, Support Vector Machine, and Decision Tree Algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, so that further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.

  14. A Spectrum Sensing Method Based on Signal Feature and Clustering Algorithm in Cognitive Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yongwei Zhang

    2017-01-01

    Full Text Available In order to solve the problem of difficulty in determining the threshold in spectrum sensing technologies based on the random matrix theory, a spectrum sensing method based on clustering algorithm and signal feature is proposed for Cognitive Wireless Multimedia Sensor Networks. Firstly, the wireless communication signal features are obtained according to the sampling signal covariance matrix. Then, the clustering algorithm is used to classify and test the signal features. Different signal features and clustering algorithms are compared in this paper. The experimental results show that the proposed method has better sensing performance.

  15. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    International Nuclear Information System (INIS)

    Hinders, Mark K.; Miller, Corey A.

    2014-01-01

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy

  16. Application of tripolar concentric electrodes and prefeature selection algorithm for brain-computer interface.

    Science.gov (United States)

    Besio, Walter G; Cao, Hongbao; Zhou, Peng

    2008-04-01

    For persons with severe disabilities, a brain-computer interface (BCI) may be a viable means of communication. Lapalacian electroencephalogram (EEG) has been shown to improve classification in EEG recognition. In this work, the effectiveness of signals from tripolar concentric electrodes and disc electrodes were compared for use as a BCI. Two sets of left/right hand motor imagery EEG signals were acquired. An autoregressive (AR) model was developed for feature extraction with a Mahalanobis distance based linear classifier for classification. An exhaust selection algorithm was employed to analyze three factors before feature extraction. The factors analyzed were 1) length of data in each trial to be used, 2) start position of data, and 3) the order of the AR model. The results showed that tripolar concentric electrodes generated significantly higher classification accuracy than disc electrodes.

  17. Synchronous Firefly Algorithm for Cluster Head Selection in WSN

    Directory of Open Access Journals (Sweden)

    Madhusudhanan Baskaran

    2015-01-01

    Full Text Available Wireless Sensor Network (WSN consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC.

  18. Route Selection with Unspecified Sites Using Knowledge Based Genetic Algorithm

    Science.gov (United States)

    Kanoh, Hitoshi; Nakamura, Nobuaki; Nakamura, Tomohiro

    This paper addresses the problem of selecting a route to a given destination that traverses several non-specific sites (e.g. a bank, a gas station) as requested by a driver. The proposed solution uses a genetic algorithm that includes viral infection. The method is to generate two populations of viruses as domain specific knowledge in addition to a population of routes. A part of an arterial road is regarded as a main virus, and a road that includes a site is regarded as a site virus. An infection occurs between two points common to a candidate route and the virus, and involves the substitution of the intersections carried by the virus for those on the existing candidate route. Crossover and infection determine the easiest-to-drive and quasi-shortest route through the objective landmarks. Experiments using actual road maps show that this infection-based mechanism is an effective way of solving the problem. Our strategy is general, and can be effectively used in other optimization problems.

  19. Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining

    Directory of Open Access Journals (Sweden)

    P. Kalaivani

    2015-01-01

    Full Text Available With the rapid growth of websites and web form the number of product reviews is available on the sites. An opinion mining system is needed to help the people to evaluate emotions, opinions, attitude, and behavior of others, which is used to make decisions based on the user preference. In this paper, we proposed an optimized feature reduction that incorporates an ensemble method of machine learning approaches that uses information gain and genetic algorithm as feature reduction techniques. We conducted comparative study experiments on multidomain review dataset and movie review dataset in opinion mining. The effectiveness of single classifiers Naïve Bayes, logistic regression, support vector machine, and ensemble technique for opinion mining are compared on five datasets. The proposed hybrid method is evaluated and experimental results using information gain and genetic algorithm with ensemble technique perform better in terms of various measures for multidomain review and movie reviews. Classification algorithms are evaluated using McNemar’s test to compare the level of significance of the classifiers.

  20. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    Science.gov (United States)

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had averages of the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve as 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM derived predictive model can provide suggestive high-risk recurrent patients, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  1. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization – Extreme learning machine approach

    International Nuclear Information System (INIS)

    Salcedo-Sanz, S.; Pastor-Sánchez, A.; Prieto, L.; Blanco-Aguilera, A.; García-Herrera, R.

    2014-01-01

    Highlights: • A novel approach for short-term wind speed prediction is presented. • The system is formed by a coral reefs optimization algorithm and an extreme learning machine. • Feature selection is carried out with the CRO to improve the ELM performance. • The method is tested in real wind farm data in USA, for the period 2007–2008. - Abstract: This paper presents a novel approach for short-term wind speed prediction based on a Coral Reefs Optimization algorithm (CRO) and an Extreme Learning Machine (ELM), using meteorological predictive variables from a physical model (the Weather Research and Forecast model, WRF). The approach is based on a Feature Selection Problem (FSP) carried out with the CRO, that must obtain a reduced number of predictive variables out of the total available from the WRF. This set of features will be the input of an ELM, that finally provides the wind speed prediction. The CRO is a novel bio-inspired approach, based on the simulation of reef formation and coral reproduction, able to obtain excellent results in optimization problems. On the other hand, the ELM is a new paradigm in neural networks’ training, that provides a robust and extremely fast training of the network. Together, these algorithms are able to successfully solve this problem of feature selection in short-term wind speed prediction. Experiments in a real wind farm in the USA show the excellent performance of the CRO–ELM approach in this FSP wind speed prediction problem

  2. The effect of destination linked feature selection in real-time network intrusion detection

    CSIR Research Space (South Africa)

    Mzila, P

    2013-07-01

    Full Text Available techniques in the network intrusion detection system (NIDS) is the feature selection technique. The ability of NIDS to accurately identify intrusion from the network traffic relies heavily on feature selection, which describes the pattern of the network...

  3. Solving multi-objective job shop problem using nature-based algorithms: new Pareto approximation features

    Directory of Open Access Journals (Sweden)

    Jarosław Rudy

    2015-01-01

    Full Text Available In this paper the job shop scheduling problem (JSP with minimizing two criteria simultaneously is considered. JSP is frequently used model in real world applications of combinatorial optimization. Multi-objective job shop problems (MOJSP were rarely studied. We implement and compare two multi-agent nature-based methods, namely ant colony optimization (ACO and genetic algorithm (GA for MOJSP. Both of those methods employ certain technique, taken from the multi-criteria decision analysis in order to establish ranking of solutions. ACO and GA differ in a method of keeping information about previously found solutions and their quality, which affects the course of the search. In result, new features of Pareto approximations provided by said algorithms are observed: aside from the slight superiority of the ACO method the Pareto frontier approximations provided by both methods are disjoint sets. Thus, both methods can be used to search mutually exclusive areas of the Pareto frontier.

  4. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications,like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systemstypically extract features from the input sound. Using too many features increases complexity

  5. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-01-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original

  6. Feature extraction and selection for objective gait analysis and fall risk assessment by accelerometry

    Directory of Open Access Journals (Sweden)

    Cremer Gerald

    2011-01-01

    Full Text Available Abstract Background Falls in the elderly is nowadays a major concern because of their consequences on elderly general health and moral states. Moreover, the aging of the population and the increasing life expectancy make the prediction of falls more and more important. The analysis presented in this article makes a first step in this direction providing a way to analyze gait and classify hospitalized elderly fallers and non-faller. This tool, based on an accelerometer network and signal processing, gives objective informations about the gait and does not need any special gait laboratory as optical analysis do. The tool is also simple to use by a non expert and can therefore be widely used on a large set of patients. Method A population of 20 hospitalized elderlies was asked to execute several classical clinical tests evaluating their risk of falling. They were also asked if they experienced any fall in the last 12 months. The accelerations of the limbs were recorded during the clinical tests with an accelerometer network distributed on the body. A total of 67 features were extracted from the accelerometric signal recorded during a simple 25 m walking test at comfort speed. A feature selection algorithm was used to select those able to classify subjects at risk and not at risk for several classification algorithms types. Results The results showed that several classification algorithms were able to discriminate people from the two groups of interest: fallers and non-fallers hospitalized elderlies. The classification performances of the used algorithms were compared. Moreover a subset of the 67 features was considered to be significantly different between the two groups using a t-test. Conclusions This study gives a method to classify a population of hospitalized elderlies in two groups: at risk of falling or not at risk based on accelerometric data. This is a first step to design a risk of falling assessment system that could be used to provide

  7. Multivariate anomaly detection for Earth observations: a comparison of algorithms and feature extraction techniques

    Directory of Open Access Journals (Sweden)

    M. Flach

    2017-08-01

    Full Text Available Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring effects of extreme climatic events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations like sudden changes in basic characteristics of time series such as the sample mean, the variance, changes in the cycle amplitude, and trends. This artificial experiment is needed as there is no gold standard for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g., subtracting seasonal cycles, or dimensionality reduction is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbors mean distance, kernel density estimation, a recurrence approach and their combinations (ensembles that outperform other multivariate approaches as well as univariate extreme-event detection methods. Our results therefore provide an effective workflow to

  8. Novel Mahalanobis-based feature selection improves one-class classification of early hepatocellular carcinoma.

    Science.gov (United States)

    Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa

    2018-05-01

    Detection of early hepatocellular carcinoma (HCC) is responsible for increasing survival rates in up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand the specific knowledge pertaining to the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which is most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.

  9. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    Science.gov (United States)

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  10. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    Science.gov (United States)

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  11. iFER: facial expression recognition using automatically selected geometric eye and eyebrow features

    Science.gov (United States)

    Oztel, Ismail; Yolcu, Gozde; Oz, Cemil; Kazan, Serap; Bunyak, Filiz

    2018-03-01

    Facial expressions have an important role in interpersonal communications and estimation of emotional states or intentions. Automatic recognition of facial expressions has led to many practical applications and became one of the important topics in computer vision. We present a facial expression recognition system that relies on geometry-based features extracted from eye and eyebrow regions of the face. The proposed system detects keypoints on frontal face images and forms a feature set using geometric relationships among groups of detected keypoints. Obtained feature set is refined and reduced using the sequential forward selection (SFS) algorithm and fed to a support vector machine classifier to recognize five facial expression classes. The proposed system, iFER (eye-eyebrow only facial expression recognition), is robust to lower face occlusions that may be caused by beards, mustaches, scarves, etc. and lower face motion during speech production. Preliminary experiments on benchmark datasets produced promising results outperforming previous facial expression recognition studies using partial face features, and comparable results to studies using whole face information, only slightly lower by ˜ 2.5 % compared to the best whole face facial recognition system while using only ˜ 1 / 3 of the facial region.

  12. Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction

    Directory of Open Access Journals (Sweden)

    Nigsch Florian

    2008-10-01

    Full Text Available Abstract Background We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC, that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029. We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590 of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Results Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21 and an RMSE of 45.1°C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R2 of 0.47 for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R2 of 0.55. However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. Conclusion With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.

  13. Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

    Science.gov (United States)

    O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John Bo

    2008-10-29

    We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.

  14. ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

    Directory of Open Access Journals (Sweden)

    SAID BAHASSINE

    2017-06-01

    Full Text Available Text classification is the process of assignment of unclassified text to appropriate classes based on their content. The most prevalent representation for text classification is the bag of words vector. In this representation, the words that appear in documents often have multiple morphological structures, grammatical forms. In most cases, this morphological variant of words belongs to the same category. In the first part of this paper, anew stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms namely Khoja’s stemmer and our new stemmer (referred to hereafter by origin-stemmer on Arabic text classification. This investigation was carried out using chi-square as a feature of selection to reduce the dimensionality of the feature space and decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, Middle East, switch and world on WEKA toolkit. The recall, f-measure and precision measures are used to compare the performance of the obtained models. The experimental results show that text classification using rout stemmer outperforms classification using Khoja’s stemmer. The f-measure was 92.9% in sport category and 89.1% in business category.

  15. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Directory of Open Access Journals (Sweden)

    Adenike A. Adewuyi

    2016-10-01

    Full Text Available Pattern recognition-based myoelectric control of upper limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1 the performance of non-linear and linear pattern recognition algorithms and (2 the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p<0.001 but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p=0.07, intrinsic (p=0.06, or combined extrinsic and intrinsic muscle EMG (p=0.08, respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p<0.001 and time domain/autoregressive feature sets (p<0.01. This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.

  16. A New Curve Tracing Algorithm Based on Local Feature in the Vectorization of Paper Seismograms

    Directory of Open Access Journals (Sweden)

    Maofa Wang

    2014-02-01

    Full Text Available History paper seismograms are very important information for earthquake monitoring and prediction. The vectorization of paper seismograms is an import problem to be resolved. Auto tracing of waveform curves is a key technology for the vectorization of paper seismograms. It can transform an original scanning image into digital waveform data. Accurately tracing out all the key points of each curve in seismograms is the foundation for vectorization of paper seismograms. In the paper, we present a new curve tracing algorithm based on local feature, applying to auto extraction of earthquake waveform in paper seismograms.

  17. Specific features of NDT data and processing algorithms: new remedies to old ills

    International Nuclear Information System (INIS)

    Georgel, B.

    1994-01-01

    Non destructive testing data from in-service inspections have specific features that require the most sophisticated techniques of signal and image processing. Each step in the overall information extraction process must be optimized by using recent approaches such like data decomposition and modelization, compression, sensor fusion and knowledge based systems. This can be achieved by means of wavelet transform, inverse problems formulation, standard compression algorithms, combined detection and estimation, neural networks and expert systems. These techniques are briefly presented through a number of Electricite de France applications or through recent literature results. (author). 1 fig., 20 refs

  18. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Directory of Open Access Journals (Sweden)

    Nicole A Capela

    Full Text Available Human activity recognition (HAR, using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter. The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree. Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  19. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Science.gov (United States)

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  20. Behavioral features recognition and oestrus detection based on fast approximate clustering algorithm in dairy cows

    Science.gov (United States)

    Tian, Fuyang; Cao, Dong; Dong, Xiaoning; Zhao, Xinqiang; Li, Fade; Wang, Zhonghua

    2017-06-01

    Behavioral features recognition was an important effect to detect oestrus and sickness in dairy herds and there is a need for heat detection aid. The detection method was based on the measure of the individual behavioural activity, standing time, and temperature of dairy using vibrational sensor and temperature sensor in this paper. The data of behavioural activity index, standing time, lying time and walking time were sent to computer by lower power consumption wireless communication system. The fast approximate K-means algorithm (FAKM) was proposed to deal the data of the sensor for behavioral features recognition. As a result of technical progress in monitoring cows using computers, automatic oestrus detection has become possible.

  1. License Application Design Selection Feature Report: Waste Package Self Shielding Design Feature 13

    International Nuclear Information System (INIS)

    Tang, J.S.

    2000-01-01

    In the Viability Assessment (VA) reference design, handling of waste packages (WPs) in the emplacement drifts is performed remotely, and human access to the drifts is precluded when WPs are present. This report will investigate the feasibility of using a self-shielded WP design to reduce the radiation levels in the emplacement drifts to a point that, when coupled with ventilation, will create an acceptable environment for human access. This provides the benefit of allowing human entry to emplacement drifts to perform maintenance on ground support and instrumentation, and carry out performance confirmation activities. More direct human control of WP handling and emplacement operations would also be possible. However, these potential benefits must be weighed against the cost of implementation, and potential impacts on pre- and post-closure performance of the repository and WPs. The first section of this report will provide background information on previous investigations of the self-shielded WP design feature, summarize the objective and scope of this document, and provide quality assurance and software information. A shielding performance and cost study that includes several candidate shield materials will then be performed in the subsequent section to allow selection of two self-shielded WP design options for further evaluation. Finally, the remaining sections will evaluate the impacts of the two WP self-shielding options on the repository design, operations, safety, cost, and long-term performance of the WPs with respect to the VA reference design

  2. Classifying spatially heterogeneous wetland communities using machine learning algorithms and spectral and textural features.

    Science.gov (United States)

    Szantoi, Zoltan; Escobedo, Francisco J; Abd-Elrahman, Amr; Pearlstine, Leonard; Dewitt, Bon; Smith, Scot

    2015-05-01

    Mapping of wetlands (marsh vs. swamp vs. upland) is a common remote sensing application.Yet, discriminating between similar freshwater communities such as graminoid/sedge fromremotely sensed imagery is more difficult. Most of this activity has been performed using medium to low resolution imagery. There are only a few studies using highspatial resolutionimagery and machine learning image classification algorithms for mapping heterogeneouswetland plantcommunities. This study addresses this void by analyzing whether machine learning classifierssuch as decisiontrees (DT) and artificial neural networks (ANN) can accurately classify graminoid/sedgecommunities usinghigh resolution aerial imagery and image texture data in the Everglades National Park, Florida.In addition tospectral bands, the normalized difference vegetation index, and first- and second-order texturefeatures derivedfrom the near-infrared band were analyzed. Classifier accuracies were assessed using confusiontablesand the calculated kappa coefficients of the resulting maps. The results indicated that an ANN(multilayerperceptron based on backpropagation) algorithm produced a statistically significantly higheraccuracy(82.04%) than the DT (QUEST) algorithm (80.48%) or the maximum likelihood (80.56%)classifier (αtexture features.

  3. A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification

    CERN Document Server

    Weber, I; Henzinger, M; Baykan, E

    2011-01-01

    Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs makes this a very challenging problem. Web page classification without a page's content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or when classification speed is of utmost importance. For our experiments we used five different corpora comprising a total of about 3 million (URL, classification) pairs. We evaluated several techniques for feature generation and classification algorithms. The individual binary classifiers were then combined via boosting into metabinary classifiers. We achieve typical F-measure values between 80 and 85, and a typical precision of around 86. The precision can be pushed further over 90 while maintaining a typical level of recall betw...

  4. GANN: Genetic algorithm neural networks for the detection of conserved combinations of features in DNA

    Directory of Open Access Journals (Sweden)

    Beiko Robert G

    2005-02-01

    Full Text Available Abstract Background The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. Results GANN (available at http://bioinformatics.org.au/gann is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. Conclusion GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.

  5. Selecting a general-purpose data compression algorithm

    Science.gov (United States)

    Mathews, Gary Jason

    1995-01-01

    The National Space Science Data Center's Common Data Formate (CDF) is capable of storing many types of data such as scalar data items, vectors, and multidimensional arrays of bytes, integers, or floating point values. However, regardless of the dimensionality and data type, the data break down into a sequence of bytes that can be fed into a data compression function to reduce the amount of data without losing data integrity and thus remaining fully reconstructible. Because of the diversity of data types and high performance speed requirements, a general-purpose, fast, simple data compression algorithm is required to incorporate data compression into CDF. The questions to ask are how to evaluate and compare compression algorithms, and what compression algorithm meets all requirements. The object of this paper is to address these questions and determine the most appropriate compression algorithm to use within the CDF data management package that would be applicable to other software packages with similar data compression needs.

  6. The Papers Printing Quality Complex Assessment Algorithm Development Taking into Account the Composition and Production Technological Features

    Science.gov (United States)

    Babakhanova, Kh A.; Varepo, L. G.; Nagornova, I. V.; Babluyk, E. B.; Kondratov, A. P.

    2018-04-01

    Paper is one of the printing system key components causing the high-quality printed products output. Providing the printing companies with the specified printing properties paper, while simultaneously increasing the paper products range and volume by means of the forecasting methods application and evaluation during the production process, is certainly a relevant problem. The paper presents the printing quality control algorithm taking into consideration the paper printing properties quality assessment depending on the manufacture technological features and composition variation. The information system including raw material and paper properties data and making possible pulp and paper enterprises to select paper composition optimal formulation is proposed taking into account the printing process procedure peculiarities of the paper manufacturing with specified printing properties.

  7. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    Science.gov (United States)

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operator Characteristic (ROC) curve is well-known in evaluating classification performance in biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can get the minimal balanced error rate with a small amount of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.

  8. A modified genetic algorithm with fuzzy roulette wheel selection for job-shop scheduling problems

    Science.gov (United States)

    Thammano, Arit; Teekeng, Wannaporn

    2015-05-01

    The job-shop scheduling problem is one of the most difficult production planning problems. Since it is in the NP-hard class, a recent trend in solving the job-shop scheduling problem is shifting towards the use of heuristic and metaheuristic algorithms. This paper proposes a novel metaheuristic algorithm, which is a modification of the genetic algorithm. This proposed algorithm introduces two new concepts to the standard genetic algorithm: (1) fuzzy roulette wheel selection and (2) the mutation operation with tabu list. The proposed algorithm has been evaluated and compared with several state-of-the-art algorithms in the literature. The experimental results on 53 JSSPs show that the proposed algorithm is very effective in solving the combinatorial optimization problems. It outperforms all state-of-the-art algorithms on all benchmark problems in terms of the ability to achieve the optimal solution and the computational time.

  9. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Sungho Kim

    2016-07-01

    Full Text Available Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR images or infrared (IR images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter and an asymmetric morphological closing filter (AMCF, post-filter into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic

  10. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection

    Science.gov (United States)

    Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun

    2016-01-01

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated

  11. Selecting Informative Features of the Helicopter and Aircraft Acoustic Signals in the Adaptive Autonomous Information Systems for Recognition

    Directory of Open Access Journals (Sweden)

    V. K. Hohlov

    2017-01-01

    Full Text Available The article forms the rationale for selecting the informative features of the helicopter and aircraft acoustic signals to solve a problem of their recognition and shows that the most informative ones are the counts of extrema in the energy spectra of the input signals, which represent non-centered random variables. An apparatus of the multiple initial regression coefficients was selected as a mathematical tool of research. The application of digital re-circulators with positive and negative feedbacks, which have the comb-like frequency characteristics, solves the problem of selecting informative features. A distinguishing feature of such an approach is easy agility of the comb frequency characteristics just through the agility of a delay value of digital signal in the feedback circuit. Adding an adaptation block to the selection block of the informative features enables us to ensure the invariance of used informative feature and counts of local extrema of the spectral power density to the airspeed of a helicopter. The paper gives reasons for the principle of adaptation and the structure of the adaptation block. To form the discriminator characteristics are used the cross-correlation statistical characteristics of the quadrature components of acoustic signal realizations, obtained by Hilbert transform. The paper proposes to provide signal recognition using a regression algorithm that allows handling the non-centered informative features and using a priori information about coefficients of initial regression of signal and noise.The simulation in Matlab Simulink has shown that selected informative features of signals in regressive processing of signal realizations of 0.5 s duration have good separability, and based on a set of 100 acoustic signal realizations in each class in full-scale conditions, has proved ensuring a correct recognition probability of 0.975, at least. The considered principles of informative features selection and adaptation can

  12. A comprehensive analysis of earthquake damage patterns using high dimensional model representation feature selection

    Science.gov (United States)

    Taşkin Kaya, Gülşen

    2013-10-01

    Recently, earthquake damage assessment using satellite images has been a very popular ongoing research direction. Especially with the availability of very high resolution (VHR) satellite images, a quite detailed damage map based on building scale has been produced, and various studies have also been conducted in the literature. As the spatial resolution of satellite images increases, distinguishability of damage patterns becomes more cruel especially in case of using only the spectral information during classification. In order to overcome this difficulty, textural information needs to be involved to the classification to improve the visual quality and reliability of damage map. There are many kinds of textural information which can be derived from VHR satellite images depending on the algorithm used. However, extraction of textural information and evaluation of them have been generally a time consuming process especially for the large areas affected from the earthquake due to the size of VHR image. Therefore, in order to provide a quick damage map, the most useful features describing damage patterns needs to be known in advance as well as the redundant features. In this study, a very high resolution satellite image after Iran, Bam earthquake was used to identify the earthquake damage. Not only the spectral information, textural information was also used during the classification. For textural information, second order Haralick features were extracted from the panchromatic image for the area of interest using gray level co-occurrence matrix with different size of windows and directions. In addition to using spatial features in classification, the most useful features representing the damage characteristic were selected with a novel feature selection method based on high dimensional model representation (HDMR) giving sensitivity of each feature during classification. The method called HDMR was recently proposed as an efficient tool to capture the input

  13. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated. We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  14. On the Feature Selection and Classification Based on Information Gain for Document Sentiment Analysis

    Directory of Open Access Journals (Sweden)

    Asriyanti Indah Pratiwi

    2018-01-01

    Full Text Available Sentiment analysis in a movie review is the needs of today lifestyle. Unfortunately, enormous features make the sentiment of analysis slow and less sensitive. Finding the optimum feature selection and classification is still a challenge. In order to handle an enormous number of features and provide better sentiment classification, an information-based feature selection and classification are proposed. The proposed method reduces more than 90% unnecessary features while the proposed classification scheme achieves 96% accuracy of sentiment classification. From the experimental results, it can be concluded that the combination of proposed feature selection and classification achieves the best performance so far.

  15. Latent Dirichlet Allocation (LDA) Model and kNN Algorithm to Classify Research Project Selection

    Science.gov (United States)

    Safi’ie, M. A.; Utami, E.; Fatta, H. A.

    2018-03-01

    Universitas Sebelas Maret has a teaching staff more than 1500 people, and one of its tasks is to carry out research. In the other side, the funding support for research and service is limited, so there is need to be evaluated to determine the Research proposal submission and devotion on society (P2M). At the selection stage, research proposal documents are collected as unstructured data and the data stored is very large. To extract information contained in the documents therein required text mining technology. This technology applied to gain knowledge to the documents by automating the information extraction. In this articles we use Latent Dirichlet Allocation (LDA) to the documents as a model in feature extraction process, to get terms that represent its documents. Hereafter we use k-Nearest Neighbour (kNN) algorithm to classify the documents based on its terms.

  16. An Improved SPEA2 Algorithm with Adaptive Selection of Evolutionary Operators Scheme for Multiobjective Optimization Problems

    Directory of Open Access Journals (Sweden)

    Fuqing Zhao

    2016-01-01

    Full Text Available A fixed evolutionary mechanism is usually adopted in the multiobjective evolutionary algorithms and their operators are static during the evolutionary process, which causes the algorithm not to fully exploit the search space and is easy to trap in local optima. In this paper, a SPEA2 algorithm which is based on adaptive selection evolution operators (AOSPEA is proposed. The proposed algorithm can adaptively select simulated binary crossover, polynomial mutation, and differential evolution operator during the evolutionary process according to their contribution to the external archive. Meanwhile, the convergence performance of the proposed algorithm is analyzed with Markov chain. Simulation results on the standard benchmark functions reveal that the performance of the proposed algorithm outperforms the other classical multiobjective evolutionary algorithms.

  17. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    International Nuclear Information System (INIS)

    Elsheikh, Ahmed H.; Wheeler, Mary F.; Hoteit, Ibrahim

    2014-01-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems

  18. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    Energy Technology Data Exchange (ETDEWEB)

    Elsheikh, Ahmed H., E-mail: aelsheikh@ices.utexas.edu [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Institute of Petroleum Engineering, Heriot-Watt University, Edinburgh EH14 4AS (United Kingdom); Wheeler, Mary F. [Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, TX (United States); Hoteit, Ibrahim [Department of Earth Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal (Saudi Arabia)

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic ensemble method based on an ensemble of directional derivatives. SEM only requires forward model runs and the simulator is then used as a black box and no adjoint code is needed. The developed HNS algorithm is successfully applied for Bayesian calibration and prior model selection of several nonlinear subsurface flow problems.

  19. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations

    Science.gov (United States)

    2012-01-01

    Background Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension – UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent

  20. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    International Nuclear Information System (INIS)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias; Zhang, Jie

    2017-01-01

    Highlights: • An ensemble model is developed to produce both deterministic and probabilistic wind forecasts. • A deep feature selection framework is developed to optimally determine the inputs to the forecasting methodology. • The developed ensemble methodology has improved the forecasting accuracy by up to 30%. - Abstract: With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by first layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that comparing to the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.

  1. Interactive prostate segmentation using atlas-guided semi-supervised learning and adaptive feature selection

    Energy Technology Data Exchange (ETDEWEB)

    Park, Sang Hyun [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 (United States); Gao, Yaozong, E-mail: yzgao@cs.unc.edu [Department of Computer Science, Department of Radiology, and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 (United States); Shi, Yinghuan, E-mail: syh@nju.edu.cn [State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023 (China); Shen, Dinggang, E-mail: dgshen@med.unc.edu [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 and Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713 (Korea, Republic of)

    2014-11-01

    Purpose: Accurate prostate segmentation is necessary for maximizing the effectiveness of radiation therapy of prostate cancer. However, manual segmentation from 3D CT images is very time-consuming and often causes large intra- and interobserver variations across clinicians. Many segmentation methods have been proposed to automate this labor-intensive process, but tedious manual editing is still required due to the limited performance. In this paper, the authors propose a new interactive segmentation method that can (1) flexibly generate the editing result with a few scribbles or dots provided by a clinician, (2) fast deliver intermediate results to the clinician, and (3) sequentially correct the segmentations from any type of automatic or interactive segmentation methods. Methods: The authors formulate the editing problem as a semisupervised learning problem which can utilize a priori knowledge of training data and also the valuable information from user interactions. Specifically, from a region of interest near the given user interactions, the appropriate training labels, which are well matched with the user interactions, can be locally searched from a training set. With voting from the selected training labels, both confident prostate and background voxels, as well as unconfident voxels can be estimated. To reflect informative relationship between voxels, location-adaptive features are selected from the confident voxels by using regression forest and Fisher separation criterion. Then, the manifold configuration computed in the derived feature space is enforced into the semisupervised learning algorithm. The labels of unconfident voxels are then predicted by regularizing semisupervised learning algorithm. Results: The proposed interactive segmentation method was applied to correct automatic segmentation results of 30 challenging CT images. The correction was conducted three times with different user interactions performed at different time periods, in order to

  2. Interactive prostate segmentation using atlas-guided semi-supervised learning and adaptive feature selection

    International Nuclear Information System (INIS)

    Park, Sang Hyun; Gao, Yaozong; Shi, Yinghuan; Shen, Dinggang

    2014-01-01

    Purpose: Accurate prostate segmentation is necessary for maximizing the effectiveness of radiation therapy of prostate cancer. However, manual segmentation from 3D CT images is very time-consuming and often causes large intra- and interobserver variations across clinicians. Many segmentation methods have been proposed to automate this labor-intensive process, but tedious manual editing is still required due to the limited performance. In this paper, the authors propose a new interactive segmentation method that can (1) flexibly generate the editing result with a few scribbles or dots provided by a clinician, (2) fast deliver intermediate results to the clinician, and (3) sequentially correct the segmentations from any type of automatic or interactive segmentation methods. Methods: The authors formulate the editing problem as a semisupervised learning problem which can utilize a priori knowledge of training data and also the valuable information from user interactions. Specifically, from a region of interest near the given user interactions, the appropriate training labels, which are well matched with the user interactions, can be locally searched from a training set. With voting from the selected training labels, both confident prostate and background voxels, as well as unconfident voxels can be estimated. To reflect informative relationship between voxels, location-adaptive features are selected from the confident voxels by using regression forest and Fisher separation criterion. Then, the manifold configuration computed in the derived feature space is enforced into the semisupervised learning algorithm. The labels of unconfident voxels are then predicted by regularizing semisupervised learning algorithm. Results: The proposed interactive segmentation method was applied to correct automatic segmentation results of 30 challenging CT images. The correction was conducted three times with different user interactions performed at different time periods, in order to

  3. Interactive prostate segmentation using atlas-guided semi-supervised learning and adaptive feature selection.

    Science.gov (United States)

    Park, Sang Hyun; Gao, Yaozong; Shi, Yinghuan; Shen, Dinggang

    2014-11-01

    Accurate prostate segmentation is necessary for maximizing the effectiveness of radiation therapy of prostate cancer. However, manual segmentation from 3D CT images is very time-consuming and often causes large intra- and interobserver variations across clinicians. Many segmentation methods have been proposed to automate this labor-intensive process, but tedious manual editing is still required due to the limited performance. In this paper, the authors propose a new interactive segmentation method that can (1) flexibly generate the editing result with a few scribbles or dots provided by a clinician, (2) fast deliver intermediate results to the clinician, and (3) sequentially correct the segmentations from any type of automatic or interactive segmentation methods. The authors formulate the editing problem as a semisupervised learning problem which can utilize a priori knowledge of training data and also the valuable information from user interactions. Specifically, from a region of interest near the given user interactions, the appropriate training labels, which are well matched with the user interactions, can be locally searched from a training set. With voting from the selected training labels, both confident prostate and background voxels, as well as unconfident voxels can be estimated. To reflect informative relationship between voxels, location-adaptive features are selected from the confident voxels by using regression forest and Fisher separation criterion. Then, the manifold configuration computed in the derived feature space is enforced into the semisupervised learning algorithm. The labels of unconfident voxels are then predicted by regularizing semisupervised learning algorithm. The proposed interactive segmentation method was applied to correct automatic segmentation results of 30 challenging CT images. The correction was conducted three times with different user interactions performed at different time periods, in order to evaluate both the efficiency

  4. Design optimization and analysis of selected thermal devices using self-adaptive Jaya algorithm

    International Nuclear Information System (INIS)

    Rao, R.V.; More, K.C.

    2017-01-01

    Highlights: • Self-adaptive Jaya algorithm is proposed for optimal design of thermal devices. • Optimization of heat pipe, cooling tower, heat sink and thermo-acoustic prime mover is presented. • Results of the proposed algorithm are better than the other optimization techniques. • The proposed algorithm may be conveniently used for the optimization of other devices. - Abstract: The present study explores the use of an improved Jaya algorithm called self-adaptive Jaya algorithm for optimal design of selected thermal devices viz; heat pipe, cooling tower, honeycomb heat sink and thermo-acoustic prime mover. Four different optimization case studies of the selected thermal devices are presented. The researchers had attempted the same design problems in the past using niched pareto genetic algorithm (NPGA), response surface method (RSM), leap-frog optimization program with constraints (LFOPC) algorithm, teaching-learning based optimization (TLBO) algorithm, grenade explosion method (GEM) and multi-objective genetic algorithm (MOGA). The results achieved by using self-adaptive Jaya algorithm are compared with those achieved by using the NPGA, RSM, LFOPC, TLBO, GEM and MOGA algorithms. The self-adaptive Jaya algorithm is proved superior as compared to the other optimization methods in terms of the results, computational effort and function evalutions.

  5. Planimetric Features Generalization for the Production of Small-Scale Map by Using Base Maps and the Existing Algorithms

    Directory of Open Access Journals (Sweden)

    M. Modiri

    2014-10-01

    Full Text Available Cartographic maps are representations of the Earth upon a flat surface in the smaller scale than it’s true. Large scale maps cover relatively small regions in great detail and small scale maps cover large regions such as nations, continents and the whole globe. Logical connection between the features and scale map must be maintained by changing the scale and it is important to recognize that even the most accurate maps sacrifice a certain amount of accuracy in scale to deliver a greater visual usefulness to its user. Cartographic generalization, or map generalization, is the method whereby information is selected and represented on a map in a way that adapts to the scale of the display medium of the map, not necessarily preserving all intricate geographical or other cartographic details. Due to the problems facing small-scale map production process and the need to spend time and money for surveying, today’s generalization is used as executive approach. The software is proposed in this paper that converted various data and information to certain Data Model. This software can produce generalization map according to base map using the existing algorithm. Planimetric generalization algorithms and roles are described in this article. Finally small-scale maps with 1:100,000, 1:250,000 and 1:500,000 scale are produced automatically and they are shown at the end.

  6. The mixing evolutionary algorithm : indepedent selection and allocation of trials

    NARCIS (Netherlands)

    C.H.M. van Kemenade

    1997-01-01

    textabstractWhen using an evolutionary algorithm to solve a problem involving building blocks we have to grow the building blocks and then mix these building blocks to obtain the (optimal) solution. Finding a good balance between the growing and the mixing process is a prerequisite to get a reliable

  7. Study of Image Analysis Algorithms for Segmentation, Feature Extraction and Classification of Cells

    Directory of Open Access Journals (Sweden)

    Margarita Gamarra

    2017-08-01

    Full Text Available Recent advances in microcopy and improvements in image processing algorithms have allowed the development of computer-assisted analytical approaches in cell identification. Several applications could be mentioned in this field: Cellular phenotype identification, disease detection and treatment, identifying virus entry in cells and virus classification; these applications could help to complement the opinion of medical experts. Although many surveys have been presented in medical image analysis, they focus mainly in tissues and organs and none of the surveys about image cells consider an analysis following the stages in the typical image processing: Segmentation, feature extraction and classification. The goal of this study is to provide comprehensive and critical analyses about the trends in each stage of cell image processing. In this paper, we present a literature survey about cell identification using different image processing techniques.

  8. Selecting protein families for environmental features based on manifold regularization.

    Science.gov (United States)

    Jiang, Xingpeng; Xu, Weiwei; Park, E K; Li, Guangrong

    2014-06-01

    Recently, statistics and machine learning have been developed to identify functional or taxonomic features of environmental features or physiological status. Important proteins (or other functional and taxonomic entities) to environmental features can be potentially used as biosensors. A major challenge is how the distribution of protein and gene functions embodies the adaption of microbial communities across environments and host habitats. In this paper, we propose a novel regularization method for linear regression to adapt the challenge. The approach is inspired by local linear embedding (LLE) and we call it a manifold-constrained regularization for linear regression (McRe). The novel regularization procedure also has potential to be used in solving other linear systems. We demonstrate the efficiency and the performance of the approach in both simulation and real data.

  9. A Single LiDAR-Based Feature Fusion Indoor Localization Algorithm.

    Science.gov (United States)

    Wang, Yun-Ting; Peng, Chao-Chung; Ravankar, Ankit A; Ravankar, Abhijeet

    2018-04-23

    In past years, there has been significant progress in the field of indoor robot localization. To precisely recover the position, the robots usually relies on multiple on-board sensors. Nevertheless, this affects the overall system cost and increases computation. In this research work, we considered a light detection and ranging (LiDAR) device as the only sensor for detecting surroundings and propose an efficient indoor localization algorithm. To attenuate the computation effort and preserve localization robustness, a weighted parallel iterative closed point (WP-ICP) with interpolation is presented. As compared to the traditional ICP, the point cloud is first processed to extract corners and line features before applying point registration. Later, points labeled as corners are only matched with the corner candidates. Similarly, points labeled as lines are only matched with the lines candidates. Moreover, their ICP confidence levels are also fused in the algorithm, which make the pose estimation less sensitive to environment uncertainties. The proposed WP-ICP architecture reduces the probability of mismatch and thereby reduces the ICP iterations. Finally, based on given well-constructed indoor layouts, experiment comparisons are carried out under both clean and perturbed environments. It is shown that the proposed method is effective in significantly reducing computation effort and is simultaneously able to preserve localization precision.

  10. Improved algorithms and advanced features of the CAD to MC conversion tool McCad

    International Nuclear Information System (INIS)

    Lu, L.; Fischer, U.; Pereslavtsev, P.

    2014-01-01

    Highlights: •The latest improvements of the McCad conversion approach including decomposition and void filling algorithms is presented. •An advanced interface for the materials editing and assignment has been developed and added to the McCAD GUI. •These improvements have been tested and successfully applied to DEMO and ITER NBI (Neutral Beam Injector) applications. •The performance of the CAD model conversion process is shown to be significantly improved. -- Abstract: McCad is a geometry conversion tool developed at KIT to enable the automatic bi-directional conversions of CAD models into the Monte Carlo (MC) geometries utilized for neutronics calculations (CAD to MC) and, reversed (MC to CAD), for visualization purposes. The paper presents the latest improvements of the conversion algorithms including improved decomposition, void filling and an advanced interface for the materials editing and assignment. The new implementations and features were tested on fusion neutronics applications to the DEMO and ITER NBI (Neutral Beam Injector) models. The results demonstrate greater stability and enhanced efficiency of McCad conversion process

  11. A Single LiDAR-Based Feature Fusion Indoor Localization Algorithm

    Directory of Open Access Journals (Sweden)

    Yun-Ting Wang

    2018-04-01

    Full Text Available In past years, there has been significant progress in the field of indoor robot localization. To precisely recover the position, the robots usually relies on multiple on-board sensors. Nevertheless, this affects the overall system cost and increases computation. In this research work, we considered a light detection and ranging (LiDAR device as the only sensor for detecting surroundings and propose an efficient indoor localization algorithm. To attenuate the computation effort and preserve localization robustness, a weighted parallel iterative closed point (WP-ICP with interpolation is presented. As compared to the traditional ICP, the point cloud is first processed to extract corners and line features before applying point registration. Later, points labeled as corners are only matched with the corner candidates. Similarly, points labeled as lines are only matched with the lines candidates. Moreover, their ICP confidence levels are also fused in the algorithm, which make the pose estimation less sensitive to environment uncertainties. The proposed WP-ICP architecture reduces the probability of mismatch and thereby reduces the ICP iterations. Finally, based on given well-constructed indoor layouts, experiment comparisons are carried out under both clean and perturbed environments. It is shown that the proposed method is effective in significantly reducing computation effort and is simultaneously able to preserve localization precision.

  12. Modified automatic term selection v2: A faster algorithm to calculate inelastic scattering cross-sections

    Energy Technology Data Exchange (ETDEWEB)

    Rusz, Ján, E-mail: jan.rusz@fysik.uu.se

    2017-06-15

    Highlights: • New algorithm for calculating double differential scattering cross-section. • Shown good convergence properties. • Outperforms older MATS algorithm, particularly in zone axis calculations. - Abstract: We present a new algorithm for calculating inelastic scattering cross-section for fast electrons. Compared to the previous Modified Automatic Term Selection (MATS) algorithm (Rusz et al. [18]), it has far better convergence properties in zone axis calculations and it allows to identify contributions of individual atoms. One can think of it as a blend of MATS algorithm and a method described by Weickenmeier and Kohl [10].

  13. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features

    KAUST Repository

    Kleftogiannis, Dimitrios A.; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of Support Vector Machines (SVM) with Genetic Algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.

  14. Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.

    Science.gov (United States)

    Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens

    2016-01-01

    MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.

  15. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2015-01-23

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of Support Vector Machines (SVM) with Genetic Algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.

  16. Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface.

    Science.gov (United States)

    Siuly; Li, Yan; Paul Wen, Peng

    2014-03-01

    Motor imagery (MI) tasks classification provides an important basis for designing brain-computer interface (BCI) systems. If the MI tasks are reliably distinguished through identifying typical patterns in electroencephalography (EEG) data, a motor disabled people could communicate with a device by composing sequences of these mental states. In our earlier study, we developed a cross-correlation based logistic regression (CC-LR) algorithm for the classification of MI tasks for BCI applications, but its performance was not satisfactory. This study develops a modified version of the CC-LR algorithm exploring a suitable feature set that can improve the performance. The modified CC-LR algorithm uses the C3 electrode channel (in the international 10-20 system) as a reference channel for the cross-correlation (CC) technique and applies three diverse feature sets separately, as the input to the logistic regression (LR) classifier. The present algorithm investigates which feature set is the best to characterize the distribution of MI tasks based EEG data. This study also provides an insight into how to select a reference channel for the CC technique with EEG signals considering the anatomical structure of the human brain. The proposed algorithm is compared with eight of the most recently reported well-known methods including the BCI III Winner algorithm. The findings of this study indicate that the modified CC-LR algorithm has potential to improve the identification performance of MI tasks in BCI systems. The results demonstrate that the proposed technique provides a classification improvement over the existing methods tested. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  17. A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

    Science.gov (United States)

    Mi, Tian; Rajasekaran, Sanguthevar

    2013-07-01

    Numerous OLAP queries process selection operations of "top N", median, "top 5%", in data warehousing applications. Selection is a well-studied problem that has numerous applications in the management of data and databases since, typically, any complex data query can be reduced to a series of basic operations such as sorting and selection. The parallel selection has also become an important fundamental operation, especially after parallel databases were introduced. In this paper, we present a deterministic algorithm Recursive Sampling Selection (RSS) to solve the exact out-of-core selection problem, which we show needs no more than (2 + ε ) passes ( ε being a very small fraction). We have compared our RSS algorithm with two other algorithms in the literature, namely, the Deterministic Sampling Selection and QuickSelect on the Parallel Disks Systems. Our analysis shows that DSS is a (2 + ε )-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M c for some constant c ). While, our proposed algorithm RSS runs in (2 + ε ) passes without any assumptions. Experimental results indicate that both RSS and DSS outperform QuickSelect on the Parallel Disks Systems. Especially, the proposed algorithm RSS is more scalable and robust to handle big data when the input size is far greater than the core memory size, including the case of N ≫ M c .

  18. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    Full Text Available represent stimuli of interest, and rich feature sets which increase the dimensionality of the space and thus the difficulty of the learning problem. We focus on a multitask reinforcement learning setting, where the agent is learning domain knowledge...

  19. A method of evolving novel feature extraction algorithms for detecting buried objects in FLIR imagery using genetic programming

    Science.gov (United States)

    Paino, A.; Keller, J.; Popescu, M.; Stone, K.

    2014-06-01

    In this paper we present an approach that uses Genetic Programming (GP) to evolve novel feature extraction algorithms for greyscale images. Our motivation is to create an automated method of building new feature extraction algorithms for images that are competitive with commonly used human-engineered features, such as Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG). The evolved feature extraction algorithms are functions defined over the image space, and each produces a real-valued feature vector of variable length. Each evolved feature extractor breaks up the given image into a set of cells centered on every pixel, performs evolved operations on each cell, and then combines the results of those operations for every cell using an evolved operator. Using this method, the algorithm is flexible enough to reproduce both LBP and HOG features. The dataset we use to train and test our approach consists of a large number of pre-segmented image "chips" taken from a Forward Looking Infrared Imagery (FLIR) camera mounted on the hood of a moving vehicle. The goal is to classify each image chip as either containing or not containing a buried object. To this end, we define the fitness of a candidate solution as the cross-fold validation accuracy of the features generated by said candidate solution when used in conjunction with a Support Vector Machine (SVM) classifier. In order to validate our approach, we compare the classification accuracy of an SVM trained using our evolved features with the accuracy of an SVM trained using mainstream feature extraction algorithms, including LBP and HOG.

  20. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral images classification. Using unlabeled samples, often unlimitedly available, unsupervised and semisupervised feature extraction methods show better performance when limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples that used in feature extraction methods. Also proposes a new method for unlabeled samples selection using spectral and spatial information. The proposed method has four parts including: PCA, prior classification, posterior classification and sample selection. As hyperspectral image passes these parts, selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled selected samples in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that through selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  1. Optimum Actuator Selection with a Genetic Algorithm for Aircraft Control

    Science.gov (United States)

    Rogers, James L.

    2004-01-01

    The placement of actuators on a wing determines the control effectiveness of the airplane. One approach to placement maximizes the moments about the pitch, roll, and yaw axes, while minimizing the coupling. For example, the desired actuators produce a pure roll moment without at the same time causing much pitch or yaw. For a typical wing, there is a large set of candidate locations for placing actuators, resulting in a substantially larger number of combinations to examine in order to find an optimum placement satisfying the mission requirements and mission constraints. A genetic algorithm has been developed for finding the best placement for four actuators to produce an uncoupled pitch moment. The genetic algorithm has been extended to find the minimum number of actuators required to provide uncoupled pitch, roll, and yaw control. A simplified, untapered, unswept wing is the model for each application.

  2. Darwinian algorithms and the Wason selection task: a factorial analysis of social contract selection task problems.

    Science.gov (United States)

    Platt, R D; Griggs, R A

    1993-08-01

    In four experiments with 760 subjects, the present study examined Cosmides' Darwinian algorithm theory of reasoning: specifically, its explanation of facilitation on the Wason selection task. The first experiment replicated Cosmides' finding of facilitation for social contract versions of the selection task, using both her multiple-problem format and a single-problem format. Experiment 2 examined performance on Cosmides' three main social contract problems while manipulating the perspective of the subject and the presence and absence of cost-benefit information. The presence of cost-benefit information improved performance in two of the three problems while the perspective manipulation had no effect. In Experiment 3, the cost-benefit effect was replicated; and performance on one of the three problems was enhanced by the presence of explicit negatives on the NOT-P and NOT-Q cards. Experiment 4 examined the role of the deontic term "must" in the facilitation observed for two of the social contract problems. The presence of "must" led to a significant improvement in performance. The results of these experiments are strongly supportive of social contract theory in that cost-benefit information is necessary for substantial facilitation to be observed in Cosmides' problems. These findings also suggest the presence of other cues that can help guide subjects to a deontic social contract interpretation when the social contract nature of the problem is not clear.

  3. The Nonlocal Sparse Reconstruction Algorithm by Similarity Measurement with Shearlet Feature Vector

    Directory of Open Access Journals (Sweden)

    Wu Qidi

    2014-01-01

    Full Text Available Due to the limited accuracy of conventional methods with image restoration, the paper supplied a nonlocal sparsity reconstruction algorithm with similarity measurement. To improve the performance of restoration results, we proposed two schemes to dictionary learning and sparse coding, respectively. In the part of the dictionary learning, we measured the similarity between patches from degraded image by constructing the Shearlet feature vector. Besides, we classified the patches into different classes with similarity and trained the cluster dictionary for each class, by cascading which we could gain the universal dictionary. In the part of sparse coding, we proposed a novel optimal objective function with the coding residual item, which can suppress the residual between the estimate coding and true sparse coding. Additionally, we show the derivation of self-adaptive regularization parameter in optimization under the Bayesian framework, which can make the performance better. It can be indicated from the experimental results that by taking full advantage of similar local geometric structure feature existing in the nonlocal patches and the coding residual suppression, the proposed method shows advantage both on visual perception and PSNR compared to the conventional methods.

  4. Algorithms

    Indian Academy of Sciences (India)

    polynomial) division have been found in Vedic Mathematics which are dated much before Euclid's algorithm. A programming language Is used to describe an algorithm for execution on a computer. An algorithm expressed using a programming.

  5. Iterated Local Search Algorithm with Strategic Oscillation for School Bus Routing Problem with Bus Stop Selection

    Directory of Open Access Journals (Sweden)

    Mohammad Saied Fallah Niasar

    2017-02-01

    Full Text Available he school bus routing problem (SBRP represents a variant of the well-known vehicle routing problem. The main goal of this study is to pick up students allocated to some bus stops and generate routes, including the selected stops, in order to carry students to school. In this paper, we have proposed a simple but effective metaheuristic approach that employs two features: first, it utilizes large neighborhood structures for a deeper exploration of the search space; second, the proposed heuristic executes an efficient transition between the feasible and infeasible portions of the search space. Exploration of the infeasible area is controlled by a dynamic penalty function to convert the unfeasible solution into a feasible one. Two metaheuristics, called N-ILS (a variant of the Nearest Neighbourhood with Iterated Local Search algorithm and I-ILS (a variant of Insertion with Iterated Local Search algorithm are proposed to solve SBRP. Our experimental procedure is based on the two data sets. The results show that N-ILS is able to obtain better solutions in shorter computing times. Additionally, N-ILS appears to be very competitive in comparison with the best existing metaheuristics suggested for SBRP

  6. Our Selections and Decisions: Inherent Features of the Nervous System?

    Science.gov (United States)

    Rösler, Frank

    The chapter summarizes findings on the neuronal bases of decisionmaking. Taking the phenomenon of selection it will be explained that systems built only from excitatory and inhibitory neuron (populations) have the emergent property of selecting between different alternatives. These considerations suggest that there exists a hierarchical architecture with central selection switches. However, in such a system, functions of selection and decision-making are not localized, but rather emerge from an interaction of several participating networks. These are, on the one hand, networks that process specific input and output representations and, on the other hand, networks that regulate the relative activation/inhibition of the specific input and output networks. These ideas are supported by recent empirical evidence. Moreover, other studies show that rather complex psychological variables, like subjective probability estimates, expected gains and losses, prediction errors, etc., do have biological correlates, i.e., they can be localized in time and space as activation states of neural networks and single cells. These findings suggest that selections and decisions are consequences of an architecture which, seen from a biological perspective, is fully deterministic. However, a transposition of such nomothetic functional principles into the idiographic domain, i.e., using them as elements for comprehensive 'mechanistic' explanations of individual decisions, seems not to be possible because of principle limitations. Therefore, individual decisions will remain predictable by means of probabilistic models alone.

  7. Selecting Features Of A Web Platform To Enhance Course Delivery

    OpenAIRE

    Karen A. Berger; Martin T. Topol

    2011-01-01

    This paper reviews key features of popular Web platforms used for course delivery. Institutions of higher education have rushed to adopt these platforms for several reasons. From the point of view of the educator, the most important reason is to enhance the classroom experience (real or virtual). Classroom experiences can benefit from a continuous stream of discourse made possible by the communications tools available in the web platforms designed for educational application. In addition, web...

  8. A hybrid intelligent algorithm for portfolio selection problem with fuzzy returns

    Science.gov (United States)

    Li, Xiang; Zhang, Yang; Wong, Hau-San; Qin, Zhongfeng

    2009-11-01

    Portfolio selection theory with fuzzy returns has been well developed and widely applied. Within the framework of credibility theory, several fuzzy portfolio selection models have been proposed such as mean-variance model, entropy optimization model, chance constrained programming model and so on. In order to solve these nonlinear optimization models, a hybrid intelligent algorithm is designed by integrating simulated annealing algorithm, neural network and fuzzy simulation techniques, where the neural network is used to approximate the expected value and variance for fuzzy returns and the fuzzy simulation is used to generate the training data for neural network. Since these models are used to be solved by genetic algorithm, some comparisons between the hybrid intelligent algorithm and genetic algorithm are given in terms of numerical examples, which imply that the hybrid intelligent algorithm is robust and more effective. In particular, it reduces the running time significantly for large size problems.

  9. Rating Algorithm for Pronunciation of English Based on Audio Feature Pattern Matching

    Directory of Open Access Journals (Sweden)

    Li Kun

    2015-01-01

    Full Text Available With the increasing internationalization of China, language communication has become an important channel for us to adapt to the political and economic environment. How to improve English learners’ language learning efficiency in limited conditions has turned into a problem demanding prompt solution at present. This paper applies two pronunciation patterns according to the actual needs of English pronunciation rating: to-be-evaluated pronunciation pattern and standard pronunciation pattern. It will translate the patterns into English pronunciation rating results through European distance. Besides, this paper will introduce the design philosophy of the whole algorithm in combination with CHMM matching pattern. Each link of the CHMM pattern will be given selective analysis while a contrast experiment between the CHMM matching pattern and the other two patterns will be conducted. From the experiment results, it can be concluded that CHMM pattern is the best option.

  10. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    Science.gov (United States)

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    Science.gov (United States)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.

  12. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    Full Text Available Currently, with the rapid increasing of data scales in network traffic classifications, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, the execution time was still unsatisfactory with numeral iterative computations during the processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is firstly preprocessed based on Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, on the precondition of keeping the classification accuracy, our method reduces the time cost of modeling and classification, and improves the execution efficiency of feature selection significantly.

  13. A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents.

    Science.gov (United States)

    Soares, Sófacles Figueredo Carreiro; Galvão, Roberto Kawakami Harrop; Araújo, Mário César Ugulino; da Silva, Edvan Cirino; Pereira, Claudete Fernandes; de Andrade, Stéfani Iury Evangelista; Leite, Flaviano Carvalho

    2011-03-09

    This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analyzed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet-visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial-least-squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Features of mechanical snubbers and the method of selection

    Energy Technology Data Exchange (ETDEWEB)

    Sunakoda, K [Sanwa Tekki Corp., Utsunomiya (Japan). Utsunomiya Works

    1978-11-01

    In the oil snubbers used in the high radiation environment of nuclear power stations, gas generation from oil and the deterioration of rubber material for sealing occur due to radiation damage, therefore periodical inspection and replacement are required during operation. The mechanical snubbers developed as aseismatic supporters in place of oil snubbers have entered the stage of practical use, and are made by two companies in USA and a company in Japan. Their features as compared with oil snubbers are presented.ces are explained.

  15. Mutual information based feature selection for medical image retrieval

    Science.gov (United States)

    Zhi, Lijia; Zhang, Shaomin; Li, Yan

    2018-04-01

    In this paper, authors propose a mutual information based method for lung CT image retrieval. This method is designed to adapt to different datasets and different retrieval task. For practical applying consideration, this method avoids using a large amount of training data. Instead, with a well-designed training process and robust fundamental features and measurements, the method in this paper can get promising performance and maintain economic training computation. Experimental results show that the method has potential practical values for clinical routine application.

  16. Self-organized spectrum chunk selection algorithm for Local Area LTE-Advanced

    DEFF Research Database (Denmark)

    Kumar, Sanjay; Wang, Yuanye; Marchetti, Nicola

    2010-01-01

    This paper presents a self organized spectrum chunk selection algorithm in order to minimize the mutual intercell interference among Home Node Bs (HeNBs), aiming to improve the system throughput performance compared to the existing frequency reuse one scheme. The proposed algorithm is useful...

  17. Molecular Features Underlying Selectivity in Chicken Bitter Taste Receptors

    Directory of Open Access Journals (Sweden)

    Antonella Di Pizio

    2018-01-01

    Full Text Available Chickens sense the bitter taste of structurally different molecules with merely three bitter taste receptors (Gallus gallus taste 2 receptors, ggTas2rs, representing a minimal case of bitter perception. Some bitter compounds like quinine, diphenidol and chlorpheniramine, activate all three ggTas2rs, while others selectively activate one or two of the receptors. We focus on bitter compounds with different selectivity profiles toward the three receptors, to shed light on the molecular recognition complexity in bitter taste. Using homology modeling and induced-fit docking simulations, we investigated the binding modes of ggTas2r agonists. Interestingly, promiscuous compounds are predicted to establish polar interactions with position 6.51 and hydrophobic interactions with positions 3.32 and 5.42 in all ggTas2rs; whereas certain residues are responsible for receptor selectivity. Lys3.29 and Asn3.36 are suggested as ggTas2r1-specificity-conferring residues; Gln6.55 as ggTas2r2-specificity-conferring residue; Ser5.38 and Gln7.42 as ggTas2r7-specificity conferring residues. The selectivity profile of quinine analogs, quinidine, epiquinidine and ethylhydrocupreine, was then characterized by combining calcium-imaging experiments and in silico approaches. ggTas2r models were used to virtually screen BitterDB compounds. ~50% of compounds known to be bitter to human are likely to be bitter to chicken, with 25, 20, 37% predicted to be ggTas2r1, ggTas2r2, ggTas2r7 agonists, respectively. Predicted ggTas2rs agonists can be tested with in vitro and in vivo experiments, contributing to our understanding of bitter taste in chicken and, consequently, to the improvement of chicken feed.

  18. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals.

    Science.gov (United States)

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature.

  19. Breast cancer MRI radiomics: An overview of algorithmic features and impact of inter-reader variability in annotating tumors.

    Science.gov (United States)

    Saha, Ashirbani; Harowicz, Michael R; Mazurowski, Maciej A

    2018-04-16

    To review features used in MRI radiomics of breast cancer and study the inter-reader stability of the features METHODS: We implemented 529 algorithmic features that can be extracted from tumor and fibroglandular tissue (FGT) in breast MRIs. The features were identified based on a review of the existing literature with consideration of their usage, prognostic ability, and uniqueness. The set was then extended so that it comprehensively describes breast cancer imaging characteristics. The features were classified into 10 groups based on the type of data used to extract them and the type of calculation being performed. For the assessment of inter-reader variability, 4 fellowship-trained readers annotated tumors on pre-operative dynamic contrast enhanced MRIs for 50 breast cancer patients. Based on the annotations, an algorithm automatically segmented the image and extracted all features resulting in one set of features for each reader. For a given feature, the inter-reader stability was defined as the intra-class correlation coefficient (ICC) computed using the feature values obtained through all readers for all cases. The average inter-reader stability for all features was 0.8474 (95% CI: 0.8068-0.8858). The mean inter-reader stability was lower for tumor-based features (0.6348, 95% CI: 0.5391-0.7257) than FGT-based features (0.9984, 95% CI: 0.9970-0.9992). The feature group with the highest inter-reader stability quantifies breast and FGT volume. The feature group with the lowest inter-reader stability quantifies variations in tumor enhancement. Breast MRI radiomics features widely vary in terms of their stability in the presence of inter-reader variability. Appropriate measures need to be taken for reducing this variability in tumor-based radiomics. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  20. Non-sky polarization-based dehazing algorithm for non-specular objects using polarization difference and global scene feature.

    Science.gov (United States)

    Qu, Yufu; Zou, Zhaofan

    2017-10-16

    Photographic images taken in foggy or hazy weather (hazy images) exhibit poor visibility and detail because of scattering and attenuation of light caused by suspended particles, and therefore, image dehazing has attracted considerable research attention. The current polarization-based dehazing algorithms strongly rely on the presence of a "sky area", and thus, the selection of model parameters is susceptible to external interference of high-brightness objects and strong light sources. In addition, the noise of the restored image is large. In order to solve these problems, we propose a polarization-based dehazing algorithm that does not rely on the sky area ("non-sky"). First, a linear polarizer is used to collect three polarized images. The maximum- and minimum-intensity images are then obtained by calculation, assuming the polarization of light emanating from objects is negligible in most scenarios involving non-specular objects. Subsequently, the polarization difference of the two images is used to determine a sky area and calculate the infinite atmospheric light value. Next, using the global features of the image, and based on the assumption that the airlight and object radiance are irrelevant, the degree of polarization of the airlight (DPA) is calculated by solving for the optimal solution of the correlation coefficient equation between airlight and object radiance; the optimal solution is obtained by setting the right-hand side of the equation to zero. Then, the hazy image is subjected to dehazing. Subsequently, a filtering denoising algorithm, which combines the polarization difference information and block-matching and 3D (BM3D) filtering, is designed to filter the image smoothly. Our experimental results show that the proposed polarization-based dehazing algorithm does not depend on whether the image includes a sky area and does not require complex models. Moreover, the dehazing image except specular object scenarios is superior to those obtained by Tarel

  1. The admissible portfolio selection problem with transaction costs and an improved PSO algorithm

    Science.gov (United States)

    Chen, Wei; Zhang, Wei-Guo

    2010-05-01

    In this paper, we discuss the portfolio selection problem with transaction costs under the assumption that there exist admissible errors on expected returns and risks of assets. We propose a new admissible efficient portfolio selection model and design an improved particle swarm optimization (PSO) algorithm because traditional optimization algorithms fail to work efficiently for our proposed problem. Finally, we offer a numerical example to illustrate the proposed effective approaches and compare the admissible portfolio efficient frontiers under different constraints.

  2. An algorithm for selecting the most accurate protocol for contact angle measurement by drop shape analysis.

    Science.gov (United States)

    Xu, Z N

    2014-12-01

    In this study, an error analysis is performed to study real water drop images and the corresponding numerically generated water drop profiles for three widely used static contact angle algorithms: the circle- and ellipse-fitting algorithms and the axisymmetric drop shape analysis-profile (ADSA-P) algorithm. The results demonstrate the accuracy of the numerically generated drop profiles based on the Laplace equation. A significant number of water drop profiles with different volumes, contact angles, and noise levels are generated, and the influences of the three factors on the accuracies of the three algorithms are systematically investigated. The results reveal that the above-mentioned three algorithms are complementary. In fact, the circle- and ellipse-fitting algorithms show low errors and are highly resistant to noise for water drops with small/medium volumes and contact angles, while for water drop with large volumes and contact angles just the ADSA-P algorithm can meet accuracy requirement. However, this algorithm introduces significant errors in the case of small volumes and contact angles because of its high sensitivity to noise. The critical water drop volumes of the circle- and ellipse-fitting algorithms corresponding to a certain contact angle error are obtained through a significant amount of computation. To improve the precision of the static contact angle measurement, a more accurate algorithm based on a combination of the three algorithms is proposed. Following a systematic investigation, the algorithm selection rule is described in detail, while maintaining the advantages of the three algorithms and overcoming their deficiencies. In general, static contact angles over the entire hydrophobicity range can be accurately evaluated using the proposed algorithm. The ease of erroneous judgment in static contact angle measurements is avoided. The proposed algorithm is validated by a static contact angle evaluation of real and numerically generated water drop

  3. Feature Selection Methods for Zero-Shot Learning of Neural Activity

    Directory of Open Access Journals (Sweden)

    Carlos A. Caceres

    2017-06-01

    Full Text Available Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.

  4. Quality-aware features-based noise level estimator for block matching and three-dimensional filtering algorithm

    Science.gov (United States)

    Xu, Shaoping; Hu, Lingyan; Yang, Xiaohui

    2016-01-01

    The performance of conventional denoising algorithms is usually controlled by one or several parameters whose optimal settings depend on the contents of the processed images and the characteristics of the noises. Among these parameters, noise level is a fundamental parameter that is always assumed to be known by most of the existing denoising algorithms (so-called nonblind denoising algorithms), which largely limits the applicability of these nonblind denoising algorithms in many applications. Moreover, these nonblind algorithms do not always achieve the best denoised images in visual quality even when fed with the actual noise level parameter. To address these shortcomings, in this paper we propose a new quality-aware features-based noise level estimator (NLE), which consists of quality-aware features extraction and optimal noise level parameter prediction. First, considering that image local contrast features convey important structural information that is closely related to image perceptual quality, we utilize the marginal statistics of two local contrast operators, i.e., the gradient magnitude and the Laplacian of Gaussian (LOG), to extract quality-aware features. The proposed quality-aware features have very low computational complexity, making them well suited for time-constrained applications. Then we propose a learning-based framework where the noise level parameter is estimated based on the quality-aware features. Based on the proposed NLE, we develop a blind block matching and three-dimensional filtering (BBM3D) denoising algorithm which is capable of effectively removing additive white Gaussian noise, even coupled with impulse noise. The noise level parameter of the BBM3D algorithm is automatically tuned according to the quality-aware features, guaranteeing the best performance. As such, the classical block matching and three-dimensional algorithm can be transformed into a blind one in an unsupervised manner. Experimental results demonstrate that the

  5. Max-AUC feature selection in computer-aided detection of polyps in CT colonography.

    Science.gov (United States)

    Xu, Jian-Wu; Suzuki, Kenji

    2014-03-01

    We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level.

  6. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    Science.gov (United States)

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code

  7. Feature selection is the ReliefF for multiple instance learning

    NARCIS (Netherlands)

    Zafra, A.; Pechenizkiy, M.; Ventura, S.

    2010-01-01

    Dimensionality reduction and feature selection in particular are known to be of a great help for making supervised learning more effective and efficient. Many different feature selection techniques have been proposed for the traditional settings, where each instance is expected to have a label. In

  8. An algorithm for finding biologically significant features in microarray data based on a priori manifold learning.

    Directory of Open Access Journals (Sweden)

    Zena M Hira

    Full Text Available Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher dimensional space onto a lower dimension one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed the raw microarray data is projected onto it and clustering and classification can take place. In contrast to earlier fusion based methods, the prior knowledge from the KEGG databases is not used in, and does not bias the classification process--it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.

  9. Preconditioned dynamic mode decomposition and mode selection algorithms for large datasets using incremental proper orthogonal decomposition

    Science.gov (United States)

    Ohmichi, Yuya

    2017-07-01

    In this letter, we propose a simple and efficient framework of dynamic mode decomposition (DMD) and mode selection for large datasets. The proposed framework explicitly introduces a preconditioning step using an incremental proper orthogonal decomposition (POD) to DMD and mode selection algorithms. By performing the preconditioning step, the DMD and mode selection can be performed with low memory consumption and therefore can be applied to large datasets. Additionally, we propose a simple mode selection algorithm based on a greedy method. The proposed framework is applied to the analysis of three-dimensional flow around a circular cylinder.

  10. GMDH Method with Genetic Selection Algorithm and Cloning

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2013-01-01

    Roč. 23, č. 5 (2013), s. 451-464 ISSN 1210-0552 Institutional support: RVO:67985807 Keywords : multivariate data * GMDH * linear regression * Gauss-Markov conditions * cloning * genetic selection * classification Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.412, year: 2013

  11. Algoritmi selektivnog šifrovanja - pregled sa ocenom performansi / Selective encryption algorithms: Overview with performance evaluation

    Directory of Open Access Journals (Sweden)

    Boriša Ž. Jovanović

    2010-10-01

    Full Text Available Digitalni multimedijalni sadržaj postaje zastupljeniji i sve više se razmenjuje putem računarskih mreža i javnih kanala (satelitske komunikacije, bežične mreže, internet, itd. koji predstavljaju nebezbedne medijume za prenos informacija osetljive sadržine. Sve više na značaju dobijaju mehanizmi kriptološke zaštite slika i video sadržaja. Tradicionalni sistemi kriptografske obrade u sistemima za prenos ovih vrsta informacija garantuju visok stepen sigurnosti, ali i imaju svoje nedostatke - visoku cenu implementacije i znatno kašnjenje u prenosu podataka. Pomenuti nedostaci se prevazilaze primenom algoritama selektivnog šifrovanja. / Digital multimedia content is becoming widely used and increasingly exchanged over computer network and public channels (satelite, wireless networks, Internet, etc. which is unsecured transmission media for ex changing that kind of information. Mechanisms made to encrypt image and video data are becoming more and more significant. Traditional cryptographic techniques can guarantee a high level of security but at the cost of expensive implementation and important transmission delays. These shortcomings can be exceeded using selective encryption algorithms. Introduction In traditional image and video content protection schemes, called fully layered, the whole content is first compressed. Then, the compressed bitstream is entirely encrypted using a standard cipher (DES - Data Encryption Algorithm, IDEA - International Data Encryption Algorithm, AES - Advanced Encryption Algorithm etc.. The specific characteristics of this kind of data, high-transmission rate with limited bandwidth, make standard encryption algorithms inadequate. Another limitation of traditional systems consists of altering the whole bitstream syntax which may disable some codec functionalities on the delivery site coder and decoder on the receiving site. Selective encryption is a new trend in image and video content protection. As its

  12. Effect of feature-selective attention on neuronal responses in macaque area MT

    Science.gov (United States)

    Chen, X.; Hoffmann, K.-P.; Albright, T. D.

    2012-01-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color). PMID:22170961

  13. A Multiagent Evolutionary Algorithm for the Resource-Constrained Project Portfolio Selection and Scheduling Problem

    Directory of Open Access Journals (Sweden)

    Yongyi Shou

    2014-01-01

    Full Text Available A multiagent evolutionary algorithm is proposed to solve the resource-constrained project portfolio selection and scheduling problem. The proposed algorithm has a dual level structure. In the upper level a set of agents make decisions to select appropriate project portfolios. Each agent selects its project portfolio independently. The neighborhood competition operator and self-learning operator are designed to improve the agent’s energy, that is, the portfolio profit. In the lower level the selected projects are scheduled simultaneously and completion times are computed to estimate the expected portfolio profit. A priority rule-based heuristic is used by each agent to solve the multiproject scheduling problem. A set of instances were generated systematically from the widely used Patterson set. Computational experiments confirmed that the proposed evolutionary algorithm is effective for the resource-constrained project portfolio selection and scheduling problem.

  14. An Artificial Bee Colony Algorithm for Uncertain Portfolio Selection

    OpenAIRE

    Chen, Wei

    2014-01-01

    Portfolio selection is an important issue for researchers and practitioners. In this paper, under the assumption that security returns are given by experts’ evaluations rather than historical data, we discuss the portfolio adjusting problem which takes transaction costs and diversification degree of portfolio into consideration. Uncertain variables are employed to describe the security returns. In the proposed mean-variance-entropy model, the uncertain mean value of the return is ...

  15. Enhancing Breast Cancer Recurrence Algorithms Through Selective Use of Medical Record Data.

    Science.gov (United States)

    Kroenke, Candyce H; Chubak, Jessica; Johnson, Lisa; Castillo, Adrienne; Weltzien, Erin; Caan, Bette J

    2016-03-01

    The utility of data-based algorithms in research has been questioned because of errors in identification of cancer recurrences. We adapted previously published breast cancer recurrence algorithms, selectively using medical record (MR) data to improve classification. We evaluated second breast cancer event (SBCE) and recurrence-specific algorithms previously published by Chubak and colleagues in 1535 women from the Life After Cancer Epidemiology (LACE) and 225 women from the Women's Health Initiative cohorts and compared classification statistics to published values. We also sought to improve classification with minimal MR examination. We selected pairs of algorithms-one with high sensitivity/high positive predictive value (PPV) and another with high specificity/high PPV-using MR information to resolve discrepancies between algorithms, properly classifying events based on review; we called this "triangulation." Finally, in LACE, we compared associations between breast cancer survival risk factors and recurrence using MR data, single Chubak algorithms, and triangulation. The SBCE algorithms performed well in identifying SBCE and recurrences. Recurrence-specific algorithms performed more poorly than published except for the high-specificity/high-PPV algorithm, which performed well. The triangulation method (sensitivity = 81.3%, specificity = 99.7%, PPV = 98.1%, NPV = 96.5%) improved recurrence classification over two single algorithms (sensitivity = 57.1%, specificity = 95.5%, PPV = 71.3%, NPV = 91.9%; and sensitivity = 74.6%, specificity = 97.3%, PPV = 84.7%, NPV = 95.1%), with 10.6% MR review. Triangulation performed well in survival risk factor analyses vs analyses using MR-identified recurrences. Use of multiple recurrence algorithms in administrative data, in combination with selective examination of MR data, may improve recurrence data quality and reduce research costs. © The Author 2015. Published by Oxford University Press. All rights reserved. For

  16. Selection of parameters for advanced machining processes using firefly algorithm

    Directory of Open Access Journals (Sweden)

    Rajkamal Shukla

    2017-02-01

    Full Text Available Advanced machining processes (AMPs are widely utilized in industries for machining complex geometries and intricate profiles. In this paper, two significant processes such as electric discharge machining (EDM and abrasive water jet machining (AWJM are considered to get the optimum values of responses for the given range of process parameters. The firefly algorithm (FA is attempted to the considered processes to obtain optimized parameters and the results obtained are compared with the results given by previous researchers. The variation of process parameters with respect to the responses are plotted to confirm the optimum results obtained using FA. In EDM process, the performance parameter “MRR” is increased from 159.70 gm/min to 181.6723 gm/min, while “Ra” and “REWR” are decreased from 6.21 μm to 3.6767 μm and 6.21% to 6.324 × 10−5% respectively. In AWJM process, the value of the “kerf” and “Ra” are decreased from 0.858 mm to 0.3704 mm and 5.41 mm to 4.443 mm respectively. In both the processes, the obtained results show a significant improvement in the responses.

  17. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  18. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    Full Text Available A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: 1 select more informative image locations upon which to fixate their eyes, or 2 extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: The number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  19. Algorithms

    Indian Academy of Sciences (India)

    to as 'divide-and-conquer'. Although there has been a large effort in realizing efficient algorithms, there are not many universally accepted algorithm design paradigms. In this article, we illustrate algorithm design techniques such as balancing, greedy strategy, dynamic programming strategy, and backtracking or traversal of ...

  20. Feature selection model based on clustering and ranking in pipeline for microarray data

    Directory of Open Access Journals (Sweden)

    Barnali Sahu

    2017-01-01

    Full Text Available Most of the available feature selection techniques in the literature are classifier bound. It means a group of features tied to the performance of a specific classifier as applied in wrapper and hybrid approach. Our objective in this study is to select a set of generic features not tied to any classifier based on the proposed framework. This framework uses attribute clustering and feature ranking techniques in pipeline in order to remove redundant features. On each uncovered cluster, signal-to-noise ratio, t-statistics and significance analysis of microarray are independently applied to select the top ranked features. Both filter and evolutionary wrapper approaches have been considered for feature selection and the data set with selected features are given to ensemble of predefined statistically different classifiers. The class labels of the test data are determined using majority voting technique. Moreover, with the aforesaid objectives, this paper focuses on obtaining a stable result out of various classification models. Further, a comparative analysis has been performed to study the classification accuracy and computational time of the current approach and evolutionary wrapper techniques. It gives a better insight into the features and further enhancing the classification accuracy with less computational time.

  1. An Introduction to Model Selection: Tools and Algorithms

    Directory of Open Access Journals (Sweden)

    Sébastien Hélie

    2006-03-01

    Full Text Available Model selection is a complicated matter in science, and psychology is no exception. In particular, the high variance in the object of study (i.e., humans prevents the use of Popper’s falsification principle (which is the norm in other sciences. Therefore, the desirability of quantitative psychological models must be assessed by measuring the capacity of the model to fit empirical data. In the present paper, an error measure (likelihood, as well as five methods to compare model fits (the likelihood ratio test, Akaike’s information criterion, the Bayesian information criterion, bootstrapping and cross-validation, are presented. The use of each method is illustrated by an example, and the advantages and weaknesses of each method are also discussed.

  2. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

    Science.gov (United States)

    Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

    2015-01-01

    Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  3. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers of RUVBL1 and CNIH were identified and validated based on two public available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression possibly affected by the diverse major risk factors for ESCC across different areas.

  4. The M-OLAP Cube Selection Problem: A Hyper-polymorphic Algorithm Approach

    Science.gov (United States)

    Loureiro, Jorge; Belo, Orlando

    OLAP systems depend heavily on the materialization of multidimensional structures to speed-up queries, whose appropriate selection constitutes the cube selection problem. However, the recently proposed distribution of OLAP structures emerges to answer new globalization's requirements, capturing the known advantages of distributed databases. But this hardens the search for solutions, especially due to the inherent heterogeneity, imposing an extra characteristic of the algorithm that must be used: adaptability. Here the emerging concept known as hyper-heuristic can be a solution. In fact, having an algorithm where several (meta-)heuristics may be selected under the control of a heuristic has an intrinsic adaptive behavior. This paper presents a hyper-heuristic polymorphic algorithm used to solve the extended cube selection and allocation problem generated in M-OLAP architectures.

  5. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses which help to choose the appropriate treatment plan for patients. This technology has shifted a new era in molecular classification, interpreting gene expression data remains a difficult problem and an active research area due to their native nature of “high dimensional low sample size”. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid to correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improve treatment strategies. This thesis aims on a comparative study of state-of-the-art feature selection methods, classification methods, and the combination of them, based on gene expression data. We compared the efficiency of three different classification methods including: support vector machines, k- nearest neighbor and random forest, and eight different feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t- statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in

  6. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications.

    Science.gov (United States)

    Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter setting turning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. However, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm procedure to search for the optimal solution in both the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in a SVM to perform both parameter setting turning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.

  7. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  8. Vibration and acoustic frequency spectra for industrial process modeling using selective fusion multi-condition samples and multi-source features

    Science.gov (United States)

    Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen

    2018-01-01

    Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.

  9. Recursive Cluster Elimination (RCE for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE rather than recursive feature elimination (RFE. We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs, a supervised machine learning classification method, to identify and score (rank those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA with recursive feature elimination (SVM-RFE and PDA-RFE are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together

  10. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p breast cancer detected in the next subsequent mammography screening.

  11. A Convergent Differential Evolution Algorithm with Hidden Adaptation Selection for Engineering Optimization

    Directory of Open Access Journals (Sweden)

    Zhongbo Hu

    2014-01-01

    Full Text Available Many improved differential Evolution (DE algorithms have emerged as a very competitive class of evolutionary computation more than a decade ago. However, few improved DE algorithms guarantee global convergence in theory. This paper developed a convergent DE algorithm in theory, which employs a self-adaptation scheme for the parameters and two operators, that is, uniform mutation and hidden adaptation selection (haS operators. The parameter self-adaptation and uniform mutation operator enhance the diversity of populations and guarantee ergodicity. The haS can automatically remove some inferior individuals in the process of the enhancing population diversity. The haS controls the proposed algorithm to break the loop of current generation with a small probability. The breaking probability is a hidden adaptation and proportional to the changes of the number of inferior individuals. The proposed algorithm is tested on ten engineering optimization problems taken from IEEE CEC2011.

  12. An algorithm for generation of DEMs from contour lines considering geomorphic features

    Directory of Open Access Journals (Sweden)

    Xiao-Ping Rui

    2016-04-01

    Full Text Available Geomorphic information is omitted from many existing methods of generating gridded digital elevation models (DEMs from contour lines, resulting in significant errors during interpolation. Here, we present an advanced schema for improvement of the comprehensive regionalized method of linear interpolation. This approach uses a moving fitting method for an interpolated point and selects elevation points that are representative of geomorphic features as a whole to improve interpolation quality. A total of 16 points are selected, according to certain criteria, in eight directions surrounding the interpolated point; thus, there are two points in each direction, which is sufficient to provide an accurate representation of the geomorphic features of the DEM. Our method introduces virtual control points to prevent sudden changes in the interpolation results, which helps to overcome problems related to the distortion of the local geospatial distribution in areas where feature geomorphic information is inadequate. We construct the spline interpolation function using intersection points and virtual control points, all of which are applied to compute the point elevation. Moreover, we index all elevation values and spatial points of linear features using the R-tree method to ensure that points related to an interpolated position can be retrieved as quickly as possible. Finally, we test our method using a coal mine elevation dataset. The results confirm that our proposed method can generate DEMs smoothly and, in particular, avoid problems related to local distortion.    Resumen La información geomórfica se omite en muchos de los métodos de generación de Modelos Digitales de Elevación (DEM, en inglés que se elaboran a partir de líneas de contorno, lo que resulta en errores significativos durante la interpolación. En este trabajo se presenta un esquema avanzado para el mejoramiento del método comprensivo regionalizado de interpolación lineal. Esta

  13. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    Directory of Open Access Journals (Sweden)

    C. Fernandez-Lozano

    2013-01-01

    Full Text Available Given the background of the use of Neural Networks in problems of apple juice classification, this paper aim at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM. Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA, the most representative variables for a specific classification problem can be selected.

  14. Multi-level gene/MiRNA feature selection using deep belief nets and active learning.

    Science.gov (United States)

    Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M

    2014-01-01

    Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].

  15. A Genetic Algorithm-based Antenna Selection Approach for Large-but-Finite MIMO Networks

    KAUST Repository

    Makki, Behrooz

    2016-12-29

    We study the performance of antenna selectionbased multiple-input-multiple-output (MIMO) networks with large but finite number of transmit antennas and receivers. Considering the continuous and bursty communication scenarios with different users’ data request probabilities, we develop an efficient antenna selection scheme using genetic algorithms (GA). As demonstrated, the proposed algorithm is generic in the sense that it can be used in the cases with different objective functions, precoding methods, levels of available channel state information and channel models. Our results show that the proposed GAbased algorithm reaches (almost) the same throughput as the exhaustive search-based optimal approach, with substantially less implementation complexity.

  16. A Genetic Algorithm-based Antenna Selection Approach for Large-but-Finite MIMO Networks

    KAUST Repository

    Makki, Behrooz; Ide, Anatole; Svensson, Tommy; Eriksson, Thomas; Alouini, Mohamed-Slim

    2016-01-01

    We study the performance of antenna selectionbased multiple-input-multiple-output (MIMO) networks with large but finite number of transmit antennas and receivers. Considering the continuous and bursty communication scenarios with different users’ data request probabilities, we develop an efficient antenna selection scheme using genetic algorithms (GA). As demonstrated, the proposed algorithm is generic in the sense that it can be used in the cases with different objective functions, precoding methods, levels of available channel state information and channel models. Our results show that the proposed GAbased algorithm reaches (almost) the same throughput as the exhaustive search-based optimal approach, with substantially less implementation complexity.

  17. Hitting times of local and global optima in genetic algorithms with very high selection pressure

    Directory of Open Access Journals (Sweden)

    Eremeev Anton V.

    2017-01-01

    Full Text Available The paper is devoted to upper bounds on the expected first hitting times of the sets of local or global optima for non-elitist genetic algorithms with very high selection pressure. The results of this paper extend the range of situations where the upper bounds on the expected runtime are known for genetic algorithms and apply, in particular, to the Canonical Genetic Algorithm. The obtained bounds do not require the probability of fitness-decreasing mutation to be bounded by a constant which is less than one.

  18. A Quantum Hybrid PSO Combined with Fuzzy k-NN Approach to Feature Selection and Cell Classification in Cervical Cancer Detection

    Directory of Open Access Journals (Sweden)

    Abdullah M. Iliyasu

    2017-12-01

    Full Text Available A quantum hybrid (QH intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO method with the intuitionistic rationality of traditional fuzzy k-nearest neighbours (Fuzzy k-NN algorithm (known simply as the Q-Fuzzy approach is proposed for efficient feature selection and classification of cells in cervical smeared (CS images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection and another hybrid technique combining the standard PSO algorithm with the Fuzzy k-NN technique (P-Fuzzy approach. In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k-NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.

  19. Log-linear model based behavior selection method for artificial fish swarm algorithm.

    Science.gov (United States)

    Huang, Zhehuang; Chen, Yidong

    2015-01-01

    Artificial fish swarm algorithm (AFSA) is a population based optimization technique inspired by social behavior of fishes. In past several years, AFSA has been successfully applied in many research and application areas. The behavior of fishes has a crucial impact on the performance of AFSA, such as global exploration ability and convergence speed. How to construct and select behaviors of fishes are an important task. To solve these problems, an improved artificial fish swarm algorithm based on log-linear model is proposed and implemented in this paper. There are three main works. Firstly, we proposed a new behavior selection algorithm based on log-linear model which can enhance decision making ability of behavior selection. Secondly, adaptive movement behavior based on adaptive weight is presented, which can dynamically adjust according to the diversity of fishes. Finally, some new behaviors are defined and introduced into artificial fish swarm algorithm at the first time to improve global optimization capability. The experiments on high dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm.

  20. Log-Linear Model Based Behavior Selection Method for Artificial Fish Swarm Algorithm

    Directory of Open Access Journals (Sweden)

    Zhehuang Huang

    2015-01-01

    Full Text Available Artificial fish swarm algorithm (AFSA is a population based optimization technique inspired by social behavior of fishes. In past several years, AFSA has been successfully applied in many research and application areas. The behavior of fishes has a crucial impact on the performance of AFSA, such as global exploration ability and convergence speed. How to construct and select behaviors of fishes are an important task. To solve these problems, an improved artificial fish swarm algorithm based on log-linear model is proposed and implemented in this paper. There are three main works. Firstly, we proposed a new behavior selection algorithm based on log-linear model which can enhance decision making ability of behavior selection. Secondly, adaptive movement behavior based on adaptive weight is presented, which can dynamically adjust according to the diversity of fishes. Finally, some new behaviors are defined and introduced into artificial fish swarm algorithm at the first time to improve global optimization capability. The experiments on high dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm.

  1. Hybrid nested sampling algorithm for Bayesian model selection applied to inverse subsurface flow problems

    KAUST Repository

    Elsheikh, Ahmed H.

    2014-02-01

    A Hybrid Nested Sampling (HNS) algorithm is proposed for efficient Bayesian model calibration and prior model selection. The proposed algorithm combines, Nested Sampling (NS) algorithm, Hybrid Monte Carlo (HMC) sampling and gradient estimation using Stochastic Ensemble Method (SEM). NS is an efficient sampling algorithm that can be used for Bayesian calibration and estimating the Bayesian evidence for prior model selection. Nested sampling has the advantage of computational feasibility. Within the nested sampling algorithm, a constrained sampling step is performed. For this step, we utilize HMC to reduce the correlation between successive sampled states. HMC relies on the gradient of the logarithm of the posterior distribution, which we estimate using a stochastic