WorldWideScience

Sample records for two-stage feature selection

  1. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    Science.gov (United States)

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    B-factor is highly correlated with protein internal motion and measures the uncertainty in the position of an atom within a crystal structure. Although the rapid progress of structural biology in recent years has made more accurate protein structures available than ever, with the avalanche of new protein sequences emerging in the post-genomic era, the gap between known protein sequences and known protein structures grows wider and wider. It is therefore urgent to develop automated methods that predict the B-factor profile directly from the amino acid sequence, so that it can be used for basic research in a timely manner. In this article, we propose a novel approach, called PredBF, to predict the real value of the B-factor. We first extract both global and local features from the protein sequences as well as their evolution information; random forest feature selection is then applied to rank their importance, and the most important features are input to a two-stage support vector regression (SVR) for prediction, where the initial predicted outputs from the first SVR are fed to the second-layer SVR for final refinement. Our results reveal that a systematic analysis of the importance of different features gives deep insight into their different contributions and is necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhances the robustness of predicting the B-factor profile. As a web server, PredBF is freely available at http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.
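
    A minimal sketch of the pipeline described above (random-forest importance ranking, then a two-stage SVR), using synthetic data in place of the sequence-derived features; this illustrates the structure only and is not the authors' code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))       # stand-in for global + local sequence features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=500)   # toy B-factor values

# Feature selection: rank by random-forest importance, keep the top 20.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
X_sel = X[:, np.argsort(rf.feature_importances_)[::-1][:20]]

# Stage 1: first SVR produces an initial prediction.
svr1 = SVR(kernel="rbf").fit(X_sel, y)
y1 = svr1.predict(X_sel)

# Stage 2: second SVR refines the stage-1 output (here simply appended to the
# selected features; the paper feeds the initial predicted outputs forward).
X2 = np.column_stack([X_sel, y1])
y_refined = SVR(kernel="rbf").fit(X2, y).predict(X2)
print(np.corrcoef(y_refined, y)[0, 1])
```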

  2. Two-stage Security Controls Selection

    NARCIS (Netherlands)

    Yevseyeva, I.; Basto-Fernandes, V.; van Moorsel, A.; Janicke, H.; Emmerich, Michael T. M.

    2016-01-01

    To protect a system from potential cyber security breaches and attacks, one needs to select efficient security controls, taking into account technical and institutional goals and constraints, such as available budget, enterprise activity, internal and external environment. Here we model the security

  3. Two-Stage Fuzzy Portfolio Selection Problem with Transaction Costs

    Directory of Open Access Journals (Sweden)

    Yanju Chen

    2015-01-01

    This paper studies a two-period portfolio selection problem. The problem is formulated as a two-stage fuzzy portfolio selection model with transaction costs, in which the future returns of the risky security are characterized by possibility distributions. The objective of the proposed model is to achieve the maximum utility in terms of the expected value and variance of the final wealth. Given the first-stage decision vector and a realization of the fuzzy return, the optimal value expression of the second-stage programming problem is derived. As a result, the proposed two-stage model is equivalent to a single-stage model, and the analytical optimal solution of the two-stage model is obtained, which helps us to discuss the properties of the optimal solution. Finally, some numerical experiments are performed to demonstrate the new modeling idea and its effectiveness. The computational results show that the more risk-averse investor will invest more wealth in the risk-free security. They also show that the optimal amount invested in the risky security increases as the risk-free return decreases, that the optimal utility increases as the risk-free return increases, and that the optimal utility increases as the transaction costs decrease. In most instances the utilities provided by the proposed two-stage model are larger than those provided by the single-stage model.

  4. Classification of brain disease in magnetic resonance images using two-stage local feature fusion

    Science.gov (United States)

    Li, Tao; Li, Wu; Yang, Yehui

    2017-01-01

    Background Many classification methods have been proposed based on magnetic resonance images. Most rely on measures such as volume, cerebral cortical thickness and grey matter density, which are susceptible to registration performance and limited in their representation of anatomical structure. This paper proposes a two-stage local feature fusion method in which deformable registration is not required and anatomical information is represented at a moderate scale. Methods Keypoints are first extracted from scale-space to represent anatomical structure. Two kinds of local features are then calculated around the keypoints, one for correspondence and the other for representation. Scores are assigned to keypoints to quantify their effect in classification, and the sum of scores over all effective keypoints determines which group a test subject belongs to. Results We apply this method to magnetic resonance images of Alzheimer's disease and Parkinson's disease. The advantage of local features in correspondence and representation contributes to the final classification. With the help of a local feature (Scale Invariant Feature Transform, SIFT) for correspondence, performance improves. A local feature (Histogram of Oriented Gradients, HOG) extracted from 16×16 cell blocks obtains better results than 4×4 and 8×8 cell blocks. Discussion This paper presents a method that combines the effectiveness of the SIFT descriptor for correspondence with the representational power of the HOG descriptor for anatomical structure. The method has potential for distinguishing patients with brain disease from controls. PMID:28207873
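
    The stage-wise idea can be sketched as follows, with assumed stand-ins: scikit-image blob detection replaces the paper's scale-space keypoints, and HOG descriptors of patches around each keypoint play the role of the representation feature (the SIFT correspondence and keypoint scoring steps are omitted).

```python
import numpy as np
from skimage.feature import blob_dog, hog

def keypoint_hog_features(image, patch=32):
    """Stage 1: keypoint detection; stage 2: HOG descriptor around each keypoint."""
    blobs = blob_dog(image, max_sigma=8, threshold=0.1)   # rows of (row, col, sigma)
    feats = []
    for r, c, _ in blobs.astype(int):
        r0, c0 = max(r - patch // 2, 0), max(c - patch // 2, 0)
        p = image[r0:r0 + patch, c0:c0 + patch]
        if p.shape == (patch, patch):                     # skip truncated borders
            feats.append(hog(p, pixels_per_cell=(16, 16), cells_per_block=(1, 1)))
    return np.asarray(feats)

demo = np.random.rand(128, 128)        # stands in for one MR slice
print(keypoint_hog_features(demo).shape)
```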

  5. Selective capsulotomies of the expanded breast as a remodelling method in two-stage breast reconstruction.

    Science.gov (United States)

    Grimaldi, Luca; Campana, Matteo; Brandi, Cesare; Nisi, Giuseppe; Brafa, Anna; Calabrò, Massimiliano; D'Aniello, Carlo

    2013-06-01

    Two-stage breast reconstruction with a tissue expander and prosthesis is nowadays a common method for achieving a satisfactory appearance in selected patients who have had a mastectomy, but its most common aesthetic drawback is an excessive volumetric increase of the superior half of the reconstructed breast, with a convexity of the profile in that area. A possible solution to limit this effect, and to fill the inferior pole, is to reduce the inferior tissue resistance by means of capsulotomies. This study reports the effects of various types of capsulotomies, performed in 72 patients after removal of the mammary expander, with the aim of emphasising the convexity of the inferior aspect of the expanded breast. Possible solutions are described for each kind of desired modification. On the basis of subjective and objective evaluations, an overall high degree of satisfaction was evident. The described selective capsulotomies, when properly carried out, may significantly improve the aesthetic results in two-stage reconstructed breasts, with no additional scars, minimal risks, and little lengthening of the surgical time.

  6. Two-stage atlas subset selection in multi-atlas based image segmentation

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu [The Department of Radiation Oncology, University of California, Los Angeles, California 90095 (United States)]

    2015-06-15

    Purpose: Fast growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, but also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs that arise when facing a large atlas collection of varied quality, so that high-accuracy segmentation can be achieved at low computational cost. Methods: An atlas subset selection scheme is proposed to substitute a low-cost alternative for a significant portion of the computationally expensive full-fledged registration in the conventional scheme. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of the desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates end-to-end segmentation performance comparable to the conventional single-stage selection method, but with a significant reduction in computation. Compared with the alternative computation reduction method, the proposed scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. Conclusions: The authors
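
    The two-stage structure lends itself to a compact sketch. The relevance functions below are placeholders (real implementations would use, e.g., affine vs. deformable registration with an image-similarity metric); only the select-then-refine flow follows the paper.

```python
import numpy as np

def cheap_relevance(target, atlas):    # stand-in for affine registration + similarity
    return -np.mean((target - atlas) ** 2)

def full_relevance(target, atlas):     # stand-in for deformable registration + similarity
    return -np.mean(np.abs(target - atlas))

def two_stage_select(target, atlases, augmented_size=20, fusion_size=5):
    # Stage 1: rank the whole collection with the low-cost relevance metric.
    prelim = sorted(range(len(atlases)),
                    key=lambda i: cheap_relevance(target, atlases[i]),
                    reverse=True)[:augmented_size]
    # Stage 2: full-fledged evaluation only on the surviving augmented subset.
    return sorted(prelim,
                  key=lambda i: full_relevance(target, atlases[i]),
                  reverse=True)[:fusion_size]

atlases = [np.random.rand(16, 16) for _ in range(100)]   # toy atlas collection
print(two_stage_select(np.random.rand(16, 16), atlases))
```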

  7. TSCC: Two-Stage Combinatorial Clustering for virtual screening using protein-ligand interactions and physicochemical features

    Science.gov (United States)

    2010-01-01

    Background The increasing numbers of 3D compounds and protein complexes stored in databases contribute greatly to current advances in biotechnology, being employed in several pharmaceutical and industrial applications. However, screening and retrieving appropriate candidates as well as handling false positives presents a challenge for all post-screening analysis methods employed in retrieving therapeutic and industrial targets. Results Using the TSCC method, virtually screened compounds were clustered based on their protein-ligand interactions, followed by structure clustering employing physicochemical features, to retrieve the final compounds. Based on the protein-ligand interaction profile (first stage), docked compounds can be clustered into groups with distinct binding interactions. Structure clustering (second stage) grouped similar compounds obtained from the first stage into clusters of similar structures; the lowest energy compound from each cluster being selected as a final candidate. Conclusion By representing interactions at the atomic-level and including measures of interaction strength, better descriptions of protein-ligand interactions and a more specific analysis of virtual screening was achieved. The two-stage clustering approach enhanced our post-screening analysis resulting in accurate performances in clustering, mining and visualizing compound candidates, thus, improving virtual screening enrichment. PMID:21143810
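
    A toy sketch of the TSCC flow under stated assumptions: interaction profiles, physicochemical features and docking energies are synthetic arrays, and scikit-learn hierarchical clustering stands in for the authors' clustering machinery.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

n = 200
interactions = np.random.rand(n, 40)   # protein-ligand interaction profiles
physchem = np.random.rand(n, 10)       # physicochemical descriptors
energy = np.random.rand(n)             # docking energies

final = []
stage1 = AgglomerativeClustering(n_clusters=8).fit_predict(interactions)
for c in np.unique(stage1):            # stage 1: interaction-based groups
    idx = np.flatnonzero(stage1 == c)
    if len(idx) < 2:
        final.append(int(idx[0]))
        continue
    k = min(3, len(idx))
    stage2 = AgglomerativeClustering(n_clusters=k).fit_predict(physchem[idx])
    for s in np.unique(stage2):        # stage 2: structural sub-clusters
        members = idx[stage2 == s]
        final.append(int(members[np.argmin(energy[members])]))  # lowest energy
print(sorted(final))
```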

  8. Prediction of syngas quality for two-stage gasification of selected waste feedstocks.

    Science.gov (United States)

    De Filippis, Paolo; Borgianni, Carlo; Paolucci, Martino; Pochetti, Fausto

    2004-01-01

    This paper compares the syngas produced from methane with the syngas obtained from the gasification, in a two-stage reactor, of various waste feedstocks. The syngas composition and the gasification conditions were simulated using a simple thermodynamic model. The waste feedstocks considered are: landfill gas, waste oil, municipal solid waste (MSW) typical of a low-income country, the same MSW blended with landfill gas, refuse derived fuel (RDF) made from the same MSW, the same RDF blended with waste oil and a MSW typical of a high-income country. Energy content, the sum of H2 and CO gas percentages, and the ratio of H2 to CO are considered as measures of syngas quality. The simulation shows that landfill gas gives the best results in terms of both H2+CO and H2/CO, and that the MSW of low-income countries can be expected to provide inferior syngas on all three quality measures. Co-gasification of the MSW from low-income countries with landfill gas, and the mixture of waste oil with RDF from low-income MSW are considered as options to improve gas quality.
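
    The three quality measures are straightforward to compute from a composition in mole percent; the composition and the approximate lower heating values below are illustrative placeholders, not the paper's data.

```python
def syngas_quality(comp):
    """comp: syngas composition in mole percent."""
    lhv = {"H2": 10.8, "CO": 12.6, "CH4": 35.8}   # rough LHVs in MJ/Nm3
    energy = sum(comp.get(g, 0.0) / 100 * v for g, v in lhv.items())
    return {"H2+CO (%)": comp["H2"] + comp["CO"],
            "H2/CO": round(comp["H2"] / comp["CO"], 2),
            "energy (MJ/Nm3)": round(energy, 2)}

print(syngas_quality({"H2": 38.0, "CO": 31.0, "CO2": 18.0, "CH4": 3.0, "N2": 10.0}))
```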

  9. SU-E-J-128: Two-Stage Atlas Selection in Multi-Atlas-Based Image Segmentation

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, T; Ruan, D [UCLA School of Medicine, Los Angeles, CA (United States)]

    2015-06-15

    Purpose: In the new era of big data, multi-atlas-based image segmentation is challenged by heterogeneous atlas quality and the high computation burden of extensive atlas collections, demanding efficient identification of the most relevant atlases. This study aims to develop a two-stage atlas selection scheme that achieves computational economy with a performance guarantee. Methods: We develop a low-cost fusion set selection scheme by introducing a preliminary selection that trims the full atlas collection into an augmented subset, alleviating the need for extensive full-fledged registrations. More specifically, fusion set selection is performed in two successive steps: preliminary selection and refinement. An augmented subset is first roughly selected from the whole atlas collection with a simple registration scheme and the corresponding preliminary relevance metric; the augmented subset is then refined into the desired fusion set size using full-fledged registration and the associated relevance metric. The main novelty of this work is the introduction of an inference model relating the preliminary and refined relevance metrics, based on which the augmented subset size is rigorously derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance and complexity of the proposed two-stage atlas selection method were assessed using a collection of 30 prostate MR images. It achieved segmentation accuracy comparable to the conventional one-stage method with full-fledged registration, but significantly reduced computation time to 1/3 (from 30.82 to 11.04 min per segmentation). Compared with an alternative one-stage cost-saving approach, the proposed scheme yielded superior performance, with mean and median DSC of (0.83, 0.85) versus (0.74, 0.78). Conclusion: This work has developed a model-guided two-stage atlas selection scheme that achieves significant cost reduction while guaranteeing high segmentation accuracy. The benefit

  10. HOSPITAL SITE SELECTION USING TWO-STAGE FUZZY MULTI-CRITERIA DECISION MAKING PROCESS

    Directory of Open Access Journals (Sweden)

    Ali Soltani

    2011-01-01

    Site selection for the siting of urban activities/facilities is one of the crucial policy-related decisions taken by urban planners and policy makers. The process of site selection is inherently complicated: a careless choice of site imposes exorbitant costs on the city budget and inevitably damages the environment. Nowadays, multi-attribute decision-making approaches are suggested to improve the precision of decision making and reduce side effects. Two well-known techniques, the analytic hierarchy process and the analytic network process, are among the multi-criteria decision-making methods that can easily accommodate both quantitative and qualitative criteria. They have also been developed into fuzzy analytic hierarchy process and fuzzy analytic network process systems, which are capable of accommodating the inherent uncertainty and vagueness in multi-criteria decision-making. This paper reports the process and results of a hospital site selection within Region 5 of the Shiraz metropolitan area, Iran, using a fuzzy analytic network process integrated with a Geographic Information System (GIS). The weights of the alternatives were calculated using the fuzzy analytic network process. A sensitivity analysis was then conducted to measure the elasticity of the decision with regard to different criteria. This study contributes to planning practice by suggesting a more comprehensive decision-making tool for site selection.
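
    As a simplified, crisp stand-in for one building block of the fuzzy ANP used here, the sketch below derives criterion weights from a pairwise-comparison matrix via the principal eigenvector; the matrix itself is invented.

```python
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])      # pairwise comparisons of three site criteria

eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()
print(weights)                        # criterion weights summing to 1
```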

  11. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  12. Accounting for selection and correlation in the analysis of two-stage genome-wide association studies.

    Science.gov (United States)

    Robertson, David S; Prevost, A Toby; Bowden, Jack

    2016-10-01

    The problem of selection bias has long been recognized in the analysis of two-stage trials, where promising candidates are selected in stage 1 for confirmatory analysis in stage 2. To efficiently correct for bias, uniformly minimum variance conditionally unbiased estimators (UMVCUEs) have been proposed for a wide variety of trial settings, but where the population parameter estimates are assumed to be independent. We relax this assumption and derive the UMVCUE in the multivariate normal setting with an arbitrary known covariance structure. One area of application is the estimation of odds ratios (ORs) when combining a genome-wide scan with a replication study. Our framework explicitly accounts for correlated single nucleotide polymorphisms, as might occur due to linkage disequilibrium. We illustrate our approach on the measurement of the association between 11 genetic variants and the risk of Crohn's disease, as reported in Parkes and others (2007. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat. Genet. 39(7), 830-832), and show that the estimated ORs can vary substantially if both selection and correlation are taken into account.

  13. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    Science.gov (United States)

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine as the classifier. The method was applied to four microarray datasets, and performance was assessed by leave-one-out cross-validation. Comparative performance assessment against other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods, selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type.
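
    A sketch of the two-stage structure under a deliberate substitution: the COA-HS wrapper is replaced by a plain greedy forward search, and the MRMR filter is reduced to a crude mutual-information-minus-correlation score. Data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))                 # 200 toy "genes"
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# Stage 1 (filter): crude MRMR, relevance minus mean redundancy.
mi = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X.T))                 # gene-gene correlations
picked = [int(np.argmax(mi))]
while len(picked) < 20:
    score = mi - corr[picked].mean(axis=0)      # penalize redundancy with picks
    score[picked] = -np.inf
    picked.append(int(np.argmax(score)))

# Stage 2 (wrapper): greedy forward selection with an SVM classifier.
subset, best = [], 0.0
for _ in range(5):
    cand = {g: cross_val_score(SVC(), X[:, subset + [g]], y, cv=3).mean()
            for g in picked if g not in subset}
    g, acc = max(cand.items(), key=lambda kv: kv[1])
    if acc <= best:
        break                                   # stop when no gene helps
    subset, best = subset + [g], acc
print(subset, round(best, 3))
```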

  14. Drop-out between the two liver resections of two-stage hepatectomy. Patient selection or loss of chance?

    Science.gov (United States)

    Viganò, L; Torzilli, G; Cimino, M; Imai, K; Vibert, E; Donadon, M; Castaing, D; Adam, R

    2016-09-01

    Two-stage hepatectomy (TSH) is the present standard for multiple bilobar colorectal liver metastases (CLM), but 25-35% of patients fail to complete the scheduled procedure (drop-out). The aim was to elucidate whether drop-out from TSH is a patient-selection effect (as usually considered) or a loss of chance. All consecutive patients scheduled for a TSH at Paul Brousse Hospital between 2000 and 2012 were considered. TSH patients were matched 1:1 with patients receiving a one-stage ultrasound-guided hepatectomy (OSH) at the Humanitas Research Hospital in the same period. Matching criteria were: primary tumor N status; timing of CLM diagnosis; and CLM number and distribution in the liver. Sixty-three pairs of patients were analyzed. Demographic and tumor characteristics were similar (median 7 CLM), except for more chemotherapy lines and adjuvant chemotherapy in TSH. The drop-out rate of TSH was 38.1% (0% for OSH). The two groups had similar R0 resection rates (19.0% OSH vs. 15.9% TSH). OSH and completed TSH had similar five-year survival (from CLM diagnosis 49.8% vs. 49.7%; from liver resection 36.1% vs. 44.3%), superior to drop-out (10% three-year survival). Completion of resection (drop-out vs. OSH/completed TSH) was the only independent prognostic factor (p = 0.003). Drop-out from TSH may be a loss of chance rather than a criterion for patient selection. "Unselected" OSH patients had the same outcomes as selected patients who completed TSH. A complete resection is the main determinant of prognosis. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Selective Leaching of Vanadium from Roasted Stone Coal by Dilute Sulfuric Acid Dephosphorization-Two-Stage Pressure Acid Leaching

    Directory of Open Access Journals (Sweden)

    Jun Huang

    2016-07-01

    A novel staged leaching process is reported in this paper to selectively extract vanadium from roasted stone coal, and its mechanisms are clarified. Results showed that the leaching efficiencies of V, Al, P and Fe were 80.46%, 12.24%, 0.67% and 3.12%, respectively, under the optimum dilute sulfuric acid dephosphorization (DSAD)-two-stage pressure acid leaching (PAL) conditions. Efficient separation of V from Fe, Al and P was realized. As apatite is leached more easily than mica, the apatite could completely react with sulfuric acid while the mica remained almost unchanged in the DSAD process, which was the key to realizing the effective separation of V from P. Similarly, the hydrolysis of Fe and Al could be initiated more easily than that of V by decreasing the residual acid of the leachate. The alunite and iron-sulphate compound generated in the first-stage PAL process resulted in the effective separation of V from Fe and Al.

  16. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs), genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in the classification of a certain group of people or the diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for the tasks at hand, and we demonstrate this with some real-world data.

  17. Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN for ISE-based electronic tongue-Effect of supervised feature extraction.

    Science.gov (United States)

    Ciosek, P; Brzózka, Z; Wróblewski, W; Martinelli, E; Di Natale, C; D'Amico, A

    2005-09-15

    A novel strategy of data analysis for artificial taste and odour systems is presented in this work. It is demonstrated that using a supervised method also in the feature extraction phase enhances the fruit juice classification capability of the sensor array developed at Warsaw University of Technology. A comparison of direct processing (raw data processed by an Artificial Neural Network (ANN); raw data processed by Partial Least Squares-Discriminant Analysis (PLS-DA)) and two-stage processing (Principal Component Analysis (PCA) outputs processed by ANN; PLS-DA outputs processed by ANN) is presented. It is shown that a considerable increase in classification capability occurred in the case of the new method proposed by the authors.
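
    The two-stage wiring (feature extraction followed by an ANN) can be sketched with scikit-learn as below; PCA is shown for brevity, although the record's point is that a supervised extractor such as PLS-DA works even better. The electronic-tongue readings are synthetic stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 16))         # 16 ion-selective electrode signals
y = rng.integers(0, 4, size=120)       # 4 juice classes (toy labels)

direct = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1)
two_stage = make_pipeline(
    PCA(n_components=5),               # stage 1: feature extraction
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1))

print(direct.fit(X, y).score(X, y))    # direct processing: raw data -> ANN
print(two_stage.fit(X, y).score(X, y)) # two-stage: PCA outputs -> ANN
```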

  18. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features, where knowledge of the full feature space is unavailable in advance. We define streaming features as features that flow in one by one over time while the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel online streaming feature selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
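
    A much-simplified sketch of the streaming setting: features arrive one at a time and are kept only if relevant and nonredundant. The correlation thresholds stand in for the conditional-independence tests used by OSFS.

```python
import numpy as np

def stream_select(features, y, rel_t=0.15, red_t=0.7):
    kept = []
    for i, f in enumerate(features):             # features arrive one by one
        if abs(np.corrcoef(f, y)[0, 1]) < rel_t:
            continue                             # discard: weakly relevant
        if any(abs(np.corrcoef(f, features[j])[0, 1]) > red_t for j in kept):
            continue                             # discard: redundant w.r.t. kept set
        kept.append(i)
    return kept

rng = np.random.default_rng(0)
y = rng.normal(size=300)
feats = [y + rng.normal(scale=s, size=300) for s in (0.5, 0.6, 5.0)]
feats += [rng.normal(size=300) for _ in range(3)]
print(stream_select(feats, y))    # relevant, nonredundant features survive
```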

  19. A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction

    KAUST Repository

    Haidar, Azzam

    2012-05-01

    We present new high-performance numerical kernels combined with advanced optimization techniques that significantly increase the performance of parallel bidiagonal reduction. Our approach is based on developing efficient fine-grained computational tasks as well as reducing the overheads associated with their high-level scheduling during the so-called bulge chasing procedure, an essential phase of a scalable bidiagonalization procedure. In essence, we coalesce multiple tasks in a way that reduces the time needed to switch execution context between the scheduler and useful computational tasks. At the same time, we maintain the crucial information about the tasks and their data dependencies between the coalescing groups. This is the necessary condition to preserve the numerical correctness of the computation. We show our annihilation strategy based on multiple applications of single orthogonal reflectors. Despite non-trivial characteristics in computational complexity and memory access patterns, our optimization approach smoothly applies to the annihilation scenario. The coalescing positively influences another equally important aspect of the bulge chasing stage: memory reuse. For the tasks within the coalescing groups, the data is retained in high levels of the cache hierarchy and, as a consequence, operations that are normally memory-bound increase their ratio of computation to off-chip communication and become compute-bound, which renders them amenable to efficient execution on multicore architectures. The performance of the new two-stage bidiagonal reduction is staggering. Our implementation results in up to 50-fold and 12-fold improvements (∼130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight-socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24,000 × 24,000. Last but not least, we provide a comprehensive study on the impact of the coalescing group size in terms of cache

  1. Feature Selection: Algorithms and Challenges

    Institute of Scientific and Technical Information of China (English)

    Xindong Wu; Yanglan Gan; Hao Wang; Xuegang Hu

    2006-01-01

    Feature selection is an active area in data mining research and development. It consists of efforts and contributions from a wide variety of communities, including statistics, machine learning, and pattern recognition. The diversity, on one hand, equips us with many methods and tools. On the other hand, the profusion of options causes confusion. This paper reviews various feature selection methods and identifies research challenges that are at the forefront of this exciting area.

  2. A curriculum-based approach for feature selection

    Science.gov (United States)

    Kalavala, Deepthi; Bhagvati, Chakravarthy

    2017-06-01

    Curriculum learning is a learning technique in which a classifier learns from easy samples first and then from increasingly difficult samples. Along similar lines, a curriculum-based feature selection framework is proposed for identifying the most useful features in a dataset. Given a dataset, easy and difficult samples are first identified; in general, the number of easy samples is assumed to be larger than the number of difficult samples. Feature selection is then done in two stages. In the first stage, a fast feature selection method that produces feature scores is used; the feature scores are then updated incrementally with the set of difficult samples. Existing feature selection methods are not incremental in nature; the entire dataset must be used for feature selection. The use of curriculum learning is expected to decrease the time needed for feature selection while keeping classification accuracy comparable to existing methods. Curriculum learning also allows incremental refinement of feature selection as new training samples become available. Our experiments on a number of standard datasets demonstrate that feature selection is indeed faster without sacrificing classification accuracy.
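
    A sketch of the curriculum idea under stated assumptions: an invented margin-based easy/difficult split, ANOVA F-scores as the fast feature-scoring method, and a weighted average as the incremental update.

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))
y = (X[:, 2] - X[:, 7] > 0).astype(int)
margin = np.abs(X[:, 2] - X[:, 7])           # proxy for sample difficulty
easy, hard = margin > 0.5, margin <= 0.5     # more easy samples than hard ones

scores, _ = f_classif(X[easy], y[easy])      # stage 1: fast scoring on easy set
update, _ = f_classif(X[hard], y[hard])      # stage 2: incremental refinement
n_e, n_h = easy.sum(), hard.sum()
scores = (n_e * scores + n_h * update) / (n_e + n_h)   # weighted score update
print(np.argsort(scores)[::-1][:5])          # top-5 features
```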

  3. Two-stage AMPA receptor trafficking in classical conditioning and selective role for glutamate receptor subunit 4 (tGluA4) flop splice variant.

    Science.gov (United States)

    Zheng, Zhaoqing; Sabirzhanov, Boris; Keifer, Joyce

    2012-07-01

    Previously, we proposed a two-stage model for an in vitro neural correlate of eyeblink classical conditioning involving the initial synaptic incorporation of glutamate receptor A1 (GluA1)-containing α-amino-3-hydroxy-5-methylisoxazole-4-propionic acid type receptors (AMPARs) followed by delivery of GluA4-containing AMPARs that support acquisition of conditioned responses. To test specific elements of our model for conditioning, selective knockdown of GluA4 AMPAR subunits was performed using small interfering RNAs (siRNAs). Recently, we sequenced and characterized the GluA4 subunit and its splice variants from the pond turtle, Trachemys scripta elegans (tGluA4). Analysis of the relative abundance of mRNA expression by real-time RT-PCR showed that the flip/flop variants of tGluA4, tGluA4c, and a novel truncated variant, tGluA4trc1, are major isoforms in the turtle brain. Here, transfection of in vitro brain stem preparations with anti-tGluA4 siRNA suppressed conditioning, tGluA4 mRNA and protein expression, and synaptic delivery of tGluA4-containing AMPARs, but not tGluA1 subunits. Significantly, transfection of abducens motor neurons by nerve injections of a tGluA4 flop rescue plasmid prior to anti-tGluA4 siRNA application restored conditioning and synaptic incorporation of tGluA4-containing AMPARs. In contrast, treatment with rescue plasmids for tGluA4 flip or tGluA4trc1 failed to rescue conditioning. Finally, treatment with a siRNA directed against GluA1 subunits inhibited conditioning and synaptic delivery of tGluA1-containing AMPARs and, importantly, those containing tGluA4. These data strongly support our two-stage model of conditioning and our hypothesis that synaptic incorporation of tGluA4-containing AMPARs underlies the acquisition of in vitro classical conditioning. Furthermore, they suggest that tGluA4 flop may have a critical role in conditioning mechanisms compared with the other tGluA4 splice variants.

  4. Feature Selection and Effective Classifiers.

    Science.gov (United States)

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  5. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major...... reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu....... While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...
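
    One hedged reading of "feature selection first" is to shrink the asset menu before applying any portfolio rule. The sketch below uses correlation-based hierarchical clustering with one representative per cluster (an assumed stand-in, with synthetic returns) and then applies 1/N to the reduced universe.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
R = rng.normal(0.001, 0.02, size=(500, 40))        # daily returns of 40 assets
corr = np.corrcoef(R.T)
dist = np.sqrt(0.5 * (1.0 - corr))                 # correlation distance
Z = linkage(dist[np.triu_indices(40, 1)], method="average")
labels = fcluster(Z, t=8, criterion="maxclust")    # 8 asset clusters

menu = [int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels)]
weights = np.full(len(menu), 1.0 / len(menu))      # 1/N on the reduced menu
print(menu, weights)
```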

  6. Feature Selection in Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.

  7. CBFS: high performance feature selection algorithm based on feature clearness.

    Directory of Open Access Journals (Sweden)

    Minseok Seo

    BACKGROUND: The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy. METHODOLOGY: In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature; highly clear features contribute to high classification accuracy. CScore is a measure of the clearness of each feature, based on how tightly samples cluster around the class centroids in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy. CONCLUSIONS/SIGNIFICANCE: Our experiments confirm that CBFS outperforms state-of-the-art feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
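
    A Fisher-ratio-style reading of feature clearness: features whose class centroids are well separated relative to the within-class spread score high. This is an interpretation for illustration, not the authors' exact CScore definition.

```python
import numpy as np

def clearness(X, y):
    """Between-class centroid scatter over within-class scatter, per feature."""
    mu = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2
                  for c in np.unique(y))
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in np.unique(y))
    return between / (within + 1e-12)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 10))
X[:, 0] += 3 * y                                  # feature 0 separates classes
print(np.argsort(clearness(X, y))[::-1][:3])      # feature 0 ranks first
```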

  8. Selection of negative samples and two-stage combination of multiple features for action detection in thousands of videos

    NARCIS (Netherlands)

    Burghouts, G.J.; Schutte, K.; Bouma, H.; Hollander, R.J.M. den

    2013-01-01

    In this paper, a system is presented that can detect 48 human actions in realistic videos, ranging from simple actions such as ‘walk’ to complex actions such as ‘exchange’. We propose a method that gives a major contribution in performance. The reason for this major improvement is related to a diffe

  9. SELECTED FEATURES OF POLISH FARMERS

    Directory of Open Access Journals (Sweden)

    Grzegorz Spychalski

    2013-12-01

    The paper presents the results of research carried out among farm owners in the Wielkopolskie voivodeship concerning selected features of social capital. The author identifies and estimates the impact of socio-professional factors on social capital quality and draws statistical conclusions. The result is a list of economic policy measures facilitating rural area development in this respect. The level of education, civic activity and the tendency toward collective activity are the main determinants of social capital quality in Polish rural areas.

  10. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; Wu Zhongfu

    2003-01-01

    The motivation of data mining is to extract effective information from the huge amounts of data in very large databases. However, such databases generally include redundant and irrelevant attributes, which result in low performance and high computational complexity. Feature Subset Selection (FSS) has therefore become an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses a simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  11. Rough set-based feature selection method

    Institute of Scientific and Technical Information of China (English)

    ZHAN Yanmei; ZENG Xiangyang; SUN Jincai

    2005-01-01

    A new feature selection method based on the discernibility matrix of rough set theory is proposed in this paper. The main idea is that the most effective feature for classification is the one that, if used, distinguishes the largest number of samples belonging to different classes. Experiments are performed using this method to select relevant features for artificial and real-world datasets. Results show that the proposed method correctly selects all the relevant features of the artificial datasets while drastically reducing the number of features. In addition, when the method is used to select classification features for real-world underwater targets, the number of classification features after selection drops to 20% of the original feature set, and classification accuracy increases by about 6% on the dataset after feature selection.
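
    The discernibility idea can be illustrated on a toy discrete dataset: greedily pick the feature that distinguishes the most not-yet-covered sample pairs from different classes. This shows the principle only, not the paper's exact algorithm.

```python
import numpy as np

def discernibility_selection(X, y, k=2):
    n, m = X.shape
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]]
    chosen, covered = [], set()
    for _ in range(k):
        # per feature: how many still-uncovered cross-class pairs it separates
        gain = [sum(1 for p, (i, j) in enumerate(pairs)
                    if p not in covered and X[i, f] != X[j, f])
                for f in range(m)]
        f = int(np.argmax(gain))
        covered |= {p for p, (i, j) in enumerate(pairs) if X[i, f] != X[j, f]}
        chosen.append(f)
    return chosen

X = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])
y = np.array([0, 0, 1, 1])
print(discernibility_selection(X, y))   # feature 1 separates all class pairs
```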

  12. Measuring demand for flat water recreation using a two-stage/disequilibrium travel cost model with adjustment for overdispersion and self-selection

    Science.gov (United States)

    McKean, John R.; Johnson, Donn; Taylor, R. Garth

    2003-04-01

    An alternate travel cost model is applied to an on-site sample to estimate the value of flat water recreation on the impounded lower Snake River. Four contiguous reservoirs would be eliminated if the dams are breached to protect endangered Pacific salmon and steelhead trout. The empirical method applies truncated negative binomial regression with adjustment for endogenous stratification. The two-stage decision model assumes that recreationists allocate their time among work and leisure prior to deciding among consumer goods. The allocation of time and money among goods in the second stage is conditional on the predetermined work time and income. The second stage is a disequilibrium labor market which also applies if employers set work hours or if recreationists are not in the labor force. When work time is either predetermined, fixed by contract, or nonexistent, recreationists must consider separate prices and budgets for time and money.

  13. Genetic Feature Selection for Texture Classification

    Institute of Scientific and Technical Information of China (English)

    PAN Li; ZHENG Hong; ZHANG Zuxun; ZHANG Jianqing

    2004-01-01

    This paper presents a novel approach to feature subset selection using genetic algorithms. The approach can incorporate multiple criteria, such as the accuracy and cost of classification, into the feature selection process, and it finds effective feature subsets for texture classification. On the basis of the selected effective feature subset, a method is described to extract objects that are higher than their surroundings, such as trees or forests, in color aerial images. The methodology presented in this paper is illustrated by its application to the problem of tree extraction from aerial images.

  14. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built....... The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method...

  15. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Classical reinforcement learning techniques become impractical in domains with large complex state spaces. The size of a domain’s state space is...require all the provided features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature

  16. Two-stage processing of sounds explains behavioral performance variations due to changes in stimulus contrast and selective attention: an MEG study.

    Directory of Open Access Journals (Sweden)

    Jaakko Kauramäki

    Selectively attending to task-relevant sounds whilst ignoring background noise is one of the most amazing feats performed by the human brain. Here, we studied the underlying neural mechanisms by recording magnetoencephalographic (MEG) responses of 14 healthy human subjects while they performed a near-threshold auditory discrimination task vs. a visual control task of similar difficulty. The auditory stimuli consisted of notch-filtered continuous noise masker sounds, and of 1020-Hz target tones occasionally (p = 0.1) replacing 1000-Hz standard tones of 300-ms duration that were embedded at the center of the notches, the widths of which were parametrically varied. As a control for masker effects, tone-evoked responses were additionally recorded without the masker sound. Selective attention to tones significantly increased the amplitude of the onset M100 response at ~100 ms to the standard tones in the presence of the masker sounds, especially with notches narrower than the critical band. Further, attention modulated the sustained response most clearly in the 300-400 ms range from sound onset, with narrower notches than for the M100, thus selectively reducing the masker-induced suppression of the tone-evoked response. Our results show evidence of a multiple-stage filtering mechanism of sensory input in the human auditory cortex: (1) one at early (~100 ms) latencies bilaterally in posterior parts of the secondary auditory areas, and (2) adaptive filtering of attended sounds from the task-irrelevant background masker at longer latencies (~300 ms) in more medial auditory cortical regions, predominantly in the left hemisphere, enhancing the processing of near-threshold sounds.

  17. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  1. Prominent feature selection of microarray data

    Institute of Scientific and Technical Information of China (English)

    Yihui Liu

    2009-01-01

    For the wavelet transform, a set of orthogonal wavelet basis functions is used to detect the localized changing features contained in microarray data. In this research, we investigate the performance of features selected from the wavelet detail coefficients at the second and third levels. A genetic algorithm is used to optimize the wavelet detail coefficients and select the best discriminant features. Experiments are carried out on four microarray datasets to evaluate classification performance. Experimental results prove that wavelet features optimized from the detail coefficients efficiently characterize the differences between normal tissues and cancer tissues.
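
    The feature construction step is easy to sketch with PyWavelets (the GA optimization is omitted); the signal below is a synthetic stand-in for an expression profile.

```python
import numpy as np
import pywt

profile = np.random.rand(1024)                     # toy stand-in for one profile
coeffs = pywt.wavedec(profile, "db4", level=3)     # [cA3, cD3, cD2, cD1]
features = np.concatenate([coeffs[1], coeffs[2]])  # level-3 and level-2 details
print(features.shape)
```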

  2. Stable Feature Selection for Biomarker Discovery

    CERN Document Server

    He, Zengyou

    2010-01-01

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered; only recently has this issue received more and more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) to provide an overview of this new yet fast-growing topic for convenient reference; (2) to categorize existing methods under an expandable framework for future research and development.

  3. ECG Signal Feature Selection for Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Lichen Xun

    2013-01-01

    This paper studies the selection of ECG-based features for emotion recognition. In the feature selection process, we start from existing feature selection algorithms and also pay special attention to intuitive values on the ECG waveform. Using ANOVA and heuristic search, we picked out features that distinguish the two emotions joy and pleasure, and then combined this with a pathological analysis of ECG signals from the point of view of medical experts to discuss the logical correspondence between ECG waveforms and emotion discrimination. In experiments, the method picked out only five features yet reached a 92% accuracy rate in recognizing joy and pleasure.
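
    The ANOVA ranking step might look like the sketch below: each candidate ECG feature is scored by a one-way F-test between the two emotion classes. Feature values are synthetic placeholders.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Rows: subjects; columns: candidate ECG features (e.g. HR, HRV indices, ...)
joy = rng.normal(loc=[70, 0.8, 0.1, 0.3, 0.5], scale=0.2, size=(40, 5))
pleasure = rng.normal(loc=[72, 0.8, 0.4, 0.3, 0.5], scale=0.2, size=(40, 5))

F = [f_oneway(joy[:, i], pleasure[:, i]).statistic for i in range(5)]
print(np.argsort(F)[::-1])    # features 0 and 2 differ between classes, rank first
```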

  4. A Genetic Algorithm-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Babatunde Oluleye

    2014-07-01

    This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the classifiers concerned. In this work, one hundred (100) features were extracted from the set of images in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error) which enabled the GA to obtain a combinatorial set of features giving rise to optimal accuracy. The results obtained were compared with various feature selectors from the WEKA software and in many respects gave better results in terms of classification accuracy.
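
    A from-scratch sketch of a binary GA feature selector in the same spirit (the article used MATLAB's GA Toolbox): fitness here is cross-validated kNN accuracy, i.e. one minus the kNN classification error the article optimizes, and the Wine dataset stands in for the Flavia features.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_feat, pop_size, gens = X.shape[1], 20, 15

def fitness(mask):
    """CV accuracy of kNN on the selected columns (0 if nothing selected)."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(3), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
for _ in range(gens):
    fit = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(fit)[::-1][: pop_size // 2]]     # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = int(rng.integers(1, n_feat))                    # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        children.append(child ^ (rng.random(n_feat) < 0.05))  # bit-flip mutation
    pop = np.vstack([parents, np.array(children)])

best = max(pop, key=fitness)
print(np.flatnonzero(best), round(fitness(best), 3))
```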

  5. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction algorithm...... induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance....... for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using a Markov blanket...

  6. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

    Science.gov (United States)

    Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming

    2015-01-01

    Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features, such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, possibly because of the explosion of features and the large number of noisy features that arise when singleton features are combined into conjunction features. However, singleton features, each of which can only capture one linguistic characteristic of a word, are not sufficient to describe the information needed for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
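
    Feature conjunction plus chi-square selection can be sketched as below: singleton token features are paired into conjunctions, and chi-square keeps the most informative columns. The toy token features and labels are invented stand-ins for the DNR corpus.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
# Singleton features per token, e.g. [is-capitalized, has-digit, in-dictionary]
S = rng.integers(0, 2, size=(1000, 3))
noise = rng.random(1000) < 0.05
y = ((S[:, 0] & S[:, 2]).astype(bool) | noise).astype(int)  # toy drug-name tag

pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
C = np.column_stack([S[:, i] & S[:, j] for i, j in pairs])  # conjunction features
X = np.hstack([S, C])                                       # singletons + conjunctions

sel = SelectKBest(chi2, k=3).fit(X, y)
print(np.flatnonzero(sel.get_support()))   # column 4, the (0,2) AND, should survive
```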

  8. Medical Image Feature, Extraction, Selection And Classification

    Directory of Open Access Journals (Sweden)

    M.VASANTHA,

    2010-06-01

    Full Text Available Breast cancer is the most common type of cancer found in women. It is the most frequent form of cancer, and one in 22 women in India is likely to suffer from breast cancer. This paper proposes an image classifier to classify mammogram images into normal, benign, and malignant images. In total, 26 features, including histogram intensity features and GLCM features, are extracted from each mammogram image. A hybrid approach to feature selection is proposed which reduces the feature set by 75%. Decision tree algorithms are applied to mammography classification using these reduced features. Experimental results have been obtained for a data set of 113 images of different types taken from MIAS. This technique of classification has not been attempted before, and it reveals the potential of data mining in medical treatment.
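
    As a rough illustration of the texture features involved, the sketch below extracts a few GLCM properties from stand-in grayscale patches and fits a decision tree. It assumes a recent scikit-image (graycomatrix/graycoprops) and random data; the paper's 26-feature set and hybrid selection step are not reproduced.

      # Hedged sketch: GLCM texture features + decision tree on stand-in patches.
      import numpy as np
      from skimage.feature import graycomatrix, graycoprops
      from sklearn.tree import DecisionTreeClassifier

      def glcm_features(patch):
          """Contrast, homogeneity, energy and correlation of one co-occurrence matrix."""
          glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                              symmetric=True, normed=True)
          return [graycoprops(glcm, p)[0, 0]
                  for p in ("contrast", "homogeneity", "energy", "correlation")]

      rng = np.random.default_rng(0)
      patches = rng.integers(0, 256, size=(10, 64, 64), dtype=np.uint8)  # stand-in images
      y = rng.integers(0, 3, size=10)  # 0=normal, 1=benign, 2=malignant (made up)
      X = np.array([glcm_features(p) for p in patches])
      clf = DecisionTreeClassifier().fit(X, y)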

  9. Feature subset selection based on relevance

    Science.gov (United States)

    Wang, Hui; Bell, David; Murtagh, Fionn

    In this paper an axiomatic characterisation of feature subset selection is presented. Two axioms are presented: the sufficiency axiom (preservation of learning information) and the necessity axiom (minimising encoding length). The sufficiency axiom concerns the existing dataset and is derived from the following understanding: any selected feature subset should be able to describe the training dataset without losing information, i.e. it is consistent with the training dataset. The necessity axiom concerns predictability and is derived from Occam's razor, which states that the simplest among different alternatives is preferred for prediction. The two axioms are then restated in terms of relevance in a concise form: maximising both the r(X; Y) and r(Y; X) relevance. Based on the relevance characterisation, four feature subset selection algorithms are presented and analysed: one is exhaustive and the remaining three are heuristic. Experimentation is also presented and the results are encouraging. Comparison is also made with some well-known feature subset selection algorithms, in particular, with the built-in feature selection mechanism in C4.5.

  10. The Importance of Feature Selection in Classification

    Directory of Open Access Journals (Sweden)

    Mrs.K. Moni Sushma Deep

    2014-01-01

    Full Text Available Feature selection is an important technique in classification for reducing the dimensionality of the feature space; it removes redundant, irrelevant, or noisy data. In this paper the features are selected based on ranking methods: (1) Information Gain (IG) attribute evaluation, (2) Gain Ratio (GR) attribute evaluation, and (3) Symmetrical Uncertainty (SU) attribute evaluation. This paper evaluates the features derived from the three methods using the supervised learning algorithms K-Nearest Neighbor and Naïve Bayes. The measures used for the classifiers are True Positive, False Positive, and Accuracy, and the algorithms are compared on these experimental results. We have taken two data sets, Pima and Wine, from the UCI Repository database.
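
    For reference, the three ranking measures can be computed directly from their standard entropy definitions: IG = H(Y) - H(Y|X), GR = IG / H(X), and SU = 2 IG / (H(X) + H(Y)). The sketch below is a minimal illustration on hypothetical discrete data, not the paper's implementation.

      # Hedged sketch: information gain, gain ratio and symmetrical uncertainty.
      import numpy as np

      def entropy(x):
          _, counts = np.unique(x, return_counts=True)
          p = counts / counts.sum()
          return -np.sum(p * np.log2(p))

      def info_gain(x, y):
          h_cond = sum(np.mean(x == v) * entropy(y[x == v]) for v in np.unique(x))
          return entropy(y) - h_cond

      def gain_ratio(x, y):
          hx = entropy(x)
          return info_gain(x, y) / hx if hx > 0 else 0.0

      def symmetrical_uncertainty(x, y):
          denom = entropy(x) + entropy(y)
          return 2 * info_gain(x, y) / denom if denom > 0 else 0.0

      X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])  # two discrete features
      y = np.array([0, 0, 1, 1])
      for j in range(X.shape[1]):
          print(j, info_gain(X[:, j], y), gain_ratio(X[:, j], y),
                symmetrical_uncertainty(X[:, j], y))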

  11. Discriminative feature selection for visual tracking

    Science.gov (United States)

    Ma, Junkai; Luo, Haibo; Zhou, Wei; Song, Yingchao; Hui, Bin; Chang, Zheng

    2017-06-01

    Visual tracking plays an important role in computer vision tasks. The robustness of a tracking algorithm is a challenge, especially in complex scenarios with cluttered backgrounds, illumination variation, appearance changes, etc. As an important component of a tracking algorithm, the appropriateness of the feature is closely related to tracking precision. In this paper, an online discriminative feature selection method is proposed to provide the tracker with the most discriminative feature. Firstly, a feature pool which contains different information from the image, such as gradient, gray value, and edge, is built, and when every frame is processed during tracking, all of these features are extracted. Secondly, these features are ranked depending on their discrimination between target and background, and the highest-scoring feature is chosen to represent the candidate image patch. Then, after the tracking result is obtained, the target model is updated to adapt to appearance variation. Experiments show that our method is robust when compared with other state-of-the-art algorithms.

  12. Coevolution of active vision and feature selection.

    Science.gov (United States)

    Floreano, Dario; Kato, Toshifumi; Marocco, Davide; Sauser, Eric

    2004-03-01

    We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral machines equipped with primitive vision systems and direct pathways between visual and motor neurons are evolved while they freely interact with their environments. We describe the application of this methodology in three sets of experiments, namely, shape discrimination, car driving, and robot navigation. We show that these systems develop sensitivity to a number of oriented, retinotopic visual features (edges, corners, height) and a behavioral repertoire to locate, bring, and keep these features in sensitive regions of the vision system, resembling strategies observed in simple insects.

  13. Adaptive feature selection for hyperspectral data analysis

    Science.gov (United States)

    Korycinski, Donna; Crawford, Melba M.; Barnes, J. Wesley

    2004-02-01

    Hyperspectral data can potentially provide greatly improved capability for discrimination between many land cover types, but new methods are required to process these data and extract the required information. Data sets are extremely large, and the data are not well distributed across these high dimensional spaces. The increased number and resolution of spectral bands, many of which are highly correlated, are problematic for supervised statistical classification techniques when the number of training samples is small relative to the dimension of the input vector. Selection of the most relevant subset of features is one means of mitigating these effects. A new algorithm based on the tabu search metaheuristic optimization technique was developed to perform subset feature selection and implemented within a binary hierarchical tree framework. Results obtained using the new approach were compared to those from a common greedy selection technique and to a Fisher discriminant based feature extraction method, both of which were implemented in the same binary hierarchical tree classification scheme. The tabu search based method generally yielded higher classification accuracies with lower variability than these other methods in experiments using hyperspectral data acquired by the EO-1 Hyperion sensor over the Okavango Delta of Botswana.

  14. Novel Feature Selection by Differential Evolution Algorithm

    Directory of Open Access Journals (Sweden)

    Ali Ghareaghaji

    2013-11-01

    Full Text Available Iris scan biometrics employs the unique characteristics and features of the human iris in order to verify the identity of an individual. In today's world, where terrorist attacks are on the rise, the employment of infallible security systems is a must, which makes iris recognition systems indispensable in emerging security authentication. The objective function is minimized using the Differential Evolution (DE) algorithm, where the population vector is encoded using binary-coded decimal to avoid the floating-point optimization problem. An automatic clustering of the possible values of the Lagrangian multiplier provides detailed insight into the selected features during the proposed DE-based optimization process. The classification accuracy of a Support Vector Machine (SVM) is used to measure the performance of the selected features. The proposed algorithm outperforms existing DE-based approaches when tested on the IRIS, Wine, Wisconsin Breast Cancer, Sonar, and Ionosphere datasets. The same algorithm, when applied to gait-based people identification using skeleton data points obtained from a Microsoft Kinect sensor, exceeds previously reported accuracies.

  15. A Hybrid Feature Subset Selection using Metrics and Forward Selection

    Directory of Open Access Journals (Sweden)

    K. Fathima Bibi

    2015-04-01

    Full Text Available The aim of this study is to design a feature subset selection technique that speeds up the feature selection (FS) process in high-dimensional datasets with reduced computational cost and great efficiency. FS has become the focus of much research in decision support system areas for which data with a tremendous number of variables are analyzed. Filters and wrappers are the proposed techniques for the feature subset selection process. Filters make use of an association-based approach, while wrappers adopt classification algorithms to identify important features. The filter method lacks the ability to minimize simplification error, while the wrapper method demands heavy computational resources. To overcome these difficulties, a hybrid approach combining both filters and wrappers is proposed. The filter stage uses a combination of ranker search methods, and the wrapper stage improves the learning accuracy and obtains a reduction in memory requirements and running time. The UCI machine learning repository was chosen for experiments with the approach. The classification accuracy resulting from our approach proves to be higher.
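
    The general filter-then-wrapper pattern described above can be sketched as follows; the Wine data, mutual-information ranking, and k-NN wrapper are stand-ins rather than the paper's actual ranker combination and classifier.

      # Hedged sketch: filter stage (ranking) followed by a wrapper stage (forward search).
      from sklearn.datasets import load_wine
      from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                             mutual_info_classif)
      from sklearn.neighbors import KNeighborsClassifier

      X, y = load_wine(return_X_y=True)

      # Filter stage: keep the top-ranked half of the features.
      flt = SelectKBest(mutual_info_classif, k=X.shape[1] // 2).fit(X, y)
      X_f = flt.transform(X)

      # Wrapper stage: greedy forward selection scored by cross-validated accuracy.
      wrap = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=3,
                                       direction="forward", cv=5).fit(X_f, y)
      X_final = wrap.transform(X_f)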

  16. The construction of two-stage tests

    NARCIS (Netherlands)

    Adema, Jos J.

    1988-01-01

    Although two-stage testing is not the most efficient form of adaptive testing, it has some advantages. In this paper, linear programming models are given for the construction of two-stage tests. In these models, practical constraints are imposed with respect to, among other things, test composition and administration.

  17. Two-Stage Modelling Of Random Phenomena

    Science.gov (United States)

    Barańska, Anna

    2015-12-01

    The main objective of this publication was to present a two-stage algorithm of modelling random phenomena, based on multidimensional function modelling, on the example of modelling the real estate market for the purpose of real estate valuation and estimation of model parameters of foundations' vertical displacements. The first stage of the presented algorithm includes a selection of a suitable form of the function model. In the classical algorithms, based on function modelling, prediction of the dependent variable is its value obtained directly from the model. The better the model reflects a relationship between the independent variables and their effect on the dependent variable, the more reliable is the model value. In this paper, an algorithm has been proposed which comprises adjustment of the value obtained from the model with a random correction determined from the residuals of the model for these cases which, in a separate analysis, were considered to be the most similar to the object for which we want to model the dependent variable. The effect of applying the developed quantitative procedures for calculating the corrections and qualitative methods to assess the similarity on the final outcome of the prediction and its accuracy, was examined by statistical methods, mainly using appropriate parametric tests of significance. The idea of the presented algorithm has been designed so as to approximate the value of the dependent variable of the studied phenomenon to its value in reality and, at the same time, to have it "smoothed out" by a well fitted modelling function.

  18. Unsupervised Feature Selection for Latent Dirichlet Allocation

    Institute of Scientific and Technical Information of China (English)

    Xu Weiran; Du Gang; Chen Guang; Guo Jun; Yang Jie

    2011-01-01

    As a generative model, the Latent Dirichlet Allocation (LDA) model focuses on how to generate data and lacks optimization of the topics' discrimination capability. This paper aims to improve the discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the information gain of the word for topics, which is used to distinguish between "general words" and "special words" in LDA topics. Therefore, we add a constraint to the LDA objective function to let the "general words" occur only in "general topics" rather than in "special topics". A heuristic algorithm is then presented to obtain the solution. Experiments show that this method can not only improve the information gain of topics, but also make the topics easier for humans to understand.

  19. Feature selection with the image grand tour

    Science.gov (United States)

    Marchette, David J.; Solka, Jeffrey L.

    2000-08-01

    The grand tour is a method for visualizing high dimensional data by presenting the user with a set of projections and the projected data. This idea was extended to multispectral images by viewing each pixel as a multidimensional value, and viewing the projections of the grand tour as an image. The user then looks for projections which provide a useful interpretation of the image, for example, separating targets from clutter. We discuss a modification which allows the user to select convolution kernels that provide useful discriminant ability, either in an unsupervised manner, as in the image grand tour, or in a supervised manner using training data. This approach is extended to other window-based features. For example, one can define a generalization of the median filter as a linear combination of the order statistics within a window. Thus the median filter is that projection containing zeros everywhere except for the middle value, which contains a one. Using the convolution grand tour one can select projections on these order statistics to obtain new nonlinear filters.
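
    The order-statistic view of window filters mentioned above is easy to make concrete: a filter is a weight vector applied to the sorted values of each window, and the median filter is the weight vector with a single 1 at the middle position. The following plain-NumPy sketch (not the authors' code) illustrates this.

      # Hedged sketch: a window filter as a linear combination of order statistics.
      import numpy as np

      def order_statistic_filter(image, weights, size=3):
          """Apply weights to the sorted values of each size-by-size window."""
          h, w = image.shape
          r = size // 2
          padded = np.pad(image, r, mode="edge")
          out = np.empty_like(image, dtype=float)
          for i in range(h):
              for j in range(w):
                  window = np.sort(padded[i:i + size, j:j + size], axis=None)
                  out[i, j] = np.dot(weights, window)
          return out

      img = np.random.default_rng(1).random((8, 8))
      median_weights = np.zeros(9)
      median_weights[4] = 1.0  # zeros everywhere except the middle order statistic
      filtered = order_statistic_filter(img, median_weights)  # ordinary median filter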

  20. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    Science.gov (United States)

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
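
    A minimal sketch of this filter approach, using synthetic data in place of the paper's sensor-array measurements: score features by mutual information with the class label, train an SVM on the top-k set, and compare against randomly chosen features.

      # Hedged sketch: mutual-information filter vs. random feature selection.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=40, n_informative=6,
                                 random_state=0)
      k = 6
      mi = mutual_info_classif(X, y, random_state=0)
      top = np.argsort(mi)[-k:]  # the k features with maximal mutual information
      rand = np.random.default_rng(0).choice(X.shape[1], k, replace=False)

      acc_top = cross_val_score(SVC(), X[:, top], y, cv=5).mean()
      acc_rand = cross_val_score(SVC(), X[:, rand], y, cv=5).mean()
      print("MI-selected:", acc_top, " random:", acc_rand)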

  1. A New Approach of Feature Selection for Text Categorization

    Institute of Scientific and Technical Information of China (English)

    CUI Zifeng; XU Baowen; ZHANG Weifeng; XU Junling

    2006-01-01

    This paper proposes a new approach to feature selection, based on an independence measure between features, for text categorization. A fundamental hypothesis, widely used in probabilistic models for text categorization (TC), that occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, by which a feature selection algorithm is given to obtain a feature subset. The selected subset is high in relevance with the category and strong in independence between features, and satisfies the basic hypothesis to the maximum degree. Compared with other traditional feature selection methods in TC (which only take relevance into account), the performance of the feature subset selected by our method is superior, as shown by experiments on the benchmark dataset of 20 Newsgroups.

  2. Feature Selection for Wheat Yield Prediction

    Science.gov (United States)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  3. Detecting Local Manifold Structure for Unsupervised Feature Selection

    Institute of Scientific and Technical Information of China (English)

    FENG Ding-Cheng; CHEN Feng; XU Wen-Li

    2014-01-01

    Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention in the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes utilizing the manifold learning techniques, where feature selection and learning can be studied based on the manifold assumption in data distribution. Many existing feature selection methods such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion) apply the basic properties of graph Laplacian, and select the optimal feature subsets which best preserve the manifold structure defined on the graph Laplacian. In this paper, we propose a new feature selection perspective from locally linear embedding (LLE), which is another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensionalities in the subset selection problem, which also facilitates constructing better coordinates from data using the principal component analysis (PCA) technique. Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationship among samples is computed from the LLE formulation, which is then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection. We further develop a

  4. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Ahmed Majid Taha

    2013-01-01

    Full Text Available With the amount of data and information said to double every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. A bio-inspired method called the Bat Algorithm, hybridized with a Naive Bayes classifier, is presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed the other algorithms in selecting a lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets.

  5. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were each tested on both EMG feature sets. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, high overall accuracies can be achieved across all subjects in classification of seven different forearm motions with a small number of top-ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, and can effectively reduce the redundant information not only across different channels, but also across different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility.

  6. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  7. CARES: Completely Automated Robust Edge Snapper for carotid ultrasound IMT measurement on a multi-institutional database of 300 images: a two stage system combining an intensity-based feature approach with first order absolute moments

    Science.gov (United States)

    Molinari, Filippo; Acharya, Rajendra; Zeng, Guang; Suri, Jasjit S.

    2011-03-01

    The carotid intima-media thickness (IMT) is the most used marker for the progression of atherosclerosis and the onset of cardiovascular diseases. Computer-aided measurements improve accuracy, but usually require user interaction. In this paper we characterized a new and completely automated technique for carotid segmentation and IMT measurement based on the merits of two previously developed techniques. We used an integrated approach of intelligent image feature extraction and line fitting for automatically locating the carotid artery in the image frame, followed by wall interface extraction based on a Gaussian edge operator. We called our system CARES. We validated CARES on a multi-institutional database of 300 carotid ultrasound images. IMT measurement bias was 0.032 +/- 0.141 mm, better than other automated techniques and comparable to that of user-driven methodologies. Our novel approach processed 96% of the images, leading to a figure of merit of 95.7%. CARES ensured complete automation and high accuracy in IMT measurement; hence it could be a suitable clinical tool for processing of large datasets in multicenter studies involving atherosclerosis.

  8. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.

  9. NEW FEATURE SELECTION METHOD IN MACHINE FAULT DIAGNOSIS

    Institute of Scientific and Technical Information of China (English)

    Wang Xinfeng; Qiu Jing; Liu Guanjun

    2005-01-01

    Aiming at the deficiencies of the filter and wrapper feature selection methods, a new method based on a composite of the filter and wrapper methods is proposed. First the method filters the original features to form a feature subset which can meet a classification correctness rate, then it applies a wrapper feature selection method to select the optimal feature subset. A successful technique for solving optimization problems is given by the genetic algorithm (GA), which is applied here to the problem of optimal feature selection. The composite method saves several times the computing time of the wrapper method while holding the classification accuracy, in both data simulation and an experiment on bearing fault feature selection. This method thus possesses excellent optimization properties, saves selection time, and has the characteristics of high accuracy and high efficiency.

  10. Two-stage sampling for acceptance testing

    Energy Technology Data Exchange (ETDEWEB)

    Atwood, C.L.; Bryan, M.F.

    1992-09-01

    Sometimes a regulatory requirement or a quality-assurance procedure sets an allowed maximum on a confidence limit for a mean. If the sample mean of the measurements is below the allowed maximum, but the confidence limit is above it, a very widespread practice is to increase the sample size and recalculate the confidence bound. The confidence level of this two-stage procedure is rarely found correctly, but instead is typically taken to be the nominal confidence level, found as if the final sample size had been specified in advance. In typical settings, the correct nominal α should be between the desired P(Type I error) and half that value. This note gives tables for the correct α to use, some plots of power curves, and an example of correct two-stage sampling.
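
    The inflation described here can be checked with a small Monte Carlo simulation. The sketch below is illustrative only (normal data, arbitrary sample sizes, and it does not reproduce the note's tables): with the true mean exactly at the allowed maximum, the naive two-stage procedure accepts more often than the nominal level suggests.

      # Hedged sketch: realized acceptance rate of naive two-stage sampling.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      alpha, limit, n1, n2, reps = 0.05, 1.0, 20, 40, 20000
      accepts = 0
      for _ in range(reps):
          x = rng.normal(limit, 1.0, n1)  # true mean exactly at the allowed maximum
          ucb = x.mean() + stats.t.ppf(1 - alpha, n1 - 1) * x.std(ddof=1) / np.sqrt(n1)
          if x.mean() < limit and ucb > limit:  # the widespread practice: sample more
              x = np.concatenate([x, rng.normal(limit, 1.0, n2 - n1)])
              ucb = x.mean() + stats.t.ppf(1 - alpha, n2 - 1) * x.std(ddof=1) / np.sqrt(n2)
          accepts += ucb <= limit  # "accept" = confidence bound below the maximum
      print("realized acceptance rate:", accepts / reps, "vs nominal", alpha)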

  12. Two Stage Gear Tooth Dynamics Program

    Science.gov (United States)

    1989-08-01

    conditions and associated iteration procedure become more complex. This is due to both the increased number of components and to the time for a...solved for each stage in the two-stage solution. There are (3 + number of planets) degrees of freedom for each stage plus two degrees of freedom...should be devised. It should be noted that this is not a minor task. In general, each stage plus an input or output shaft will have 2 times (4 + number

  13. Feature selection with neighborhood entropy-based cooperative game theory.

    Science.gov (United States)

    Zeng, Kai; She, Kun; Niu, Xinzheng

    2014-01-01

    Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods will ignore some features which have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game. The evaluative criteria of features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that neighborhood entropy-based cooperative game theory model (NECGT) yield better performance than classical ones.

  14. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets that can significantly improve the overall system performance. Therefore in this paper we focus on a hybrid approach to feature selection. This method falls into two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through a K-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.

  15. Geochemical dynamics in selected Yellowstone hydrothermal features

    Science.gov (United States)

    Druschel, G.; Kamyshny, A.; Findlay, A.; Nuzzio, D.

    2010-12-01

    Yellowstone National Park has a wide diversity of thermal features, and includes springs with a range of pH conditions that significantly impact sulfur speciation. We have utilized a combination of voltammetric and spectroscopic techniques to characterize the intermediate sulfur chemistry of the Cinder Pool, Evening Primrose, Ojo Caliente, Frying Pan, Azure, and Dragon thermal springs. These measurements have additionally demonstrated the geochemical dynamics inherent in these systems; significant variability in chemical speciation occurs in many of these thermal features due to changes in gas supply rates, fluid discharge rates, and thermal differences occurring on time scales of seconds. The dynamics of the geochemical settings shown may significantly impact how microorganisms interact with the sulfur forms in these systems.

  16. Condensate from a two-stage gasifier

    DEFF Research Database (Denmark)

    Bentzen, Jens Dall; Henriksen, Ulrik Birk; Hindsgaul, Claus

    2000-01-01

    Condensate, produced when gas from a downdraft biomass gasifier is cooled, contains organic compounds that inhibit nitrifiers. Treatment with activated carbon removes most of the organics and makes the condensate far less inhibitory. The condensate from an optimised two-stage gasifier is so clean that the organic compounds and the inhibition effect are very low even before treatment with activated carbon. The moderate inhibition effect relates to a high content of ammonia in the condensate. The nitrifiers become tolerant to the condensate after a few weeks of exposure. The level of organic compounds...

  17. Two Stage Sibling Cycle Compressor/Expander.

    Science.gov (United States)

    1994-02-01

    Final report PL-TR-94-1051, "Two Stage Sibling Cycle Compressor/Expander," prepared by Matthew P. Mitchell, Mitchell/Stirling Machines/Systems, Inc., Berkeley, CA.

  18. A Study on Feature Selection Techniques in Educational Data Mining

    CERN Document Server

    Ramaswami, M

    2009-01-01

    Educational data mining (EDM) is a new growing research area in which the essence of data mining concepts is used in the educational field for the purpose of extracting useful information on the behaviors of students in the learning process. In EDM, feature selection is to be made for the generation of a subset of candidate variables. As feature selection influences the predictive accuracy of any performance model, it is essential to study elaborately the effectiveness of student performance models in connection with feature selection techniques. In this connection, the present study is devoted not only to investigating the most relevant subset of features with minimum cardinality for achieving high predictive performance by adopting various filtered feature selection techniques in data mining but also to evaluating the goodness of subsets with different cardinalities and the quality of six filtered feature selection algorithms in terms of F-measure value and Receiver Operating Characteristics (ROC) value, generat...

  19. A New Feature Selection Method for Text Clustering

    Institute of Scientific and Technical Information of China (English)

    XU Junling; XU Baowen; ZHANG Weifeng; CUI Zifeng; ZHANG Wei

    2007-01-01

    Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It uses a supervised feature selection method on the intermediate clustering result which is generated during iterative clustering to do feature selection for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Then feature subsets are selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets and the results show the advantages of the proposed method.
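
    A stripped-down version of the idea (not the paper's EM-based iterative scheme) evaluates candidate feature subsets by the Davies-Bouldin index of the clustering they induce; the Iris data and KMeans below are stand-ins, and a lower index is better.

      # Hedged sketch: comparing feature subsets by the Davies-Bouldin index.
      from itertools import combinations
      from sklearn.cluster import KMeans
      from sklearn.datasets import load_iris
      from sklearn.metrics import davies_bouldin_score

      X, _ = load_iris(return_X_y=True)
      best = None
      for subset in combinations(range(X.shape[1]), 2):  # all 2-feature subsets
          Xs = X[:, subset]
          labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
          score = davies_bouldin_score(Xs, labels)
          if best is None or score < best[0]:
              best = (score, subset)
      print("best 2-feature subset by DB index:", best)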

  20. Aptamers overview: selection, features and applications.

    Science.gov (United States)

    Hernandez, Luiza I; Machado, Isabel; Schafer, Thomas; Hernandez, Frank J

    2015-01-01

    Aptamer technology has been around for a quarter of a century and the field has matured enough to start seeing real applications, especially in the medical field. Since their discovery, aptamers rapidly emerged as key players in many fields, such as diagnostics, drug discovery, food science, drug delivery and therapeutics. Because of their synthetic nature, aptamers are evolving at an exponential rate, gaining from the newest advances in chemistry, nanotechnology, biology and medicine. This review is meant to give an overview of the aptamer field, by including general aspects of aptamer identification and applications as well as highlighting certain features that contribute to their quick deployment in the biomedical field.

  1. Bayesian feature selection to estimate customer survival

    OpenAIRE

    Figini, Silvia; Giudici, Paolo; Brooks, S P

    2006-01-01

    We consider the problem of estimating the lifetime value of customers when a large number of features are present in the data. In order to measure lifetime value we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We will show how our proposed Bayesian methods perform, and compare them with classical churn models on a real case study. More specifically, based on data from a media service company, our aim will be to p...

  2. FEATURE SELECTION USING GENETIC ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION

    NARCIS (Netherlands)

    Kim, G.; Kim, S.

    2004-01-01

    A feature selection method using genetic algorithms, which are a suitable means for selecting an appropriate set of features from ones with huge dimensionality, is proposed. SGA (Simple Genetic Algorithm) and its modified methods are applied to improve the recognition speed as well as the recognition accuracy.

  3. High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Karthikeyan.P

    2014-03-01

    Full Text Available Feature selection involves identifying a subset of the most useful features that produces results compatible with the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST thus has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method using Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
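
    A compact sketch of the MST-based strategy, under stated assumptions: absolute correlation as the similarity measure, an arbitrary edge-cut threshold, and synthetic data. FAST's actual symmetric-uncertainty measure and tree-partitioning details are not reproduced.

      # Hedged sketch: cluster features via an MST, keep one representative per cluster.
      import numpy as np
      from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                                 random_state=0)
      corr = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature similarity
      mst = minimum_spanning_tree(1.0 - corr).toarray()  # MST over the feature graph
      mst[mst > 0.7] = 0                                 # cut long (dissimilar) edges
      _, cluster = connected_components(mst, directed=False)

      # One representative per cluster: the feature most correlated with the target.
      relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
      selected = [int(np.argmax(np.where(cluster == c, relevance, -1)))
                  for c in np.unique(cluster)]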

  4. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    Full Text Available This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real- and constant-time implementations of the SLAM. The selection procedure consists in choosing the features the SLAM system state covariance is most sensitive to. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy-based feature selection approach, is also performed.

  5. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.

  6. Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso.

    Science.gov (United States)

    Wang, Jin-Jia; Xue, Fang; Li, Hui

    2015-01-01

    Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.
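
    At the core of such models is a blockwise shrinkage step. The sketch below shows a generic Sparse Group Lasso proximal update (elementwise soft-thresholding followed by groupwise soft-thresholding); penalties, step size, and the channel grouping are illustrative, not the paper's fitted model.

      # Hedged sketch: one Sparse Group Lasso proximal step over channel blocks.
      import numpy as np

      def group_soft_threshold(w, thresh):
          """Shrink a whole block toward zero; zero it out if its norm is small."""
          norm = np.linalg.norm(w)
          return np.zeros_like(w) if norm <= thresh else (1 - thresh / norm) * w

      def sgl_prox(w, groups, lam1, lam2, step):
          """Elementwise (lasso) shrinkage, then blockwise (group) shrinkage."""
          w = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
          out = np.empty_like(w)
          for g in np.unique(groups):
              idx = groups == g
              out[idx] = group_soft_threshold(w[idx], step * lam2)
          return out

      w = np.array([0.9, -0.2, 0.05, 1.4, -1.1, 0.3])
      groups = np.array([0, 0, 0, 1, 1, 1])  # e.g., coefficients grouped by EEG channel
      print(sgl_prox(w, groups, lam1=0.1, lam2=0.5, step=0.1))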

  7. Evaluation of Feature Selection Approaches for Urdu Text Categorization

    Directory of Open Access Journals (Sweden)

    Tehseen Zia

    2015-05-01

    Full Text Available Efficient feature selection is an important phase of designing an effective text categorization system. Various feature selection methods have been proposed for selecting dissimilar feature sets. It is often essential to evaluate which method is more effective for a given task and what size of feature set is an effective model selection choice. The aim of this paper is to answer these questions for designing an Urdu text categorization system. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial and radial basis kernels, and decision tree (i.e. J48). The study was conducted over two test collections: the EMILLE collection and a naive collection. We have observed that three feature selection methods, i.e. information gain, Chi statistics, and symmetrical uncertainty, performed uniformly in most of the cases if not all. Moreover, we have found that no single feature selection method is best for all classifiers. While gain ratio out-performed others for naive Bayes and J48, information gain has shown top performance for KNN and SVM with polynomial and radial basis kernels. Overall, linear SVM with any of the feature selection methods including information gain, Chi statistics or symmetrical uncertainty turned out to be the first choice across other combinations of classifiers and feature selection methods on the moderately sized naive collection. On the other hand, naive Bayes with any of the feature selection methods has shown its advantage for the small-sized EMILLE corpus.

  8. Classification in two-stage screening.

    Science.gov (United States)

    Longford, Nicholas T

    2015-11-10

    Decision theory is applied to the problem of setting thresholds in medical screening when it is organised in two stages. In the first stage, which involves a less expensive procedure that can be applied on a mass scale, an individual is classified as a negative or a likely positive. In the second stage, the likely positives are subjected to another test that classifies them as (definite) positives or negatives. The second-stage test is more accurate, but also more expensive and more involved, and so there are incentives to restrict its application. Robustness of the method with respect to the parameters, some of which have to be set by elicitation, is assessed by sensitivity analysis.

  9. Two stage gear tooth dynamics program

    Science.gov (United States)

    Boyd, Linda S.

    1989-01-01

    The epicyclic gear dynamics program was expanded to add the option of evaluating the tooth pair dynamics for two epicyclic gear stages with peripheral components. This was a practical extension to the program, as multiple gear stages are often used for speed reduction, space, weight, and/or auxiliary units. The option was developed for either stage to be a basic planetary, star, single external-external mesh, or single external-internal mesh. The two stage system allows for modeling of the peripherals with an input mass and shaft, an output mass and shaft, and a connecting shaft. Execution of the initial test case indicated an instability in the solution, with the tooth pair loads growing to excessive magnitudes. A procedure to trace the instability is recommended, as well as a method of reducing the program's computation time by reducing the number of boundary condition iterations.

  10. A Features Selection for Crops Classification

    Science.gov (United States)

    Zhao, Lei; Chen, Erxue; Li, Zengyuan; Li, Lan; Gu, Xinzhi

    2016-08-01

    Polarization orientation angle (POA) is a major parameter of electromagnetic waves. This angle will shift due to azimuth slopes, which affects the radiometric quality of PolSAR data. Under the assumption of a reflection-symmetrical medium, the shift value of the polarization orientation angle (POAs) can be estimated by the Circular Polarization Method (CPM). Then, the shift angle can be used to compensate PolSAR data or extract DEM information. However, it is less effective when using high-frequency SAR (L-, C-band) in forest areas. The main reason is that the polarization orientation angle shift in forest areas is not only influenced by topography, but also affected by the forest canopy. The influence of the former is interference information that should be removed, but the impact of the latter is polarization feature information that needs to be retained. The ALOS2 PALSAR2 L-band full polarimetric SAR data was used in this study. Based on the Circular Polarization and DEM-based methods, we analyzed the variation of the shift value of the polarization orientation angle and developed polarization orientation shift estimation and compensation of PolSAR data in forest areas.

  11. A Two Stage Classification Approach for Handwritten Devanagari Characters

    CERN Document Server

    Arora, Sandhya; Nasipuri, Mita; Malik, Latesh

    2010-01-01

    The paper presents a two-stage classification approach for handwritten Devanagari characters. The first stage uses structural properties like the shirorekha and spine of a character, and the second stage exploits intersection features of characters, which are fed to a feedforward neural network. A simple histogram-based method does not work for finding the shirorekha and vertical bar (spine) in handwritten Devanagari characters, so we designed a differential-distance-based technique to find a near-straight line for the shirorekha and spine. This approach has been tested on 50,000 samples and we obtained an 89.12% success rate.

  12. Two-Stage Fan I: Aerodynamic and Mechanical Design

    Science.gov (United States)

    Messenger, H. E.; Kennedy, E. E.

    1972-01-01

    A two-stage, highly-loaded fan was designed to deliver an overall pressure ratio of 2.8 with an adiabatic efficiency of 83.9 percent. At the first rotor inlet, design flow per unit annulus area is 42 lbm/sec/sq ft (205 kg/sec/sq m), hub/tip ratio is 0.4 with a tip diameter of 31 inches (0.787 m), and design tip speed is 1450 ft/sec (441.96 m/sec). Other features include use of multiple-circular-arc airfoils, resettable stators, and split casings over the rotor tip sections for casing treatment tests.

  13. Feature selection using genetic algorithms for fetal heart rate analysis.

    Science.gov (United States)

    Xu, Liang; Redman, Christopher W G; Payne, Stephen J; Georgieva, Antoniya

    2014-07-01

    The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three different classifiers. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 using different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies.
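
    A self-contained toy version of GA-based feature subset selection is sketched below; the synthetic data, logistic-regression fitness, and truncation selection are stand-ins, and the paper's FHR features, classifiers, and regularization are not reproduced.

      # Hedged sketch: a minimal genetic algorithm over binary feature masks.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=300, n_features=64, n_informative=8,
                                 random_state=0)
      rng = np.random.default_rng(0)

      def fitness(mask):
          """Cross-validated accuracy of the features switched on in the mask."""
          if not mask.any():
              return 0.0
          clf = LogisticRegression(max_iter=500)
          return cross_val_score(clf, X[:, mask], y, cv=3).mean()

      pop = rng.random((20, 64)) < 0.2  # 20 random feature masks
      for gen in range(15):
          scores = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(scores)[-10:]]  # truncation selection
          kids = []
          for _ in range(len(pop)):
              a, b = parents[rng.integers(10, size=2)]
              cut = rng.integers(1, 64)
              child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
              child ^= rng.random(64) < 0.02              # bit-flip mutation
              kids.append(child)
          pop = np.array(kids)
      best = pop[np.argmax([fitness(m) for m in pop])]
      print(int(best.sum()), "features selected")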

  14. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.

  15. Feature Selection for Neural Network Based Stock Prediction

    Science.gov (United States)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first produce all the combinations of features and evaluate each of them by using our evaluation function. We search through the generated set with a hill climbing approach. A self-organizing map based stock prediction model is utilized as the prediction method. We conduct the experiment on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of the neural network based stock prediction.
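
    A minimal hill-climbing sketch in the spirit of the search described above, with a simple relevance-minus-redundancy correlation objective standing in for the authors' evaluation function, and random data in place of stock series.

      # Hedged sketch: hill climbing over feature subsets with a correlation objective.
      import numpy as np

      def merit(X, y, subset):
          """Reward correlation with the target, penalize inter-feature correlation."""
          if not subset:
              return -np.inf
          cols = list(subset)
          rel = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in cols])
          if len(cols) == 1:
              red = 0.0
          else:
              c = np.abs(np.corrcoef(X[:, cols], rowvar=False))
              red = (c.sum() - len(cols)) / (len(cols) * (len(cols) - 1))
          return rel - red

      def hill_climb(X, y):
          current, best = set(), -np.inf
          improved = True
          while improved:
              improved = False
              for j in range(X.shape[1]):  # try toggling each feature in or out
                  cand = current ^ {j}
                  score = merit(X, y, cand)
                  if score > best:
                      current, best, improved = cand, score, True
          return current, best

      rng = np.random.default_rng(0)
      X = rng.random((100, 10))
      y = X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.random(100)
      print(hill_climb(X, y))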

  16. Selective attention to temporal features on nested time scales.

    Science.gov (United States)

    Henry, Molly J; Herrmann, Björn; Obleser, Jonas

    2015-02-01

    Meaningful auditory stimuli such as speech and music often vary simultaneously along multiple time scales. Thus, listeners must selectively attend to, and selectively ignore, separate but intertwined temporal features. The current study aimed to identify and characterize the neural network specifically involved in this feature-selective attention to time. We used a novel paradigm where listeners judged either the duration or modulation rate of auditory stimuli, and in which the stimulation, working memory demands, response requirements, and task difficulty were held constant. A first analysis identified all brain regions where individual brain activation patterns were correlated with individual behavioral performance patterns, which thus supported temporal judgments generically. A second analysis then isolated those brain regions that specifically regulated selective attention to temporal features: Neural responses in a bilateral fronto-parietal network including insular cortex and basal ganglia decreased with degree of change of the attended temporal feature. Critically, response patterns in these regions were inverted when the task required selectively ignoring this feature. The results demonstrate how the neural analysis of complex acoustic stimuli with multiple temporal features depends on a fronto-parietal network that simultaneously regulates the selective gain for attended and ignored temporal features.

  17. A New Evolutionary-Incremental Framework for Feature Selection

    Directory of Open Access Journals (Sweden)

    Mohamad-Hoseyn Sigari

    2014-01-01

    Full Text Available Feature selection is an NP-hard problem from the viewpoint of algorithm design and it is one of the main open problems in pattern recognition. In this paper, we propose a new evolutionary-incremental framework for feature selection. The proposed framework can be applied to an ordinary evolutionary algorithm (EA) such as the genetic algorithm (GA) or invasive weed optimization (IWO). The framework introduces some generic modifications to ordinary EAs to make them compatible with variable-length solutions. In this framework, the solutions in the early generations are short, and the length of solutions may then be increased gradually through the generations. In addition, our evolutionary-incremental framework deploys two new operators, called the addition and deletion operators, which change the length of solutions randomly (see the sketch below). For evaluation of the proposed framework, we use it for feature selection in a face recognition application. In this regard, we applied our feature selection method to a robust face recognition algorithm based on the extraction of Gabor coefficients. Experimental results show that our proposed evolutionary-incremental framework can efficiently select a small number of features from thousands of existing features. Comparison of the proposed method with previous methods shows that our framework is comprehensive, robust, and well-defined for application to many EAs for feature selection.
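
    For illustration, a minimal sketch of the addition and deletion operators the framework introduces for variable-length solutions. The list-of-feature-indices encoding is an assumption; the surrounding EA loop (selection, crossover, the incremental lengthening schedule) is omitted.

    ```python
    import random

    def addition(solution, n_features):
        """Lengthen a solution by appending a randomly chosen, not-yet-selected feature."""
        unused = sorted(set(range(n_features)) - set(solution))
        return solution + [random.choice(unused)] if unused else solution

    def deletion(solution):
        """Shorten a solution by removing a randomly chosen feature (kept non-empty)."""
        if len(solution) <= 1:
            return solution
        out = solution.copy()
        out.pop(random.randrange(len(out)))
        return out
    ```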

  18. Feature Selection for Image Retrieval based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Preeti Kushwaha

    2016-12-01

    Full Text Available This paper describes the development and implementation of feature selection for content-based image retrieval. We develop a CBIR system with a new, efficient technique. In this system, we extract multiple features: colour, texture, and shape. Three techniques are used for feature extraction: colour moments, the gray-level co-occurrence matrix, and the edge histogram descriptor. To reduce the curse of dimensionality and find the best features from the feature set, feature selection based on a genetic algorithm is applied. The features are then grouped into similar image classes by k-means clustering for fast retrieval and improved execution time. Experimental results show that feature selection using the GA reduces retrieval time and also increases retrieval precision, thus giving better and faster results than a normal image retrieval system. The results also show the precision and recall of the proposed approach compared with the previous approach for each image class. The CBIR system is more efficient and performs better using feature selection based on the genetic algorithm.

  19. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  20. Lazy learner text categorization algorithm based on embedded feature selection

    Institute of Scientific and Technical Information of China (English)

    Yan Peng; Zheng Xuefeng; Zhu Jianyong; Xiao Yunhong

    2009-01-01

    To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has considerable side effects on the overall performance of TC algorithms. On the basis of the sparsity characteristic of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of features without any information loss, which improves both the efficiency and the performance of the algorithm. Experiments show the new algorithm can simultaneously achieve both higher performance and higher efficiency than some classical TC algorithms.
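
    A hedged sketch of the general idea of lazy, classification-time feature selection for sparse text vectors: each test document is scored only on its own nonzero features. This is an illustrative reading paired with a k-NN rule, not the authors' LFS implementation.

    ```python
    import numpy as np

    def lazy_knn_classify(x_test, X_train, y_train, k=5):
        """Restrict the feature space to the features active (nonzero) in the
        test vector, exploiting text sparsity, then classify by majority vote.
        Class labels are assumed to be non-negative integers."""
        active = np.flatnonzero(x_test)  # test-specific feature subset
        if active.size == 0:
            return np.bincount(y_train).argmax()  # fall back to the majority class
        d = np.linalg.norm(X_train[:, active] - x_test[active], axis=1)
        nearest = np.argsort(d)[:k]
        return np.bincount(y_train[nearest]).argmax()
    ```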

  1. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptoms is important in skin diagnosis and treatment. Feature selection serves to improve the classification performance for skin micro-image symptoms. This paper proposes a hybrid approach based on the support vector machine (SVM) technique and a genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced to maintain the convergence rate. With the proposed method, the average cross-validation accuracy is increased from 88.25% using all features to 96.92% using only the selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  3. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    Full Text Available Representation learning is a difficult and important problem...

  4. Optimized Image Steganalysis through Feature Selection using MBEGA

    CERN Document Server

    Geetha, S

    2010-01-01

    Feature-based steganalysis, an emerging branch in information forensics, aims at identifying the presence of a covert communication by employing the statistical features of the cover and stego image as clues/evidence. Due to the large volumes of security audit data as well as the complex and dynamic properties of steganogram behaviours, optimizing the performance of steganalysers becomes an important open problem. This paper focuses on fine-tuning the performance of six promising steganalysers in this field through feature selection. We propose to employ the Markov Blanket-Embedded Genetic Algorithm (MBEGA) for the stego-sensitive feature selection process. In particular, the embedded Markov blanket based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive pow...

  5. A New Heuristic for Feature Selection by Consistent Biclustering

    CERN Document Server

    Mucherino, Antonio

    2010-01-01

    Given a set of data, biclustering aims at finding simultaneous partitions in biclusters of its samples and of the features which are used for representing the samples. Consistent biclusterings allow one to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classification. The problem of finding consistent biclusterings can be seen as a feature selection problem, in which the features that are not relevant for classification purposes are removed from the data set, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with re...

  6. Modeling Suspicious Email Detection using Enhanced Feature Selection

    OpenAIRE

    2013-01-01

    The paper presents a suspicious email detection model which incorporates enhanced feature selection. We propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algo...

  7. Ensemble feature selection integrating elitist roles and quantum game model

    Institute of Scientific and Technical Information of China (English)

    Weiping Ding; Jiandong Wang; Zhijin Guan; Quan Shi

    2015-01-01

    To accelerate the selection process of feature subsets in rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, the multilevel elitist roles based dynamics equilibrium strategy is established, and both immigration and emigration of elitists are able to self-adapt to balance exploration and exploitation in feature selection. Secondly, the utility matrix of trust margins is introduced into the model of multilevel elitist roles to enhance the performance of the various elitist roles in searching for optimal feature subsets, and win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which will greatly improve feasibility and effectiveness. Experimental results show the proposed EERQG algorithm has superiority compared to existing feature selection algorithms.

  8. Effective Feature Selection for 5G IM Applications Traffic Classification

    Directory of Open Access Journals (Sweden)

    Muhammad Shafiq

    2017-01-01

    Full Text Available Recently, machine learning (ML) algorithms have been widely applied in Internet traffic classification. However, due to inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows, as such traffic occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric. It then uses a wrapper method to select features for ML classifiers with an accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured from two different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to identify the robust features within the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision. Our proposed algorithm can achieve 99% flow accuracy, which is very promising.
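
    A rough sketch of the filter-then-wrapper pattern that WMI_ACC follows, using scikit-learn. Plain mutual information stands in for the paper's weighted mutual information, and the decision-tree learner and shortlist budget are illustrative choices only.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def filter_then_wrap(X, y, keep=20):
        """Phase 1: an MI filter prunes the feature set. Phase 2: a forward
        wrapper keeps a shortlisted feature only if it raises CV accuracy."""
        mi = mutual_info_classif(X, y, random_state=0)
        shortlist = np.argsort(mi)[::-1][:keep]
        chosen, best = [], 0.0
        for f in shortlist:
            trial = chosen + [int(f)]
            acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                                  X[:, trial], y, cv=5).mean()
            if acc > best:
                chosen, best = trial, acc
        return chosen, best
    ```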

  9. Feature selection for optimized skin tumor recognition using genetic algorithms.

    Science.gov (United States)

    Handels, H; Ross, T; Kreusch, J; Wolff, H H; Pöppl, S J

    1999-07-01

    In this paper, a new approach to computer-supported diagnosis of skin tumors in dermatology is presented. High-resolution skin surface profiles are analyzed to automatically recognize malignant melanomas and nevocytic nevi (moles). In the first step, several types of features are extracted by 2D image analysis methods characterizing the structure of skin surface profiles: texture features based on cooccurrence matrices, Fourier features, and fractal features. Then, feature selection algorithms are applied to determine suitable feature subsets for the recognition process. Feature selection is described as an optimization problem and several approaches, including heuristic strategies, greedy and genetic algorithms, are compared. As the quality measure for feature subsets, the classification rate of the nearest-neighbor classifier computed with the leave-one-out method is used. Genetic algorithms show the best results. Finally, neural networks with error back-propagation as the learning paradigm are trained using the selected feature sets. Different network topologies, learning parameters, and pruning algorithms are investigated to optimize the classification performance of the neural classifiers. With the optimized recognition system, a classification performance of 97.7% is achieved.
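
    The subset quality measure is stated explicitly above (leave-one-out classification rate of a nearest-neighbor classifier), so a small sketch of that fitness function is given here; the binary-mask subset encoding is an assumption, and it plugs into any of the compared search strategies.

    ```python
    import numpy as np

    def loo_1nn_rate(X, y, mask):
        """Leave-one-out 1-NN classification rate of a feature subset,
        used as the quality measure (fitness) of the subset."""
        cols = np.flatnonzero(mask)
        if cols.size == 0:
            return 0.0
        Xs = X[:, cols]
        correct = 0
        for i in range(len(y)):
            d = np.linalg.norm(Xs - Xs[i], axis=1)
            d[i] = np.inf  # exclude the held-out sample itself
            correct += y[np.argmin(d)] == y[i]
        return correct / len(y)
    ```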

  10. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data

    Indian Academy of Sciences (India)

    Debarka Sengupta; Indranil Aich; Sanghamitra Bandyopadhyay

    2015-10-01

    Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.
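
    For reference, a sketch of the maximum information compression index (commonly defined as the smallest eigenvalue of the covariance matrix of a feature pair) and the pairwise dissimilarity matrix that could be handed to a precomputed-metric DBSCAN; the clustering parameters are left open.

    ```python
    import numpy as np

    def mici(x, y):
        """Maximum information compression index: smallest eigenvalue of the
        2x2 covariance matrix of (x, y); 0 when the features are linearly
        dependent, growing as they become more dissimilar."""
        return float(np.linalg.eigvalsh(np.cov(np.vstack([x, y])))[0])

    def dissimilarity_matrix(X):
        """Pairwise MICI dissimilarities between the columns (features) of X,
        usable with sklearn.cluster.DBSCAN(metric='precomputed')."""
        n = X.shape[1]
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                D[i, j] = D[j, i] = mici(X[:, i], X[:, j])
        return D
    ```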

  11. Multi-task GLOH feature selection for human age estimation

    CERN Document Server

    Liang, Yixiong; Xu, Ying; Xiang, Yao; Zou, Beiji

    2011-01-01

    In this paper, we propose a novel age estimation method based on the GLOH feature descriptor and multi-task learning (MTL). The GLOH feature descriptor, one of the state-of-the-art feature descriptors, is used to capture the age-related local and spatial information of the face image. As the extracted GLOH features are often redundant, MTL is designed to select the most informative feature bins for the age estimation problem, while the corresponding weights are determined by ridge regression. This approach greatly reduces the feature dimensionality, which not only improves performance but also decreases the computational burden. Experiments on the publicly available FG-NET database show that the proposed method achieves comparable performance to previous approaches while using far fewer features.

  12. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. This initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
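
    A minimal sketch of correlation as the per-feature utility signal for online selection, as the abstract describes; the threshold value and the surrounding naive Bayes update loop are illustrative assumptions.

    ```python
    import numpy as np

    def useful_features(X_seen, y_seen, threshold=0.1):
        """Return the indices of features whose correlation with the
        interestingness labels observed so far exceeds a threshold; the naive
        Bayes classifier would multiply likelihoods over this subset only."""
        keep = []
        for j in range(X_seen.shape[1]):
            col = X_seen[:, j]
            if col.std() > 0 and abs(np.corrcoef(col, y_seen)[0, 1]) > threshold:
                keep.append(j)
        return keep
    ```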

  13. Dominant Local Binary Pattern Based Face Feature Selection and Detection

    Directory of Open Access Journals (Sweden)

    Kavitha.T

    2010-04-01

    Full Text Available Face detection plays a major role in biometrics. Feature selection is a problem of formidable complexity. This paper proposes a novel approach to extract face features for face detection. LBP features can be extracted quickly in a single scan through the raw image and lie in a lower-dimensional space, whilst still retaining facial information efficiently. LBP features are robust to low-resolution images. The dominant local binary pattern (DLBP) is used to extract features accurately. A number of trainable methods are emerging in empirical practice due to their effectiveness. The proposed method is a trainable system for selecting face features from over-complete dictionaries of image measurements. After the feature selection procedure is completed, an SVM classifier is used for face detection. The main advantage of this proposal is that it is trained on a very small training set; the classifier is used to increase the selection accuracy. This is advantageous not only for facilitating the data-gathering stage but, more importantly, for limiting the training time. The CBCL frontal faces dataset is used for training and validation.

  14. The Effect of Feature Selection on Phish Website Detection

    Directory of Open Access Journals (Sweden)

    Hiba Zuhair

    2015-10-01

    Full Text Available Recently, limited anti-phishing campaigns have given phishers more opportunities to bypass detection with their advanced deceptions. Moreover, failure to devise appropriate classification techniques to effectively identify these deceptions has degraded the detection of phishing websites. Consequently, exploiting features that are as new, few, predictive, and effective as possible has emerged as a key challenge in keeping detection resilient. Some prior works have investigated and applied selected methods to develop their own classification techniques. However, no study has generally agreed on which feature selection method can be employed as the best assistant to enhance classification performance. Hence, this study empirically examined these methods and their effects on classification performance. Furthermore, it recommends some criteria for assessing their outcomes and offers a contribution to the problem at hand. Hybrid features, low- and high-dimensional datasets, different feature selection methods, and classification models were examined in this study. As a result, the findings displayed notably improved detection precision with low latency, as well as noteworthy gains in robustness and prediction susceptibility. Although selecting an ideal feature subset is a challenging task, the findings of this study provide the most advantageous feature subset possible for robust selection and effective classification in the phishing detection domain.

  15. Selecting Optimal Subset of Features for Student Performance Model

    Directory of Open Access Journals (Sweden)

    Hany M. Harb

    2012-09-01

    Full Text Available Educational data mining (EDM) is a growing research area in which data mining concepts are used in the educational field for the purpose of extracting useful information on student behavior in the learning process. Classification methods like decision trees, rule mining, and Bayesian networks can be applied to educational data to predict student behavior, such as performance in an examination. This prediction may help in student evaluation. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of student performance models in connection with feature selection techniques. The main objective of this work is to achieve high predictive performance by adopting various feature selection techniques to increase predictive accuracy with the least number of features. The outcomes show a reduction in computational time and construction cost in both the training and classification phases of the student performance model.

  16. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

    Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.

  17. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with identification and authentication of web users participating in Internet information processes, based on features of online texts. In digital forensics, web user identification based on various linguistic features can be used to discover the identity of individuals, criminals, or terrorists using the Internet to commit cybercrimes. The Internet can be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content, and information warfare). Linguistic identification of web users is a kind of biometric identification; it can be used to narrow down the suspects, identify a criminal, and prosecute him. The feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating the Manhattan distance to the k nearest neighbors (the Relief-f algorithm). This approach improves identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different levels of class imbalance. The results showed that feature relevance varies across different sets of web users (probable authors of a text); feature selection for each set of web users improves identification accuracy by 4% on average, which is approximately 1% higher than with a static feature set. The proposed approach is most effective for a small number of training samples (messages per user).
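
    A simplified Relief-f weighting sketch using Manhattan distance, as mentioned above; the two-class simplification (equal-weighted misses, no prior weighting) is an assumption made for brevity.

    ```python
    import numpy as np

    def relieff(X, y, k=5, n_iter=100, seed=0):
        """Weight each feature by how strongly it separates sampled instances
        from their k nearest misses versus their k nearest hits."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            i = rng.integers(n)
            dist = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance
            dist[i] = np.inf                     # never pick the sample itself
            hits = np.argsort(np.where(y == y[i], dist, np.inf))[:k]
            misses = np.argsort(np.where(y != y[i], dist, np.inf))[:k]
            w += np.abs(X[misses] - X[i]).mean(axis=0) - np.abs(X[hits] - X[i]).mean(axis=0)
        return w / n_iter
    ```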

  18. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concern transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  20. Mutual information-based feature selection for radiomics

    Science.gov (United States)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background: The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods: Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time and the variability of global extrema. Results: The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface-to-volume ratio (SVR), and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions: MI allowed discrimination of three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.

  1. Composite likelihood and two-stage estimation in family studies

    DEFF Research Database (Denmark)

    Andersen, Elisabeth Anne Wreford

    2002-01-01

    Composite likelihood; Two-stage estimation; Family studies; Copula; Optimal weights; All possible pairs

  2. On the robustness of two-stage estimators

    KAUST Repository

    Zhelonkin, Mikhail

    2012-04-01

    The aim of this note is to provide a general framework for the analysis of the robustness properties of a broad class of two-stage models. We derive the influence function, the change-of-variance function, and the asymptotic variance of a general two-stage M-estimator, and provide their interpretations. We illustrate our results in the case of the two-stage maximum likelihood estimator and the two-stage least squares estimator. © 2011.

  3. Hyperspectral image classification based on NMF Features Selection Method

    Science.gov (United States)

    Abe, Bolanle T.; Jordaan, J. A.

    2013-12-01

    Hyperspectral instruments are capable of collecting hundreds of images, corresponding to wavelength channels, for the same area on the Earth's surface. Due to the huge number of features (bands) in hyperspectral imagery, land cover classification procedures are computationally expensive and pose a problem known as the curse of dimensionality. In addition, high correlation among contiguous bands increases the redundancy within the bands. Hence, dimension reduction of hyperspectral data is crucial for obtaining good classification accuracy. This paper presents a new feature selection technique: a Non-negative Matrix Factorization (NMF) algorithm is proposed to obtain reduced, relevant features in the input domain of each class label, with the aim of reducing classification error and the dimensionality of the classification problem. The Indian Pines dataset from Northwest Indiana is used to evaluate the performance of the proposed method through feature selection and classification experiments. The Waikato Environment for Knowledge Analysis (WEKA) data mining framework is selected as the tool to implement the classification using Support Vector Machines and Neural Networks. The selected feature subsets are subjected to land cover classification to investigate the performance of the classifiers and how the feature set size affects classification accuracy. The results obtained show that the classifiers perform significantly well. The study makes a positive contribution to the problems of hyperspectral imagery by exploring NMF, SVMs, and NNs to improve classification accuracy. The classifier performances are valuable for decision makers weighing trade-offs between method accuracy and method complexity.

  4. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    The feature selection problem characterized by relatively small sample size and extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. The reduction of dimensionality of the feature space, followed by low-dimensional approaches, appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with high-dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property which should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties, such as a higher positive selection rate and a lower false discovery rate than other approaches.
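
    A sketch of a single screening round in the spirit of the tournament procedure: features compete in random groups and only group winners advance; repeating rounds shrinks the feature set until the EBIC-based selection (not shown) becomes tractable. The group size and the marginal correlation score are illustrative assumptions.

    ```python
    import numpy as np

    def tournament_round(X, y, group_size=50, winners_per_group=5, seed=0):
        """Split features into random groups, rank within each group by a
        marginal score (absolute correlation with y), and advance the winners."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(X.shape[1])
        survivors = []
        for start in range(0, len(idx), group_size):
            group = idx[start:start + group_size]
            score = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in group])
            survivors.extend(group[np.argsort(score)[::-1][:winners_per_group]])
        return np.array(survivors)
    ```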

  5. Feature selection versus feature compression in the building of calibration models from FTIR-spectrophotometry datasets.

    Science.gov (United States)

    Vergara, Alexander; Llobet, Eduard

    2012-01-15

    Undoubtedly, FTIR-spectrophotometry has become a standard in the chemical industry for monitoring, on the fly, the concentrations of reagents and by-products. However, representing chemical samples by FTIR spectra, which are characterized by hundreds if not thousands of variables, conveys its own set of challenges, because the spectra must be analyzed in a high-dimensional feature space where many features are likely to be highly correlated and many others are surely affected by noise. Therefore, identifying a subset of features that preserves classifier/regressor performance seems imperative prior to any attempt to build an appropriate pattern recognition method. In this context, we investigate the benefit of utilizing two different dimensionality reduction methods, namely the minimum Redundancy-Maximum Relevance (mRMR) feature selection scheme and a new self-organized map (SOM) based feature compression, coupled to regression methods to quantitatively analyze two-component liquid samples using FTIR spectrophotometry. Since these methods make it possible to select a small subset of relevant features from FTIR spectra while preserving the statistical characteristics of the target variable being analyzed, we claim that expressing the FTIR spectra by these dimensionality-reduced sets of features may be beneficial. We demonstrate the utility of these feature selection schemes in quantifying the distinct analytes within their binary mixtures using an FTIR spectrophotometer.
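
    A compact sketch of greedy mRMR selection for a continuous target (the regression setting of this study), using scikit-learn's mutual information estimator; the stopping size is arbitrary.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def mrmr(X, y, n_select=10):
        """Greedily add the feature with maximal relevance to y minus its
        mean redundancy (mutual information) with the already-chosen set."""
        relevance = mutual_info_regression(X, y, random_state=0)
        chosen = [int(np.argmax(relevance))]
        remaining = set(range(X.shape[1])) - set(chosen)
        while len(chosen) < n_select and remaining:
            def score(f):
                redundancy = mutual_info_regression(X[:, chosen], X[:, f],
                                                    random_state=0).mean()
                return relevance[f] - redundancy
            best = max(remaining, key=score)
            chosen.append(best)
            remaining.remove(best)
        return chosen
    ```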

  6. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Science.gov (United States)

    McGinnis, E Menton; Keil, Andreas

    2011-02-09

    Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  7. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Directory of Open Access Journals (Sweden)

    E Menton McGinnis

    Full Text Available Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  8. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and which involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  9. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    Science.gov (United States)

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve the accuracy of classification with small amounts of motor imagery training data in the development of brain-computer interface (BCI) systems, we propose an analysis method that automatically selects the characteristic parameters based on correlation coefficient analysis. Across the five sample data sets of dataset IVa from the 2005 BCI Competition, we utilized the short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the dimensionality of the raw electroencephalogram, then introduced feature extraction based on common spatial patterns (CSP) and classified with linear discriminant analysis (LDA). Simulation results showed that the average classification accuracy was higher with the correlation coefficient feature selection method than without it. Compared with a support vector machine (SVM) based feature optimization algorithm, correlation coefficient analysis leads to better parameter selection and improved classification accuracy.

  10. Informative Feature Selection for Object Recognition via Sparse PCA

    Science.gov (United States)

    2011-04-07

    SURF features from image pairs in structure-from-motion matching are deemed informative based on the consensus of the corresponding matches; the first two sparse principal vectors (PVs) suffice for selecting informative features that lie on the foreground objects in the BMW database [17], which consists of multiple-view images of 20 landmark buildings on the Berkeley campus.

  11. Selecting Features of Single Lead ECG Signal for Automatic Sleep Stages Classification using Correlation-based Feature Subset Selection

    Directory of Open Access Journals (Sweden)

    Ary Noviyanto

    2011-09-01

    Full Text Available Knowing our sleep quality helps us maximize our performance in daily life. The ECG signal has the potential to determine sleep stages, so that sleep quality can be measured. The data used in this research are single-lead ECG signals from the MIT-BIH Polysomnographic Database. The ECG features can be derived from RR intervals, EDR information, and the raw ECG signal. Correlation-based Feature Subset Selection (CFS) is used to choose the features which are significant for determining the sleep stages. Those features are evaluated using four classifiers with different characteristics (Bayesian network, multilayer perceptron, IB1, and random forest). Performance evaluations with the Bayesian network, IB1, and random forest show that CFS performs excellently: it can reduce the number of features significantly with only a small decrease in accuracy. The best classification result in this research is a combination of the feature set derived from the raw ECG signal and the random forest classifier.
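
    For reference, a sketch of the merit function that Correlation-based Feature Subset Selection maximizes; Pearson correlation stands in here for the symmetrical-uncertainty measure usually used with discrete features.

    ```python
    import numpy as np

    def cfs_merit(X, y, subset):
        """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff): favour subsets whose
        features correlate with the class but not with each other."""
        k = len(subset)
        r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
        if k == 1:
            return r_cf
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
        return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
    ```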

  12. Two-Stage Part-Based Pedestrian Detection

    DEFF Research Database (Denmark)

    Møgelmose, Andreas; Prioletti, Antonio; Trivedi, Mohan M.

    2012-01-01

    Detecting pedestrians is still a challenging task for automotive vision systems due to the extreme variability of targets, lighting conditions, occlusions, and high-speed vehicle motion. A lot of research has been focused on this problem in the last 10 years, and detectors based on classifiers have gained a special place among the different approaches presented. This work presents a state-of-the-art pedestrian detection system based on a two-stage classifier. Candidates are extracted with a Haar cascade classifier trained with the DaimlerDB dataset and then validated through part-based HOG classification. The system is evaluated in terms of several metrics, such as detection rate, false positives per hour, and frame rate. The novelty of this system lies in the combination of the part-based HOG approach, tracking based on specific optimized features, and porting to a real prototype.

  13. TWO-STAGE OCCLUDED OBJECT RECOGNITION METHOD FOR MICROASSEMBLY

    Institute of Scientific and Technical Information of China (English)

    WANG Huaming; ZHU Jianying

    2007-01-01

    A two-stage object recognition algorithm in the presence of occlusion is presented for microassembly. Coarse localization determines whether the template is in the image and approximately where it is; fine localization gives its accurate position. In coarse localization, a local feature, which is invariant to translation, rotation, and occlusion, is used to form signatures. By comparing the signature of the template with that of the image, an approximate transformation parameter from template to image is obtained, which is used as the initial parameter value for fine localization. An objective function, which is a function of the transformation parameter, is constructed in fine localization and minimized to achieve sub-pixel localization accuracy. The occluded pixels are not taken into account in the objective function, so the localization accuracy is not influenced by the occlusion.

  14. Two-stage designs for cross-over bioequivalence trials.

    Science.gov (United States)

    Kieser, Meinhard; Rauch, Geraldine

    2015-07-20

    The topic of applying two-stage designs in the field of bioequivalence studies has recently gained attention in the literature and in regulatory guidelines. While there exists some methodological research on the application of group sequential designs in bioequivalence studies, implementation of adaptive approaches has focused up to now on superiority and non-inferiority trials. In particular, no comparison of the features and performance characteristics of these designs has been performed, and therefore the question of which design to employ in this setting remains open. In this paper, we discuss and compare 'classical' group sequential designs and three types of adaptive designs that offer the option of mid-course sample size recalculation. A comprehensive simulation study demonstrates that group sequential designs can be identified which show power characteristics similar to those of the adaptive designs but require a lower average sample size. The methods are illustrated with a real bioequivalence study example.

  15. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents work leading to an acoustic event detection system designed to recognize two types of acoustic events (shots and breaking glass) in an urban environment. For this purpose, extensive front-end processing was performed for an effective parametric representation of the input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK), as well as MPEG-7 audio descriptors and other temporal and spectral characteristics, were extracted. High-dimensional feature sets were created and subsequently reduced by mutual-information-based selection algorithms. A Hidden Markov Model based classifier was applied and evaluated with the Viterbi decoding algorithm. Very effective feature sets were thus identified, and the less important features were also found.

  16. Fuse Selection for the Two-Stage Explosive Type Switches

    Science.gov (United States)

    Muravlev, I. O.; Surkov, M. A.; Tarasov, E. V.; Uvarov, N. F.

    2017-04-01

    In the two-stage explosive switch, destruction of the fuse element takes the form of an electric explosion. Criteria of similarity for electric explosion in transformer oil are defined. The challenge of protecting power electrical equipment from short-circuit currents remains urgent, especially as unit capacity grows. The tripping time must be reduced as much as possible and the amplitude of the fault current limited, which is very important for preserving the operability of life-support systems. This is particularly important in remote stand-alone power supply systems with a high share of renewable energy operating through inverter converters, as well as with inverter-type diesel generators. The explosive breaker copes well with these requirements: a high-speed flow of transformer oil and high pressure provide a contact-gap formation rate of 20-100 m/s. Under these conditions there is both a rapid increase in voltage across the gap and a recovery of electric strength (Ures) after current interruption.

  17. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates the problem of two variations, clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first, spatio-temporal distance, deals with the distances between different parts of the human body (feet, knees, hands, height, and shoulders) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
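
    A short sketch of Fisher-score ranking as used here to reduce the feature vector; the top-m cut-off and the k-NN step that follows are assumptions sketched for illustration.

    ```python
    import numpy as np

    def fisher_score(X, y):
        """Per-feature Fisher score: between-class scatter of the class means
        divided by the within-class variance; higher is more discriminative."""
        num = np.zeros(X.shape[1])
        den = np.zeros(X.shape[1])
        overall = X.mean(axis=0)
        for c in np.unique(y):
            Xc = X[y == c]
            num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
            den += len(Xc) * Xc.var(axis=0)
        return num / np.maximum(den, 1e-12)

    # keep the m highest-scoring features, then classify with k-NN:
    # top = np.argsort(fisher_score(X, y))[::-1][:m]
    ```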

  18. Spatial selection of features within perceived and remembered objects

    Directory of Open Access Journals (Sweden)

    Duncan E Astle

    2009-04-01

    Full Text Available Our representation of the visual world can be modulated by spatially specific attentional biases that depend flexibly on task goals. We compared searching for task-relevant features in perceived versus remembered objects. When searching perceptual input, selected task-relevant and suppressed task-irrelevant features elicited contrasting spatiotopic ERP effects, despite them being perceptually identical. This was also true when participants searched a memory array, suggesting that memory had retained the spatial organisation of the original perceptual input and that this representation could be modulated in a spatially specific fashion. However, task-relevant selection and task-irrelevant suppression effects were of the opposite polarity when searching remembered compared to perceived objects. We suggest that this surprising result stems from the nature of feature- and object-based representations when stored in visual short-term memory. When stored, features are integrated into objects, meaning that the spatially specific selection mechanisms must operate upon objects rather than specific feature-level representations.

  19. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  20. Variance Ranklets : Orientation-selective rank features for contrast modulations

    NARCIS (Netherlands)

    Azzopardi, George; Smeraldi, Fabrizio

    2009-01-01

    We introduce a novel type of orientation–selective rank features that are sensitive to contrast modulations (second–order stimuli). Variance Ranklets are designed in close analogy with the standard Ranklets, but use the Siegel–Tukey statistics for dispersion instead of the Wilcoxon statistics. Their

  1. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper proposes a TS-MLP method for emotion recognition from physiological signals. It recognizes emotions by using Tabu search to select features of the emotional physiological signals and a multilayer perceptron to classify the emotions. Simulations show that it achieves good emotion classification performance.

  2. Magnetic Field Feature Extraction and Selection for Indoor Location Estimation

    Directory of Open Access Journals (Sweden)

    Carlos E. Galván-Tejada

    2014-06-01

    Full Text Available User indoor positioning has been under constant improvement, especially with the availability of new sensors integrated into modern mobile devices, which allow us to exploit not only infrastructure made for everyday use, such as WiFi, but also natural infrastructure, as is the case of the natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, performed through a Genetic Algorithm (GA) with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: a home and an office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features of the model from 46 to 5, regardless of the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user's location (sensitivity) and its capacity to detect false positives (specificity) in both scenarios.

  3. Using PSO-Based Hierarchical Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ji

    2014-01-01

    Full Text Available Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so the best therapeutic opportunities are often missed. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we propose a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree: the clinical symptoms and the positive score of a patient are the leaf nodes and the root, respectively, while each syndrome feature in the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in the new reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships of symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes, which markedly improved diagnostic accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.
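
    A minimal binary PSO sketch of the kind of search the PSOHFS model performs over 0/1 feature masks; the inertia and acceleration constants are generic textbook values, not the paper's improved variant, and `fitness` is any subset-scoring callable.

    ```python
    import numpy as np

    def binary_pso(fitness, n_features, n_particles=20, n_iter=50, seed=0):
        """Each particle is a 0/1 feature mask; velocities are squashed through
        a sigmoid to give per-bit selection probabilities."""
        rng = np.random.default_rng(seed)
        X = rng.integers(0, 2, (n_particles, n_features))
        V = rng.normal(0.0, 1.0, (n_particles, n_features))
        pbest = X.copy()
        pbest_fit = np.array([fitness(x) for x in X])
        gbest = pbest[np.argmax(pbest_fit)].copy()
        for _ in range(n_iter):
            r1, r2 = rng.random(V.shape), rng.random(V.shape)
            V = 0.7 * V + 1.5 * r1 * (pbest - X) + 1.5 * r2 * (gbest - X)
            X = (rng.random(V.shape) < 1.0 / (1.0 + np.exp(-V))).astype(int)
            fit = np.array([fitness(x) for x in X])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = X[better], fit[better]
            gbest = pbest[np.argmax(pbest_fit)].copy()
        return gbest, float(pbest_fit.max())
    ```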

  4. Auditory-model based robust feature selection for speech recognition.

    Science.gov (United States)

    Koniaris, Christos; Kuropatwinski, Marcin; Kleijn, W Bastiaan

    2010-02-01

    It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order results in subsets with good performance.

  5. Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

    CERN Document Server

    Belanche, L A

    2011-01-01

    The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a controlled experimental scenario. A measure to evaluate FSAs is devised that computes the degree of matching between the output given by an FSA and the known optimal solutions. An extensive experimental study on synthetic problems is carried out to assess the behaviour of the algorithms in terms of solution accuracy and size as a function of the relevance, irrelevance, redundancy and size of the data samples. The controlled experimental conditions facilitate the derivation of better-supported and meaningful conclusions.
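
    The paper's exact matching measure is not reproduced in this record; as an assumption-labeled stand-in, a Jaccard-style overlap between an FSA's output and the known optimal subset captures the same idea.

```python
def matching_score(selected, optimal):
    """Overlap between an FSA's selected features and the known optimum,
    measured with the Jaccard index (1.0 = perfect match)."""
    s, o = set(selected), set(optimal)
    return len(s & o) / len(s | o) if s | o else 1.0

print(matching_score({1, 2, 5}, {1, 2, 3}))  # 0.5
```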

  6. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a "noise set" Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition, in combination with stochastic optimization to determine "empirical bounds" on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods on a recent infant intestinal gene expression and metagenomics dataset.
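
    A minimal sketch of the thresholding idea under stated assumptions (the authors' precise estimator may differ): keep predictors whose strongest absolute correlation with any component of the multivariate Y clears a cutoff; the data and the cutoff value below are invented.

```python
import numpy as np

def threshold_screen(X, Y, tau=0.3):
    """Keep features whose max |correlation| with any column of Y exceeds tau."""
    Xc = (X - X.mean(0)) / X.std(0)
    Yc = (Y - Y.mean(0)) / Y.std(0)
    corr = Xc.T @ Yc / len(X)            # (n_features, n_responses) Pearson r
    return np.where(np.abs(corr).max(axis=1) > tau)[0]

# Toy data: only the first 3 of 50 predictors drive the 2-D response.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.5 * rng.normal(size=(200, 2))
print(threshold_screen(X, Y))            # typically prints [0 1 2]
```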

  7. Modeling neuron selectivity over simple midlevel features for image classification.

    Science.gov (United States)

    Shu Kong; Zhuolin Jiang; Qiang Yang

    2015-08-01

    We now know that good mid-level features can greatly enhance the performance of image classification, but how to efficiently learn the image features is still an open question. In this paper, we present an efficient unsupervised midlevel feature learning approach (MidFea), which only involves simple operations, such as k-means clustering, convolution, pooling, vector quantization, and random projection. We show that these simple features can also achieve good performance in traditional classification tasks. To further boost the performance, we model the neuron selectivity (NS) principle by building an additional layer over the midlevel features prior to the classifier. The NS-layer learns category-specific neurons in a supervised manner with both bottom-up inference and top-down analysis, and thus supports fast inference for a query image. Through extensive experiments, we demonstrate that this higher-level NS-layer notably improves the classification accuracy with our simple MidFea, achieving comparable performance for face recognition, gender classification, age estimation, and object categorization. In particular, our approach runs faster in inference by an order of magnitude than sparse coding-based feature learning methods. As a conclusion, we argue that not only do carefully learned features (MidFea) bring improved performance, but a sophisticated mechanism (NS-layer) at a higher level also boosts the performance further.

  8. Development of two-stage grain grinder

    Directory of Open Access Journals (Sweden)

    V. N. Trubnikov

    2016-01-01

    The most important task in developing feeding rations for farm animals is the selection of feeds that are the most balanced in composition and the most nutritious, while at the same time being safe and meeting all the necessary requirements. To evaluate the productive value of feeds and their effectiveness, the rate of food productive action η was proposed. This ratio reflects the productive part of the total exchange energy of the daily feed ration and is an essential criterion of feed quality. Among feed rations, the most expensive but most energy-rich feed is mixed fodder, a mixture of ground seeds of agricultural crops with protein, mineral and vitamin additives. By nutritional value, the share of this feed product in the diet is 50% for cattle, 60-100% for pigs and 100% for birds. The basic operation in the production of mixed fodder is seed grinding, i.e. the destruction of the seeds under external forces exceeding the forces of molecular adhesion of the grain particles. Different methods are used to grind grain: chopping, grinding, impact "in flight", crushing, etc. With existing production equipment, there is the problem of obtaining grain for mixed fodder that simultaneously has the necessary degree of grinding and a uniform particle size distribution. Too coarse a grind makes the mixed fodder difficult for farm animals to digest; moreover, the grinding process is accompanied by high energy consumption. A grain grinder is proposed whose principle is based on two ways of grinding grain: splitting and impact "in flight". The proposed design solutions make it possible to obtain a high-performance technical means for crushing crop seeds, as well as to reduce the energy costs of producing mixed fodder. The methodology justification of degree of grain grinding by

  9. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in a multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. Cooperative Feature Selection combines grey relational analysis and an artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as rates of property crime and violent crime for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that the consumer price index is an important economic indicator that has a significant influence on the violent crime rate, while for the property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. Cooperative Feature Selection is also found to produce smaller errors than Multiple Linear Regression in forecasting property and violent crime rates.
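
    For readers unfamiliar with grey relational analysis, the standard grey relational grade is easy to compute; the sketch below uses invented toy series (not the study's data) and the conventional resolution coefficient rho = 0.5.

```python
import numpy as np

def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate series w.r.t. a reference
    series; all series are min-max normalized first. Higher = more related."""
    def norm(s):
        s = np.asarray(s, float)
        return (s - s.min()) / (s.max() - s.min())
    r = norm(reference)
    C = np.array([norm(c) for c in candidates])
    delta = np.abs(C - r)                             # pointwise deviations
    dmin, dmax = delta.min(), delta.max()
    xi = (dmin + rho * dmax) / (delta + rho * dmax)   # relational coefficients
    return xi.mean(axis=1)

# Invented toy numbers: which indicator tracks the crime series better?
crime = [5.1, 5.4, 5.9, 6.3]
indicators = {"CPI": [2.0, 2.2, 2.6, 2.9], "GDP": [3.1, 2.8, 2.5, 2.2]}
grades = grey_relational_grades(crime, list(indicators.values()))
print(dict(zip(indicators, grades)))                  # CPI scores higher here
```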

  10. An Improved Particle Swarm Optimization for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Yuanning Liu; Gang Wang; Huiling Chen; Hao Dong; Xiaodong Zhu; Sujing Wang

    2011-01-01

    Particle Swarm Optimization (PSO) is a popular and bionic algorithm for optimization problems, based on the social behavior associated with bird flocking. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, the competition among swarms, and the reservation or destruction of a swarm, have not been considered further. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
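
    The F-score filter that IFS pairs with MSPSO and the SVM has a standard closed form (between-class scatter of a feature's class means over its pooled within-class variance); a small sketch on synthetic data:

```python
import numpy as np

def f_scores(X, y):
    """Per-feature F-score for binary labels y in {0, 1}: larger means the
    feature separates the two classes better."""
    Xp, Xn = X[y == 1], X[y == 0]
    m, mp, mn = X.mean(0), Xp.mean(0), Xn.mean(0)
    between = (mp - m) ** 2 + (mn - m) ** 2
    within = Xp.var(0, ddof=1) + Xn.var(0, ddof=1)
    return between / within

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 300)
X = rng.normal(size=(300, 8))
X[:, 0] += 1.5 * y                     # make feature 0 discriminative
print(f_scores(X, y).round(2))         # feature 0 scores highest
```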

  11. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Roy Kaushik

    2008-01-01

    The selection of the optimal features subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjective genetic algorithm (MOGA) to improve the recognition accuracy, and an asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the feature sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of the SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified into an asymmetrical SVM to treat the false-accept and false-reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of the SVM as a classifier is better than that of classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and Mahalanobis distances. The proposed technique is computationally effective, with recognition rates of 99.81% and 96.43% on the CASIA and ICE datasets, respectively.

  13. Making Trillion Correlations Feasible in Feature Grouping and Selection.

    Science.gov (United States)

    Zhai, Yiteng; Ong, Yew-Soon; Tsang, Ivor W

    2016-12-01

    Today, modern databases with "Big Dimensionality" are experiencing a growing trend. Existing approaches that require the calculation of pairwise feature correlations in their algorithmic designs have scored miserably on such databases, since computing the full correlation matrix (i.e., the square of the dimensionality in size) is computationally very intensive (a million features translate to a trillion correlations). This poses a notable challenge that has received much less attention in machine learning and data mining research, and this paper presents a study to fill the gap. Our findings on several established databases with big dimensionality across a wide spectrum of domains indicate that an extremely small portion of the feature pairs contributes significantly to the underlying interactions, and that there exist feature groups that are highly correlated. Inspired by these observations, we introduce a novel learning approach that exploits the presence of sparse correlations for the efficient identification of informative and correlated feature groups from big dimensional data, which translates to a reduction in complexity from O(m²n) to O(m log m + Ka·mn), where Ka ≪ m. A strategy designed to filter out the large number of non-contributing correlations that could otherwise confuse the classifier while identifying the correlated and informative feature groups forms one of the highlights of our approach. We also demonstrate the proposed method on one-class learning, where a notable speedup can be observed when solving one-class problems on big dimensional data. Further, to identify robust informative features with minimal sampling bias, our feature selection strategy embeds V-fold cross-validation in the learning model, so as to seek features that exhibit stable or consistent accuracy across multiple data folds. Extensive empirical studies on both synthetic and several real-world datasets comprising up to 30 million

  14. Two Stage Assessment of Thermal Hazard in An Underground Mine

    Science.gov (United States)

    Drenda, Jan; Sułkowski, Józef; Pach, Grzegorz; Różański, Zenon; Wrona, Paweł

    2016-06-01

    The results of research into the application of selected thermal indices of men's work and climate indices in a two-stage assessment of climatic work conditions in underground mines are presented in this article. The difference between these two kinds of indices was pointed out during the project entitled "The recruiting requirements for miners working in hot underground mine environments", coordinated by the Institute of Mining Technologies at the Silesian University of Technology as part of the Polish strategic project "Improvement of safety in mines", financed by the National Centre of Research and Development. Climate indices are based only on the physical parameters of the air and their measurements. Thermal indices include additional factors that are strictly connected with the work, e.g. the thermal resistance of clothing and the kind of work. Special emphasis has been put on the following indices: the substitute Silesian temperature (TS), which is considered a climate index, and the thermal discomfort index (δ), which belongs to the thermal indices group. The possibility of a two-stage application of these indices has been considered (preliminary and detailed estimation). The examples show that applying the thermal indices (detailed estimation) makes it possible to avoid additional technical solutions that would otherwise be necessary, according to the climate index alone, to reduce the thermal hazard in particular workplaces. The threshold limit value for TS has been set based on these results: below TS = 24°C it is not necessary to perform the detailed estimation.

  15. Feature selection and survival modeling in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Kim H

    2013-09-01

    Hyunsoo Kim (Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA) and Markus Bredel (Department of Radiation Oncology and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA). Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes to use as input to a feature selection algorithm for survival modeling. Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) the 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; or (4) the same optimization using gene expression profiles of cancer pathway genes alone. Results: The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. Keywords: brain, feature selection

  16. Feature Extraction and Selection From the Perspective of Explosive Detection

    Energy Technology Data Exchange (ETDEWEB)

    Sengupta, S K

    2009-09-01

    ) digitized 3-dimensional attenuation images with a voxel resolution of the order of one quarter of a millimeter. In the task of feature extraction and subsequent selection of an appropriate subset thereof, several important factors need to be considered. Foremost among them are: (1) the definition of the sampling unit from which the features will be extracted for the purpose of detection/identification of the explosives; (2) the choice of features (given the sampling unit) to be extracted that can be used to signal the existence/identity of the explosive; (3) the robustness of the computed features under different inspection conditions - to attain robustness, invariance under the transformations of translation, scaling, rotation and change of orientation is highly desirable; and (4) the computational costs of feature extraction, selection and their use in explosive detection/identification. In the search for extractable features, we have done a thorough literature survey with the above factors in mind and compiled a list of features that could possibly help us meet our objective. We are assuming that features will be based on sampling units that are single CT slices of the target; this may however change, in which case appropriate modifications will be made to the feature extraction process. We indicate below some of the major types of features in 2- or 3-dimensional images that have been used in the literature on the application of pattern recognition (PR) techniques in image understanding and are possibly pertinent to our study. In the following paragraph, we briefly indicate the motivation that guided us in the choice of these features, and identify the nature of the constraints. The principal feature types derivable from an image will be discussed in section 2. Once the features are extracted, one must select a subset of this feature set that will retain the most useful information and remove any redundant and irrelevant information that may have a detrimental effect

  17. Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

    Directory of Open Access Journals (Sweden)

    Abdul Ghani Abro

    2011-09-01

    Essentially, the motive behind using a control system is to generate a suitable control signal that yields the desired response of a physical process. Control of the synchronous generator has always been very critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit, which raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The Artificial Neural Network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller also depends on its input features, and selecting optimum features to train a neurocontroller optimally is very critical; both the quality and the size of the data are of equal importance for good performance. In this work a filter technique is employed to select independent factors for ANN training.
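
    The record does not spell out its filter criterion, so the sketch below assumes a common relevance/redundancy scheme: rank candidate inputs by absolute correlation with the target, then greedily skip any candidate that nearly duplicates an input already chosen; both thresholds are arbitrary. The selected columns would then serve as the training inputs for the excitation neurocontroller's ANN.

```python
import numpy as np

def filter_rank(X, y, keep=5, redundancy=0.9):
    """Filter-style selection: order inputs by |corr(x_j, y)|, then greedily
    accept them unless they correlate above `redundancy` with a chosen one."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    relevance = np.abs(Xc.T @ yc / len(y))
    chosen = []
    for j in relevance.argsort()[::-1]:
        if all(abs(Xc[:, j] @ Xc[:, k] / len(y)) < redundancy for k in chosen):
            chosen.append(int(j))
        if len(chosen) == keep:
            break
    return chosen
```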

  18. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of collected data in many scientific fields is increasing, and all of these fields share a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are enabling new research frontiers in time-domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information, and the results achieved by applying these techniques to three major astronomical problems.

  19. Acute Exercise Modulates Feature-selective Responses in Human Cortex.

    Science.gov (United States)

    Bullock, Tom; Elliott, James C; Serences, John T; Giesbrecht, Barry

    2017-04-01

    An organism's current behavioral state influences ongoing brain activity. Nonhuman mammalian and invertebrate brains exhibit large increases in the gain of feature-selective neural responses in sensory cortex during locomotion, suggesting that the visual system becomes more sensitive when actively exploring the environment. This raises the possibility that human vision is also more sensitive during active movement. To investigate this possibility, we used an inverted encoding model technique to estimate feature-selective neural response profiles from EEG data acquired from participants performing an orientation discrimination task. Participants (n = 18) fixated at the center of a flickering (15 Hz) circular grating presented at one of nine different orientations and monitored for a brief shift in orientation that occurred on every trial. Participants completed the task while seated on a stationary exercise bike at rest and during low- and high-intensity cycling. We found evidence for inverted-U effects, such that the peak of the reconstructed feature-selective tuning profiles was highest during low-intensity exercise compared with those estimated during rest and high-intensity exercise. When modeled, these effects were driven by changes in the gain of the tuning curve and in the profile bandwidth during low-intensity exercise relative to rest. Thus, despite profound differences in visual pathways across species, these data show that sensitivity in human visual cortex is also enhanced during locomotive behavior. Our results reveal the nature of exercise-induced gain on feature-selective coding in human sensory cortex and provide valuable evidence linking the neural mechanisms of behavior state across species.

  20. HYBRID FEATURE SELECTION ALGORITHM FOR INTRUSION DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    Seyed Reza Hasani

    2014-01-01

    Network security is a serious global concern, and the usefulness of Intrusion Detection Systems (IDS) based on soft computing techniques is growing rapidly in Information Security research. Previous research has recognized irrelevant and redundant features as causes of increased processing time when evaluating known intrusive patterns; an efficient feature selection method reduces the dimensionality of the data and removes the redundancy and ambiguity caused by unimportant attributes. Feature selection methods are therefore well-known means of overcoming this problem. Various approaches have been utilized in intrusion detection, and they have achieved some improvements. This work enhances the algorithm with the highest Detection Rate (DR), Linear Genetic Programming (LGP), by incorporating the Bees Algorithm to reduce the False Alarm Rate (FAR). Finally, the Support Vector Machine (SVM) is one of the best candidate solutions for settling IDS problems. In this study four sample sets containing 4000 random records each were extracted randomly from the dataset for training and testing purposes. Experimental results show that the LGP_BA method improves the accuracy and efficiency compared with previous related research, and that the feature subset offered by LGP_BA gives a superior representation of the data.

  1. Online Feature Selection of Class Imbalance via PA Algorithm

    Institute of Scientific and Technical Information of China (English)

    Chao Han; Yun-Kun Tan; Jin-Hui Zhu; Yong Guo; Jian Chen; Qing-Yao Wu

    2016-01-01

    Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of instances in the majority (or positive) class of a dataset is much larger than that in the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for high-dimensional classification tasks, greatly improving classification performance and computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and use an iterative approach to solve it based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms on several real-world datasets, and our experimental results demonstrate the effectiveness of the proposed algorithms in comparison with the baselines.
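
    A stripped-down sketch of the PA-plus-truncation idea described above, under stated simplifications: plain PA (no aggressiveness parameter and no imbalance weighting), with the feature budget enforced by zeroing all but the largest-magnitude weights after each update.

```python
import numpy as np

def pa_truncated(stream, dim, budget=10):
    """Online passive-aggressive learner that keeps at most `budget`
    nonzero weights (labels y must be -1 or +1)."""
    w = np.zeros(dim)
    for x, y in stream:
        loss = max(0.0, 1.0 - y * (w @ x))      # hinge loss on this example
        if loss > 0:
            tau = loss / (x @ x)                # PA step size
            w += tau * y * x
            small = np.argsort(np.abs(w))[:-budget]
            w[small] = 0.0                      # truncate to the budget
    return w

# Toy stream: the label depends only on features 0 and 1.
rng = np.random.default_rng(5)
X = rng.normal(size=(500, 100))
y = np.sign(X[:, 0] + X[:, 1] + 1e-9)
w = pa_truncated(zip(X, y), dim=100, budget=5)
print(np.nonzero(w)[0])                         # a handful of surviving indices
```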

  2. Feature selection for face recognition: a memetic algorithmic approach

    Institute of Scientific and Technical Information of China (English)

    Dinesh KUMAR; Shakti KUMAR; C. S. RAI

    2009-01-01

    The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over ORL and YaleB face databases using Euclidean norm as the classifier. It was found that as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of PCA-MA, LDA-MA and KPCA-MA approaches.

  3. Use of genetic algorithm for the selection of EEG features

    Science.gov (United States)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  4. Processing of Feature Selectivity in Cortical Networks with Specific Connectivity.

    Directory of Open Access Journals (Sweden)

    Sadra Sadeh

    Although non-specific at the onset of eye opening, networks in rodent visual cortex attain a non-random structure after eye opening, with a specific bias for connections between neurons of similar preferred orientations. As orientation selectivity is already present at eye opening, it remains unclear how this specificity in network wiring contributes to feature selectivity. Using large-scale inhibition-dominated spiking networks as a model, we show that feature-specific connectivity leads to a linear amplification of feedforward tuning, consistent with recent electrophysiological single-neuron recordings in rodent neocortex. Our results show that optimal amplification is achieved at an intermediate regime of specific connectivity. In this configuration a moderate increase of pairwise correlations is observed, consistent with recent experimental findings. Furthermore, we observed that feature-specific connectivity leads to the emergence of orientation-selective reverberating activity, and entails pattern completion in network responses. Our theoretical analysis provides a mechanistic understanding of subnetworks' responses to visual stimuli, and casts light on the regime of operation of sensory cortices in the presence of specific connectivity.

  5. An Optimal SVM with Feature Selection Using Multiobjective PSO

    Directory of Open Access Journals (Sweden)

    Iman Behravan

    2016-01-01

    The support vector machine is a classifier based on the structural risk minimization principle. The performance of the SVM depends on different parameters, such as the penalty factor C and the kernel factor σ, and choosing an appropriate kernel function can improve the recognition score and lower the amount of computation. Furthermore, selecting the useful features among the many features in a dataset not only increases the performance of the SVM but also reduces the computational time and complexity. This is therefore an optimization problem that can be solved by a heuristic algorithm. In some cases, besides the recognition score, the reliability of the classifier's output is important, and in such cases a multiobjective optimization algorithm is needed. In this paper we use the MOPSO algorithm to optimize the parameters of the SVM, choose an appropriate kernel function, and select the best feature subset simultaneously, in order to optimize the recognition score and the reliability of the SVM concurrently. Nine different datasets from the UCI machine learning repository are used to evaluate the power and effectiveness of the proposed method (MOPSO-SVM). The results of the proposed method are compared to those achieved by a single SVM, and by RBF and MLP neural networks.

  6. Feature selection applied to ultrasound carotid images segmentation.

    Science.gov (United States)

    Rosati, Samanta; Molinari, Filippo; Balestra, Gabriella

    2011-01-01

    The automated tracing of the carotid layers on ultrasound images is complicated by noise, different morphology and pathology of the carotid artery. In this study we benchmarked four methods for feature selection on a set of variables extracted from ultrasound carotid images. The main goal was to select those parameters containing the highest amount of information useful to classify the pixels in the carotid regions they belong to. Six different classes of pixels were identified: lumen, lumen-intima interface, intima-media complex, media-adventitia interface, adventitia and adventitia far boundary. The performances of QuickReduct Algorithm (QRA), Entropy-Based Algorithm (EBR), Improved QuickReduct Algorithm (IQRA) and Genetic Algorithm (GA) were compared using Artificial Neural Networks (ANNs). All methods returned subsets with a high dependency degree, even if the average classification accuracy was about 50%. Among all classes, the best results were obtained for the lumen. Overall, the four methods for feature selection assessed in this study return comparable results. Despite the need for accuracy improvement, this study could be useful to build a pre-classifier stage for the optimization of segmentation performance in ultrasound automated carotid segmentation.

  7. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    A discriminative and robust feature, the kernel-enhanced informative Gabor feature, is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unifies different Gabor filter definitions and proposes a training sample generation algorithm to reduce the effects caused by the unbalanced number of samples available in different classes.

  9. Recursive Feature Selection with Significant Variables of Support Vectors

    Directory of Open Access Journals (Sweden)

    Chen-An Tsai

    2012-01-01

    The development of DNA microarrays allows researchers to screen thousands of genes simultaneously and helps determine high- and low-expression genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold for selecting genes. However, the parameter setting may not be compatible with the chosen classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and the recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM, and capable of attaining good classification performance when the variations of informative and noninformative genes differ. In the analysis of the two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
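
    Of the three methods compared, the SVMRFE baseline is the most standard and the easiest to sketch; below is a minimal chunked-elimination version built on scikit-learn's LinearSVC (this is the comparison method, not the proposed SVM-t, and the elimination fraction is an arbitrary choice).

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y, n_keep=5, drop_frac=0.5):
    """SVM recursive feature elimination: repeatedly fit a linear SVM and
    discard the features with the smallest absolute weights."""
    active = np.arange(X.shape[1])
    while len(active) > n_keep:
        clf = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X[:, active], y)
        importance = np.abs(clf.coef_).sum(axis=0)    # per-feature |weight|
        n_next = max(n_keep, int(len(active) * (1 - drop_frac)))
        active = active[np.argsort(importance)[::-1][:n_next]]
    return active

# Toy data: only the first 3 of 60 features carry the class signal.
rng = np.random.default_rng(6)
X = rng.normal(size=(120, 60))
y = (X[:, :3].sum(axis=1) > 0).astype(int)
print(svm_rfe(X, y))          # the informative features should survive
```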

  10. Treatment of cadmium dust with two-stage leaching process

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    The treatment of cadmium dust with a two-stage leaching process was investigated to replace the existing sulphation roast-leaching processes. The process parameters in the first stage leaching were basically similar to the neutral leaching in zinc hydrometallurgy. The effects of process parameters in the second stage leaching on the extraction of zinc and cadmium were mainly studied. The experimental results indicated that zinc and cadmium could be efficiently recovered from the cadmium dust by the two-stage leaching process. The extraction percentages of zinc and cadmium in two-stage leaching reached 95% and 88% respectively under the optimum conditions. The total extraction percentage of Zn and Cd reached 94%.

  11. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to this large number of variables brings the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplify the set of factors required and improves knowledge of the adopted features and their relation to the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as permafrost training data. The FS algorithms used indicate which variables appear less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. CFS, in turn, evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is an ML algorithm that performs FS as part of its
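
    Of the three algorithms compared, Information Gain is simple enough to sketch directly; continuous predictors are binned first (the bin count and the toy data below are arbitrary, with altitude standing in for a DEM-derived variable).

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, target, bins=5):
    """IG of one binned predictor with respect to presence/absence labels."""
    edges = np.histogram_bin_edges(feature, bins)[1:-1]
    binned = np.digitize(feature, edges)
    gain = entropy(target)
    for v in np.unique(binned):
        mask = binned == v
        gain -= mask.mean() * entropy(target[mask])   # conditional entropy
    return gain

# Toy example: presence depends noisily on altitude, so IG is clearly > 0.
rng = np.random.default_rng(7)
altitude = rng.uniform(1500, 3500, 1000)
presence = (altitude + rng.normal(0, 300, 1000) > 2600).astype(int)
print(round(information_gain(altitude, presence), 3))
```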

  12. Feature-Selective Attentional Modulations in Human Frontoparietal Cortex.

    Science.gov (United States)

    Ester, Edward F; Sutterer, David W; Serences, John T; Awh, Edward

    2016-08-03

    Control over visual selection has long been framed in terms of a dichotomy between "source" and "site," where top-down feedback signals originating in frontoparietal cortical areas modulate or bias sensory processing in posterior visual areas. This distinction is motivated in part by observations that frontoparietal cortical areas encode task-level variables (e.g., what stimulus is currently relevant or what motor outputs are appropriate), while posterior sensory areas encode continuous or analog feature representations. Here, we present evidence that challenges this distinction. We used fMRI, a roving searchlight analysis, and an inverted encoding model to examine representations of an elementary feature property (orientation) across the entire human cortical sheet while participants attended either the orientation or luminance of a peripheral grating. Orientation-selective representations were present in a multitude of visual, parietal, and prefrontal cortical areas, including portions of the medial occipital cortex, the lateral parietal cortex, and the superior precentral sulcus (thought to contain the human homolog of the macaque frontal eye fields). Additionally, representations in many, but not all, of these regions were stronger when participants were instructed to attend orientation relative to luminance. Collectively, these findings challenge models that posit a strict segregation between sources and sites of attentional control on the basis of representational properties by demonstrating that simple feature values are encoded by cortical regions throughout the visual processing hierarchy, and that representations in many of these areas are modulated by attention. Influential models of visual attention posit a distinction between top-down control and bottom-up sensory processing networks. These models are motivated in part by demonstrations showing that frontoparietal cortical areas associated with top-down control represent abstract or categorical stimulus

  13. LOGISTICS SCHEDULING: ANALYSIS OF TWO-STAGE PROBLEMS

    Institute of Scientific and Technical Information of China (English)

    Yung-Chia CHANG; Chung-Yee LEE

    2003-01-01

    This paper studies the coordination effects between stages for scheduling problems where decision-making is a two-stage process. The two stages are considered as one system. The system can be a supply chain that links two stages, one stage representing a manufacturer and the other a distributor. It can also represent a single manufacturer, with each stage representing a different department responsible for a part of operations. A problem that jointly considers both stages in order to achieve ideal overall system performance is defined as a system problem. In practice, at times, it might not be feasible for the two stages to make coordinated decisions due to (i) the lack of channels that allow decision makers at the two stages to cooperate, and/or (ii) the optimal solution to the system problem being too difficult (or costly) to achieve. Two practical approaches are applied to solve a variant of two-stage logistic scheduling problems. The Forward Approach is defined as a solution procedure by which the first stage of the system problem is solved first, followed by the second stage. Similarly, the Backward Approach is defined as a solution procedure by which the second stage of the system problem is solved prior to solving the first stage. In each approach, the two stages are solved sequentially and the solution generated is treated as a heuristic solution with respect to the corresponding system problem. When decision makers at the two stages make decisions locally without considering the consequences for the entire system, ineffectiveness may result, even when each stage optimally solves its own problem. The trade-off between time complexity and solution quality is the main concern. This paper provides a worst-case performance analysis for each approach.

  14. Residential Two-Stage Gas Furnaces - Do They Save Energy?

    Energy Technology Data Exchange (ETDEWEB)

    Lekov, Alex; Franco, Victor; Lutz, James

    2006-05-12

    Residential two-stage gas furnaces account for almost a quarter of the total number of models listed in the March 2005 GAMA directory of equipment certified for sale in the United States. Two-stage furnaces are expanding their presence in the market mostly because they meet consumer expectations for improved comfort. Currently, the U.S. Department of Energy (DOE) test procedure serves as the method for reporting furnace total fuel and electricity consumption under laboratory conditions. In 2006, American Society of Heating Refrigeration and Air-conditioning Engineers (ASHRAE) proposed an update to its test procedure which corrects some of the discrepancies found in the DOE test procedure and provides an improved methodology for calculating the energy consumption of two-stage furnaces. The objectives of this paper are to explore the differences in the methods for calculating two-stage residential gas furnace energy consumption in the DOE test procedure and in the 2006 ASHRAE test procedure and to compare test results to research results from field tests. Overall, the DOE test procedure shows a reduction in the total site energy consumption of about 3 percent for two-stage compared to single-stage furnaces at the same efficiency level. In contrast, the 2006 ASHRAE test procedure shows almost no difference in the total site energy consumption. The 2006 ASHRAE test procedure appears to provide a better methodology for calculating the energy consumption of two-stage furnaces. The results indicate that, although two-stage technology by itself does not save site energy, the combination of two-stage furnaces with BPM motors provides electricity savings, which are confirmed by field studies.

  15. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly both in terms of the number of variables (or features) and the number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. A lot of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of manifolds, called the Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real-world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158

  16. Two-stage local M-estimation of additive models

    Institute of Scientific and Technical Information of China (English)

    JIANG JianCheng; LI JianTao

    2008-01-01

    This paper studies local M-estimation of the nonparametric components of additive models. A two-stage local M-estimation procedure is proposed for estimating the additive components and their derivatives. Under very mild conditions, the proposed estimators of each additive component and its derivative are jointly asymptotically normal and share the same asymptotic distributions as they would if the other components were known. The established asymptotic results also hold for two particular local M-estimations: the local least squares and least absolute deviation estimations. However, for general two-stage local M-estimation with continuous and nonlinear ψ-functions, the implementation is time-consuming. To reduce the computational burden, one-step approximations to the two-stage local M-estimators are developed. The one-step estimators are shown to achieve the same efficiency as the fully iterative two-stage local M-estimators, which makes two-stage local M-estimation more feasible in practice. The proposed estimators inherit the advantages and at the same time overcome the disadvantages of local least-squares based smoothers. In addition, the practical implementation of the proposed estimation is considered in detail. Simulations demonstrate the merits of the two-stage local M-estimation, and a real example illustrates the performance of the methodology.

  18. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting high-dimensionality issue. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10 CV) classification accuracy on a public breast cancer biomedical dataset was more than 96%, superior to that of the original features and of a traditional feature extraction method.

  19. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    Energy Technology Data Exchange (ETDEWEB)

    Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland); Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, from the petroleum to the biomedical sector. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  20. A Two-Stage Compression Method for the Fault Detection of Roller Bearings

    Directory of Open Access Journals (Sweden)

    Huaqing Wang

    2016-01-01

    Full Text Available Data measurement for roller bearing condition monitoring is carried out based on the Shannon sampling theorem, resulting in massive amounts of redundant information, which leads to a big-data problem that increases the difficulty of roller bearing fault diagnosis. To overcome this shortcoming, a two-stage compressed fault detection strategy is proposed in this study. First, a sliding window is utilized to divide the original signals into several segments, and a selected symptom parameter is employed to represent each segment; through this, a symptom parameter wave is obtained and the raw vibration signals are compressed to a certain level while the fault information is retained. Second, a fault detection scheme based on compressed sensing is applied to extract the fault features, which can compress the symptom parameter wave thoroughly with a random matrix called the measurement matrix. The experimental results validate the effectiveness of the proposed method, and a comparison of the three selected symptom parameters is also presented in this paper.
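
    A minimal sketch of the two compression stages, assuming NumPy; RMS is used here as the symptom parameter and a Gaussian random matrix as the measurement matrix, both common choices rather than details taken from the paper.

        import numpy as np

        def symptom_wave(signal, win=256):
            # Stage 1: one RMS value per sliding-window segment of the raw signal.
            n = len(signal) // win
            segs = np.asarray(signal[:n * win]).reshape(n, win)
            return np.sqrt((segs ** 2).mean(axis=1))

        def compress(wave, m, seed=0):
            # Stage 2: compressed-sensing measurement y = Phi @ x with a
            # random Gaussian measurement matrix Phi (m << len(wave)).
            rng = np.random.default_rng(seed)
            phi = rng.normal(size=(m, len(wave))) / np.sqrt(m)
            return phi @ wave, phi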

  1. Accuracy of the One-Stage and Two-Stage Impression Techniques: A Comparative Analysis

    Directory of Open Access Journals (Sweden)

    Ladan Jamshidy

    2016-01-01

    Full Text Available Introduction. One of the main steps of impression is the selection and preparation of an appropriate tray. Hence, the present study aimed to analyze and compare the accuracy of one- and two-stage impression techniques. Materials and Methods. A resin laboratory-made model, as the first molar, was prepared by a standard method for full crowns with a processed preparation finish line of 1 mm depth and a convergence angle of 3-4°. Impressions were made 20 times with the one-stage technique and 20 times with the two-stage technique using an appropriate tray. To measure the marginal gap, the distance between the restoration margin and the preparation finish line of the plaster dies was determined vertically in the mid mesial, distal, buccal, and lingual (MDBL) regions by a stereomicroscope using a standard method. Results. The results of the independent t-test showed that the mean marginal gap obtained by the one-stage impression technique was higher than that of the two-stage technique. Further, there was no significant difference between the one- and two-stage impression techniques in the mid buccal region, but a significant difference was found between the two techniques in the MDL regions and overall. Conclusion. The findings of the present study indicated higher accuracy for the two-stage impression technique than for the one-stage technique.

  2. Two-Stage Approach for Protein Superfamily Classification

    Directory of Open Access Journals (Sweden)

    Swati Vipsita

    2013-01-01

    Full Text Available We deal with the problem of protein superfamily classification, in which the family membership of a newly discovered amino acid sequence is predicted. Correct prediction is of great concern to researchers and drug analysts, as it aids the discovery of new drugs. As this problem falls broadly under the category of pattern classification, we optimize feature extraction in the first stage and classifier design in the second stage, with the overall objective of maximizing the performance accuracy of the classifier. In the feature extraction phase, a Genetic Algorithm- (GA- based wrapper approach is used to select a few eigenvectors from the principal component analysis (PCA space, which are encoded as binary strings in the chromosome. On the basis of the positions of 1s in the chromosome, the eigenvectors are selected to build the transformation matrix, which then maps the original high-dimension feature space to a lower-dimension feature space. Using PCA-NSGA-II (non-dominated sorting GA, the nondominated solutions obtained from the Pareto front solve the trade-off problem by compromising between the number of eigenvectors selected and the accuracy obtained by the classifier. In the second stage, the recursive orthogonal least squares algorithm (ROLSA is used for training a radial basis function network (RBFN, both to select an optimal number of hidden centres and to update the output layer weighting matrix. This approach can be applied to large datasets with much lower computer memory requirements. Thus, very small architectures with few hidden centres are obtained, showing a high level of performance accuracy.
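
    The mapping from a binary chromosome to a reduced feature space can be sketched as follows with scikit-learn; X_train is an assumed sample-by-feature NumPy array, the random chromosome stands in for one NSGA-II individual, and the NSGA-II search loop itself is omitted.

        import numpy as np
        from sklearn.decomposition import PCA

        pca = PCA().fit(X_train)                   # X_train: samples x features (assumed)
        rng = np.random.default_rng(0)
        chrom = rng.integers(0, 2, pca.components_.shape[0]).astype(bool)
        W = pca.components_[chrom]                 # rows = eigenvectors selected by 1s
        X_low = (X_train - pca.mean_) @ W.T        # map to the lower-dimension space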

  3. STARS A Two Stage High Gain Harmonic Generation FEL Demonstrator

    Energy Technology Data Exchange (ETDEWEB)

    M. Abo-Bakr; W. Anders; J. Bahrdt; P. Budz; K.B. Buerkmann-Gehrlein; O. Dressler; H.A. Duerr; V. Duerr; W. Eberhardt; S. Eisebitt; J. Feikes; R. Follath; A. Gaupp; R. Goergen; K. Goldammer; S.C. Hessler; K. Holldack; E. Jaeschke; Thorsten Kamps; S. Klauke; J. Knobloch; O. Kugeler; B.C. Kuske; P. Kuske; A. Meseck; R. Mitzner; R. Mueller; M. Neeb; A. Neumann; K. Ott; D. Pfluckhahn; T. Quast; M. Scheer; Th. Schroeter; M. Schuster; F. Senf; G. Wuestefeld; D. Kramer; Frank Marhauser

    2007-08-01

    BESSY is proposing a demonstration facility, called STARS, for a two-stage high-gain harmonic generation free electron laser (HGHG FEL). STARS is planned for lasing in the wavelength range 40 to 70 nm, requiring a beam energy of 325 MeV. The facility consists of a normal conducting gun, three superconducting TESLA-type acceleration modules modified for CW operation, a single stage bunch compressor and finally a two-stage HGHG cascaded FEL. This paper describes the facility layout and the rationale behind the operation parameters.

  4. Dynamic Modelling of the Two-stage Gasification Process

    DEFF Research Database (Denmark)

    Gøbel, Benny; Henriksen, Ulrik B.; Houbak, Niels

    1999-01-01

    A two-stage gasification pilot plant was designed and built as a co-operative project between the Technical University of Denmark and the company REKA. A dynamic, mathematical model of the two-stage pilot plant was developed to serve as a tool for optimising the process and the operating conditions... of the gasification plant. The model consists of modules corresponding to the different elements in the plant. The modules are coupled together through mass and heat conservation. Results from the model are compared with experimental data obtained during steady and unsteady operation of the pilot plant. A good...

  5. A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia.

    Science.gov (United States)

    Korfiatis, Vasileios Ch; Asvestas, Pantelis A; Delibasis, Konstantinos K; Matsopoulos, George K

    2013-12-01

    Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes: Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP), and is implemented using two separate binary classification levels. The first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm comprises two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as an input for the next stage. The FM stage uses a floating-size technique to search for an even better solution by varying the initially provided subset size. Then, the Support Vector Machine (SVM) classifier is used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and SVM classifiers. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second. Moreover, it provides excellent robustness regardless of the size of the input feature
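
    The exact LM-FM search is not reproduced in the abstract; the fragment below is a greedy stand-in for the LM stage only (growing a subset to a fixed predefined size k under an SVM wrapper), assuming scikit-learn and an assumed dataset X, y.

        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        def lm_stage(X, y, k, cv=5):
            # Greedy forward search to a fixed subset size k, scored by SVM
            # cross-validated accuracy (a stand-in, not the authors' search).
            selected, remaining = [], list(range(X.shape[1]))
            while len(selected) < k:
                score, j = max(
                    (cross_val_score(SVC(), X[:, selected + [j]], y, cv=cv).mean(), j)
                    for j in remaining)
                selected.append(j)
                remaining.remove(j)
            return selected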

  6. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  7. Efficient Two-Stage Group Testing Algorithms for DNA Screening

    CERN Document Server

    Huber, Michael

    2011-01-01

    Group testing algorithms are very useful tools for DNA library screening. Building on recent work by Levenshtein (2003) and Tonchev (2008), we construct in this paper new infinite classes of combinatorial structures, the existence of which is essential for attaining the minimum number of individual tests at the second stage of a two-stage disjunctive testing algorithm.

  8. High Performance Gasification with the Two-Stage Gasifier

    DEFF Research Database (Denmark)

    Gøbel, Benny; Hindsgaul, Claus; Henriksen, Ulrik Birk

    2002-01-01

    Based on more than 15 years of research and practical experience, the Technical University of Denmark (DTU) and COWI Consulting Engineers and Planners AS present the two-stage gasification process, a concept for high-efficiency gasification of biomass producing negligible amounts of tars. In the two-stage gasification concept, the pyrolysis and the gasification processes are physically separated. The volatiles from the pyrolysis are partially oxidized, and the hot gases are used as gasification medium to gasify the char. Hot gases from the gasifier and a combustion unit can be used for drying... a cold gas efficiency exceeding 90% is obtained. In the original design of the two-stage gasification process, the pyrolysis unit consists of a screw conveyor with external heating, and the char unit is a fixed bed gasifier. This design is well proven during more than 1000 hours of testing with various...

  9. FREE GRAFT TWO-STAGE URETHROPLASTY FOR HYPOSPADIAS REPAIR

    Institute of Scientific and Technical Information of China (English)

    Zhong-jin Yue; Ling-jun Zuo; Jia-ji Wang; Gan-ping Zhong; Jian-ming Duan; Zhi-ping Wang; Da-shan Qin

    2005-01-01

    Objective To evaluate the effectiveness of free graft transplantation two-stage urethroplasty for hypospadias repair. Methods Fifty-eight cases with different types of hypospadias, including 10 subcoronal, 36 penile shaft, 9 scrotal, and 3 perineal, were treated with free full-thickness skin graft or (and) buccal mucosal graft transplantation two-stage urethroplasty. Of the 58 cases, 45 were new cases and 13 had a history of previous failed surgeries. The operative procedure included two stages: the first stage corrects penile curvature (chordee), prepares the transplanting bed, harvests and prepares the full-thickness skin graft or buccal mucosal graft, and performs graft transplantation. The second stage completes the urethroplasty and glanuloplasty. Results After the first stage operation, 56 of 58 cases (96.6%) were successful with grafts healing well; the other 2 foreskin grafts became gangrenous. After the second stage operation on 56 cases, 5 cases failed with the newly formed urethras opening due to infection, 8 cases had fistulas, and 43 (76.8%) healed well. Conclusions Free graft transplantation two-stage urethroplasty for hypospadias repair is an effective treatment with broad indications, a comparatively high success rate, few complications and good cosmetic results, suitable for repair of various types of hypospadias.

  10. Composite likelihood and two-stage estimation in family studies

    DEFF Research Database (Denmark)

    Andersen, Elisabeth Anne Wreford

    2004-01-01

    In this paper register based family studies provide the motivation for linking a two-stage estimation procedure in copula models for multivariate failure time data with a composite likelihood approach. The asymptotic properties of the estimators in both parametric and semi-parametric models are d...

  11. A two-stage rank test using density estimation

    NARCIS (Netherlands)

    Albers, Willem/Wim

    1995-01-01

    For the one-sample problem, a two-stage rank test is derived which realizes a required power against a given local alternative, for all sufficiently smooth underlying distributions. This is achieved using asymptotic expansions resulting in a precision of order m⁻¹, where m is the size of the first

  12. The construction of customized two-stage tests

    NARCIS (Netherlands)

    Adema, Jos J.

    1990-01-01

    In this paper mixed integer linear programming models for customizing two-stage tests are given. Model constraints are imposed with respect to test composition, administration time, inter-item dependencies, and other practical considerations. It is not difficult to modify the models to make them use

  13. BUILDING ROBUST APPEARANCE MODELS USING ON-LINE FEATURE SELECTION

    Energy Technology Data Exchange (ETDEWEB)

    PORTER, REID B. [Los Alamos National Laboratory; LOVELAND, ROHAN [Los Alamos National Laboratory; ROSTEN, ED [Los Alamos National Laboratory

    2007-01-29

    In many tracking applications, adapting the target appearance model over time can improve performance. This approach is most popular in high frame rate video applications, where latent variables related to the object's appearance (e.g., orientation and pose) vary slowly from one frame to the next. In these cases the appearance model and the tracking system are tightly integrated, and latent variables are often included as part of the tracking system's dynamic model. In this paper we describe our efforts to track cars in low frame rate data (1 frame/second) acquired from a highly unstable airborne platform. Due to the low frame rate and poor image quality, the appearance of a particular vehicle varies greatly from one frame to the next. This leads us to a different problem: how can we build the best appearance model from all instances of a vehicle we have seen so far? The best appearance model should maximize the future performance of the tracking system and maximize the chances of reacquiring the vehicle once it leaves the field of view. We propose an online feature selection approach to this problem and investigate the performance and computational trade-offs with a real-world dataset.

  14. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure, hence protecting individual data records and their privacy. There are various privacy preservation techniques, such as k-anonymity, l-diversity, t-closeness, and data perturbation. In this paper, the k-anonymity privacy protection technique is applied to high-dimensional datasets such as Adult and Census. Since both datasets are high dimensional, a feature subset selection method, Gain Ratio, is applied: the attributes of the datasets are ranked and low-ranking attributes are filtered out to form new reduced data subsets. The k-anonymization privacy preservation technique is then applied to the reduced datasets. The privacy-preserved reduced datasets and the original datasets are compared for accuracy on two data mining tasks, classification and clustering, using the naïve Bayesian and k-means algorithms, respectively. Experimental results show that classification and clustering accuracy are comparatively the same for the reduced k-anonymized datasets and the original datasets.
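
    For reference, the gain ratio of a discrete feature is its information gain divided by its split information; a minimal NumPy sketch (function names are hypothetical):

        import numpy as np

        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -(p * np.log2(p)).sum()

        def gain_ratio(feature, labels):
            # Information gain of a discrete feature, normalised by the
            # feature's own split information.
            h_y = entropy(labels)
            values, counts = np.unique(feature, return_counts=True)
            w = counts / counts.sum()
            h_cond = sum(wi * entropy(labels[feature == v])
                         for wi, v in zip(w, values))
            split_info = -(w * np.log2(w)).sum()
            return (h_y - h_cond) / split_info if split_info > 0 else 0.0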

  15. A Novel Two-Stage Illumination Estimation Framework for Expression Recognition

    Directory of Open Access Journals (Sweden)

    Zheng Zhang

    2014-01-01

    Full Text Available One of the critical issues for facial expression recognition is to eliminate the negative effects caused by variant poses and illuminations. In this paper, a two-stage illumination estimation framework is proposed based on three-dimensional representative faces and clustering, which can estimate illumination directions under a series of poses. First, 256 training 3D face models are adaptively categorized into a certain number of facial structure types by k-means clustering, to group people with similar facial appearance into clusters. Then the representative face of each cluster is generated to represent the facial appearance type of that cluster. Our training set is obtained by rotating all representative faces to a certain pose, illuminating them with a series of different illumination conditions, and then projecting them into two-dimensional images. Finally, the saltire-over-cross feature is selected to train a group of SVM classifiers, and satisfactory performance is achieved when estimating a number of test sets, including images generated from 64 3D face models kept for testing, the CAS-PEAL face database, the CMU PIE database, and a small test set created by ourselves. Compared with other related works, our method is subject independent and has a lower computational complexity of O(C×N) without 3D facial reconstruction.

  16. Automatic feature selection for model-based reinforcement learning in factored MDPs

    NARCIS (Netherlands)

    Kroon, M.; Whiteson, S.; Wani, M.A.; Kantardzic, M.; Palade, V.; Kurgan, L.; Qi, A.

    2009-01-01

    Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection

  17. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications,like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systemstypically extract features from the input sound. Using too many features increases complexity unne

  19. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    Directory of Open Access Journals (Sweden)

    K.SAROJINI,

    2010-06-01

    Full Text Available Feature subset selection is an essential task in data mining. This paper presents a new method for supervised feature subset selection based on a Modified Fuzzy Relative Information Measure (MFRIM. First, a discretization algorithm is applied to discretize numeric features and construct the membership functions of the fuzzy sets of each feature. Then the proposed MFRIM is applied to select the feature subset, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of features that is relevant for achieving higher average classification accuracy. The experimental results on UCI datasets show that the proposed algorithm is effective and efficient, selecting subsets with fewer features and higher average classification accuracy than the consistency-based feature subset selection method.

  20. A bidirectional feature selection method based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    YANG Sheng; ZHANG Zhi; SHI Peng-fei

    2006-01-01

    Feature subset selection is a fundamental problem of data mining. The mutual information of a feature subset is a measure of how much class information the subset contains. A hashing mechanism is proposed to calculate the mutual information of a feature subset. Feature relevancy is defined by mutual information. The redundancy-synergy coefficient, a novel redundancy and synergy measure for features describing the class, is defined. In terms of the information-maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient is presented. This study's experiments show the good performance of the new method.
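
    The redundancy-synergy coefficient itself is not reproduced in the abstract, but the mutual-information relevance term that drives such rankings can be sketched with scikit-learn (X, y are an assumed dataset):

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        mi = mutual_info_classif(X, y, random_state=0)  # relevance of each feature
        ranking = np.argsort(mi)[::-1]                  # most relevant feature first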

  1. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Directory of Open Access Journals (Sweden)

    Yu-Xiang Zhao

    2016-06-01

    Full Text Available In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters.

  2. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Science.gov (United States)

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters. PMID:27314346

  4. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    Science.gov (United States)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition results may be unsatisfactory and the cost of feature extraction high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.
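
    The paper's contribution analysis algorithm is not spelled out in the abstract; a Garson-style weight-based contribution measure for a one-hidden-layer network is a common stand-in and is sketched below with NumPy.

        import numpy as np

        def garson_importance(w_in, w_out):
            # w_in: (n_inputs, n_hidden) input-to-hidden weights
            # w_out: (n_hidden, n_outputs) hidden-to-output weights
            a = np.abs(w_in)
            share = a / a.sum(axis=0, keepdims=True)   # input share per hidden unit
            fan_out = np.abs(w_out).sum(axis=1)        # hidden-unit output magnitude
            imp = (share * fan_out).sum(axis=1)
            return imp / imp.sum()                     # normalised contributions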

  5. FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model.

    Directory of Open Access Journals (Sweden)

    Mingjun Wang

    Full Text Available Single amino acid variants (SAVs are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why an SAV may be associated with a certain disease. In this work, we constructed a high-quality structural dataset that contained 679 high-quality protein structures with 2,048 SAVs by collecting human genetic variant data from multiple resources and dividing them into two categories, i.e., disease-associated and neutral variants. We built a two-stage random forest (RF model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with other additional features that were not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most important and informative features that contribute to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved a good prediction performance with an area under the curve (AUC of 0.882, which is competitive with and in some cases better than other existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
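
    One plausible reading of a two-stage forest is to feed out-of-fold stage-1 probabilities into a second forest; the wiring below is an illustrative assumption, not FunSAV's published architecture, and assumes scikit-learn with a dataset X, y.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_predict

        def two_stage_rf(X, y, seed=0):
            rf1 = RandomForestClassifier(n_estimators=500, random_state=seed)
            # Out-of-fold probabilities avoid leaking training labels into stage 2.
            p1 = cross_val_predict(rf1, X, y, cv=5, method="predict_proba")
            X2 = np.hstack([X, p1])          # original features + stage-1 output
            rf2 = RandomForestClassifier(n_estimators=500, random_state=seed)
            rf2.fit(X2, y)
            rf1.fit(X, y)                    # refit stage 1 on all data for deployment
            return rf1, rf2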

  6. Neural Gen Feature Selection for Supervised Learning Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Hasan Abdulameer

    2014-04-01

    Full Text Available Face recognition has received significant attention, especially during the past few years. Many face recognition techniques have been developed, such as PSO-SVM and LDA-SVM. However, inefficient features in face recognition may lead to inadequate recognition results. Hence, a new face recognition system based on a Genetic Algorithm (GA) and a feed-forward back-propagation neural network (FFBNN) is proposed. The proposed system first performs feature extraction, and the resulting optimal features are promoted to the recognition process. In the feature extraction stage, the optimal features are extracted from the face image database by the GA with the FFBNN, and the computed optimal features are given to the FFBNN to carry out the training and testing process. The well-trained FFBNN with the optimal features provides the recognition result. The YALE human face dataset is utilized to analyze the performance of the proposed GA-FFBNN technique, and the technique is also compared with standard SVM and PSO-SVM techniques.

  7. Square Kilometre Array station configuration using two-stage beamforming

    CERN Document Server

    Jiwani, Aziz; Razavi-Ghods, Nima; Hall, Peter J; Padhi, Shantanu; de Vaate, Jan Geralt bij

    2012-01-01

    The lowest frequency band (70 - 450 MHz) of the Square Kilometre Array will consist of sparse aperture arrays grouped into geographically-localised patches, or stations. Signals from thousands of antennas in each station will be beamformed to produce station beams which form the inputs for the central correlator. Two-stage beamforming within stations can reduce SKA-low signal processing load and costs, but has not been previously explored for the irregular station layouts now favoured in radio astronomy arrays. This paper illustrates the effects of two-stage beamforming on sidelobes and effective area, for two representative station layouts (regular and irregular gridded tile on an irregular station). The performance is compared with a single-stage, irregular station. The inner sidelobe levels do not change significantly between layouts, but the more distant sidelobes are affected by the tile layouts; regular tile creates diffuse, but regular, grating lobes. With very sparse arrays, the station effective area...

  8. Two stage sorption type cryogenic refrigerator including heat regeneration system

    Science.gov (United States)

    Jones, Jack A.; Wen, Liang-Chi; Bard, Steven

    1989-01-01

    A lower stage chemisorption refrigeration system physically and functionally coupled to an upper stage physical adsorption refrigeration system is disclosed. Waste heat generated by the lower stage cycle is regenerated to fuel the upper stage cycle thereby greatly improving the energy efficiency of a two-stage sorption refrigerator. The two stages are joined by disposing a first pressurization chamber providing a high pressure flow of a first refrigerant for the lower stage refrigeration cycle within a second pressurization chamber providing a high pressure flow of a second refrigerant for the upper stage refrigeration cycle. The first pressurization chamber is separated from the second pressurization chamber by a gas-gap thermal switch which at times is filled with a thermoconductive fluid to allow conduction of heat from the first pressurization chamber to the second pressurization chamber.

  9. Recursive algorithm for the two-stage EFOP estimation method

    Institute of Scientific and Technical Information of China (English)

    LUO GuiMing; HUANG Jian

    2008-01-01

    A recursive algorithm for the two-stage empirical frequency-domain optimal parameter (EFOP) estimation method is proposed. The EFOP method is a novel system identification method for black-box models that combines time-domain estimation and frequency-domain estimation. It has improved anti-disturbance performance and can precisely identify models from fewer samples. The two-stage EFOP method based on the bootstrap technique is generally suitable for black-box models, but it is an iterative method and requires too much computation to work well online. A recursive algorithm is therefore proposed for disturbed stochastic systems. Some simulation examples are included to demonstrate the validity of the new method.

  10. Two-stage approach to full Chinese parsing

    Institute of Scientific and Technical Information of China (English)

    Cao Hailong; Zhao Tiejun; Yang Muyun; Li Sheng

    2005-01-01

    Natural language parsing is a task of great importance and extreme difficulty. In this paper, we present a full Chinese parsing system based on a two-stage approach. Rather than identifying all phrases with a uniform model, we utilize a divide-and-conquer strategy. We propose an effective and fast method based on a Markov model to identify the base phrases. Then we make the first attempt to extend one of the best English parsing models, i.e. the head-driven model, to recognize Chinese complex phrases. Our two-stage approach is superior to the uniform approach in two respects. First, it creates synergy between the Markov model and the head-driven model. Second, it reduces the complexity of full Chinese parsing and makes the parsing system space and time efficient. Evaluated in PARSEVAL measures on the open test set, the parsing system performs at 87.53% precision and 87.95% recall.

  11. Income and Poverty across SMSAs: A Two-Stage Analysis

    OpenAIRE

    1993-01-01

    Two popular explanations of urban poverty are the "welfare-disincentive" and "urban-deindustrialization" theories. Using cross-sectional Census data, we develop a two-stage model to predict an SMSA's median family income and poverty rate. The model allows the city's welfare level and industrial structure to affect its median family income and poverty rate directly. It also allows welfare and industrial structure to affect income and poverty indirectly, through their effects on family structure...

  12. A Two-stage Polynomial Method for Spectrum Emissivity Modeling

    OpenAIRE

    Qiu, Qirong; Liu, Shi; Teng, Jing; Yan, Yong

    2015-01-01

    Spectral emissivity is a key parameter in temperature measurement by radiation methods, but it is not easy to determine in a combustion environment, due to the interrelated influence of the temperature and the wavelength of the radiation. In multi-wavelength radiation thermometry, knowing the spectral emissivity of the material is a prerequisite. However, in many circumstances such a property is a complex function of temperature and wavelength, and reliable models are yet to be sought. In this study, a two-stage...

  13. Measuring the Learning from Two-Stage Collaborative Group Exams

    CERN Document Server

    Ives, Joss

    2014-01-01

    A two-stage collaborative exam is one in which students first complete the exam individually, and then complete the same or similar exam in collaborative groups immediately afterward. To quantify the learning effect from the group component of these two-stage exams in an introductory Physics course, a randomized crossover design was used where each student participated in both the treatment and control groups. For each of the two two-stage collaborative group midterm exams, questions were designed to form matched near-transfer pairs with questions on an end-of-term diagnostic which was used as a learning test. For learning test questions paired with questions from the first midterm, which took place six to seven weeks before the learning test, an analysis using a mixed-effects logistic regression found no significant differences in learning-test performance between the control and treatment group. For learning test questions paired with questions from the second midterm, which took place one to two weeks prio...

  14. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    Science.gov (United States)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and the harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of a feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI datasets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method obtains competitive results in terms of both prediction accuracy and the number of selected features.

  15. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require... model in the feature selection process. PCA is often used in machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  17. Forty-five-degree two-stage venous cannula: advantages over standard two-stage venous cannulation.

    Science.gov (United States)

    Lawrence, D R; Desai, J B

    1997-01-01

    We present a 45-degree two-stage venous cannula that confers an advantage on the surgeon using cardiopulmonary bypass. This cannula exits the mediastinum under the transverse bar of the sternal retractor, leaving the rostral end of the sternal incision free of apparatus. It allows lifting of the heart with minimal effect on venous return and does not interfere with the radially laid out sutures of an aortic valve replacement using an interrupted suture technique.

  18. A Rank Aggregation Algorithm for Ensemble of Multiple Feature Selection Techniques in Credit Risk Evaluation

    Directory of Open Access Journals (Sweden)

    Shashi Dahiya

    2016-10-01

    Full Text Available In credit risk evaluation, the accuracy of a classifier is very significant for classifying high-risk loan applicants correctly. Feature selection is one way of improving the accuracy of a classifier. It provides the classifier with important and relevant features for model development. This study uses an ensemble of multiple feature ranking techniques for feature selection of credit data. It uses five individual rank-based feature selection methods. It proposes a novel rank aggregation algorithm for combining the ranks of the individual feature selection methods of the ensemble. This algorithm uses the rank order along with the rank score of the features in the ranked list of each feature selection method for rank aggregation. The ensemble of multiple feature selection techniques uses the novel rank aggregation algorithm and selects the relevant features using the 80%, 60%, 40% and 20% thresholds from the top of the aggregated ranked list for building C4.5, MLP, C4.5-based Bagging and MLP-based Bagging models. It was observed that the performance of models using the ensemble of multiple feature selection techniques is better than the performance of the five individual rank-based feature selection methods. The average performance of all the models was observed to be best for the ensemble of feature selection techniques at the 60% threshold. Also, the bagging-based models outperformed the individual models most significantly at the 60% threshold. This increase in performance is all the more significant given that the number of features was reduced by 40% for building the highest performing models. This reduces the data dimensions, and hence the overall data size, phenomenally for model building. The use of the ensemble of feature selection techniques with the novel aggregation algorithm provided more accurate models which are simpler, faster and easy to interpret.
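
    The novel aggregation algorithm is described only at a high level; a Borda-style stand-in that weights each method's rank positions by its rank scores might look as follows (the input layout is an illustrative assumption).

        import numpy as np

        def aggregate_ranks(rank_lists, score_lists):
            # rank_lists: one ranked list of feature indices per method (best first)
            # score_lists: per-method array of rank scores, indexed by feature
            n = len(rank_lists[0])
            agg = np.zeros(n)
            for ranks, scores in zip(rank_lists, score_lists):
                for pos, feat in enumerate(ranks):
                    agg[feat] += (n - pos) * scores[feat]  # position credit x score
            return np.argsort(agg)[::-1]                   # aggregated list, best first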

  19. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  20. The Bracka two-stage repair for severe proximal hypospadias: A single center experience

    Directory of Open Access Journals (Sweden)

    Rakesh S Joshi

    2015-01-01

    Full Text Available Background: Surgical correction of severe proximal hypospadias represents a significant surgical challenge, and single-stage corrections are often associated with complications and reoperations. The Bracka two-stage repair is an attractive alternative surgical procedure with superior, reliable, and reproducible results. Purpose: To study the feasibility and applicability of the Bracka two-stage repair for severe proximal hypospadias and to analyze the outcomes and complications of this surgical technique. Materials and Methods: This prospective study was conducted from January 2011 to December 2013. Bracka two-stage repair was performed using inner preputial skin as a free graft in subjects with proximal hypospadias in whom a severe degree of chordee and/or a poor urethral plate was present. Only primary cases were included in this study. All subjects received three doses of intramuscular testosterone 3 weeks apart before the first stage. The second stage was performed 6 months after the first stage. Follow-up ranged from 6 months to 24 months. Results: A total of 43 patients underwent Bracka repair, of whom 30 completed the two-stage repair. The mean age of the patients was 4 years and 8 months. We achieved 100% graft uptake, and no revision was required. Three patients developed fistula, while two had meatal stenosis. Glans dehiscence, urethral stricture, and residual chordee were not found during follow-up, and satisfactory cosmetic results with a good urinary stream were achieved in all cases. Conclusion: The Bracka two-stage repair is a safe and reliable approach in selected patients in whom it is impractical to maintain the axial integrity of the urethral plate and a full-circumference urethral reconstruction therefore becomes necessary. It gives good results in terms of restoration of normal function with minimal complications.

  1. Optimisation of two-stage screw expanders for waste heat recovery applications

    Science.gov (United States)

    Read, M. G.; Smith, I. K.; Stosic, N.

    2015-08-01

    It has previously been shown that the use of two-phase screw expanders in power generation cycles can achieve an increase in the utilisation of available energy from a low temperature heat source when compared with more conventional single-phase turbines. However, screw expander efficiencies are more sensitive to expansion volume ratio than turbines, and this increases as the expander inlet vapour dryness fraction decreases. For single-stage screw machines with low inlet dryness, this can lead to under-expansion of the working fluid and low isentropic efficiency for the expansion process. The performance of the cycle can potentially be improved by using a two-stage expander, consisting of a low pressure machine and a smaller high pressure machine connected in series. By expanding the working fluid over two stages, the built-in volume ratios of the two machines can be selected to provide a better match with the overall expansion process, thereby increasing efficiency for particular inlet and discharge conditions. The mass flow rate through both stages must however be matched, and the compromise between increasing efficiency and maximising power output must also be considered. This research uses a rigorous thermodynamic screw machine model to compare the performance of single and two-stage expanders over a range of operating conditions. The model allows optimisation of the required intermediate pressure in the two-stage expander, along with the rotational speed and built-in volume ratio of both screw machine stages. The results allow the two-stage machine to be fully specified in order to achieve maximum efficiency for a required power output.

  2. On Two-stage Seamless Adaptive Design in Clinical Trials

    Directory of Open Access Journals (Sweden)

    Shein-Chung Chow

    2008-12-01

    Full Text Available In recent years, the use of adaptive design methods in clinical research and development based on accrued data has become very popular because of its efficiency and flexibility in modifying trial and/or statistical procedures of ongoing clinical trials. One of the most commonly considered adaptive designs is probably the two-stage seamless adaptive trial design that combines two separate studies into one single study. In many cases, the study endpoints considered in a two-stage seamless adaptive design may be similar but different (e.g. a biomarker versus a regular clinical endpoint, or the same study endpoint with different treatment durations). In this case, it is important to determine how the data collected from both stages should be combined for the final analysis. It is also of interest to know how the sample size calculation/allocation should be done to achieve the study objectives originally set for the two stages (separate studies). In this article, formulas for sample size calculation/allocation are derived for cases in which the study endpoints are continuous, discrete (e.g. binary responses), or time-to-event data, assuming that there is a well-established relationship between the study endpoints at different stages and that the study objectives at different stages are the same. In cases in which the study objectives at different stages are different (e.g. dose finding at the first stage and efficacy confirmation at the second stage) and there is a shift in patient population caused by protocol amendments, the derived test statistics and formulas for sample size calculation and allocation are modified for controlling the overall type I error at the prespecified level.

  3. Two stage treatment of dairy effluent using immobilized Chlorella pyrenoidosa.

    Science.gov (United States)

    Yadavalli, Rajasri; Heggers, Goutham Rao Venkata Naga

    2013-12-19

    Dairy effluents contain a high organic load, and the unscrupulous discharge of these effluents into aquatic bodies is a matter of serious concern, besides deteriorating their water quality. Whilst physico-chemical treatment is the common mode of treatment, immobilized microalgae can be employed to treat the high organic content, offering numerous benefits along with wastewater treatment. A novel low-cost two-stage treatment was employed for the complete treatment of dairy effluent. The first stage consists of treating the dairy effluent in a photobioreactor (1 L) using immobilized Chlorella pyrenoidosa, while the second stage involves a two-column sand bed filtration technique. Whilst NH₄⁺-N was completely removed, 98% removal of PO₄³⁻-P was achieved within 96 h of the two-stage purification process. The filtrate was tested for toxicity, and no mortality was observed in the zebrafish used as a model at the end of the 96 h bioassay. Moreover, a significant decrease in biological oxygen demand and chemical oxygen demand was achieved by this novel method. The separated biomass was also tested as a biofertilizer on rice seeds, and a 30% increase in root and shoot length was observed after adding the biomass to the rice plants. We conclude that the two-stage treatment of dairy effluent is highly effective in the removal of BOD and COD, besides nutrients like nitrates and phosphates. The treatment also allows treated wastewater to be discharged safely into receiving water bodies, since it is nontoxic to aquatic life. Further, the algal biomass separated after the first stage of treatment was highly capable of increasing the growth of rice plants because of the nitrogen-fixing ability of the green alga, and offers great potential as a biofertilizer.

  4. Two-stage series array SQUID amplifier for space applications

    Science.gov (United States)

    Tuttle, J. G.; DiPirro, M. J.; Shirron, P. J.; Welty, R. P.; Radparvar, M.

    We present test results for a two-stage integrated SQUID amplifier which uses a series array of d.c. SQUIDs to amplify the signal from a single input SQUID. The device was developed by Welty and Martinis at NIST, and recent versions have been manufactured by HYPRES, Inc. Shielding and filtering techniques were employed during the testing to minimize external noise. An energy resolution of 300 ħ was demonstrated using d.c. excitation at frequencies above 1 kHz, and better than 500 ħ resolution was typical down to 300 Hz.

  5. Two-Stage Aggregate Formation via Streams in Myxobacteria

    Science.gov (United States)

    Alber, Mark; Kiskowski, Maria; Jiang, Yi

    2005-03-01

    In response to adverse conditions, myxobacteria form aggregates which develop into fruiting bodies. We model myxobacteria aggregation with a lattice cell model based entirely on short range (non-chemotactic) cell-cell interactions. Local rules result in a two-stage process of aggregation mediated by transient streams. Aggregates resemble those observed in experiment and are stable against even very large perturbations. Noise in individual cell behavior increases the effects of streams and results in larger, more stable aggregates. Phys. Rev. Lett. 93: 068301 (2004).

  6. Straw Gasification in a Two-Stage Gasifier

    DEFF Research Database (Denmark)

    Bentzen, Jens Dall; Hindsgaul, Claus; Henriksen, Ulrik Birk

    2002-01-01

    Additive-prepared straw pellets were gasified in the 100 kW two-stage gasifier at the Department of Mechanical Engineering of the Technical University of Denmark (DTU). The fixed bed temperature range was 800-1000°C. In order to avoid bed sintering, as observed earlier with straw gasification... residues were examined after the test. No agglomeration or sintering was observed in the ash residues. The tar content was measured both by the solid phase amino adsorption (SPA) method and by cold trapping (the Petersen method). Both showed low tar contents (~42 mg/Nm3 without gas cleaning). The particle content...

  7. Two-Stage Eagle Strategy with Differential Evolution

    CERN Document Server

    Yang, Xin-She

    2012-01-01

    The efficiency of an optimization process is largely determined by the search algorithm and its fundamental characteristics. In a given optimization, a single type of algorithm is used in most applications. In this paper, we investigate the Eagle Strategy recently developed for global optimization, which uses a two-stage strategy combining two different algorithms to improve the overall search efficiency. We discuss this strategy with differential evolution and then evaluate their performance by solving real-world optimization problems such as pressure vessel and speed reducer design. Results suggest that we can reduce the computing effort by a factor of up to 10 in many applications.
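
    A toy sketch of the two-stage idea using SciPy: coarse global sampling (the Eagle Strategy uses Lévy flights; uniform sampling here is a simplification) followed by differential evolution restricted to a region around the best sample. Function and parameter names are illustrative.

        import numpy as np
        from scipy.optimize import differential_evolution

        def eagle_strategy(f, bounds, n_global=200, shrink=0.1, seed=0):
            rng = np.random.default_rng(seed)
            lo, hi = np.array(bounds, dtype=float).T
            # Stage 1: coarse global exploration of the full search space.
            xs = rng.uniform(lo, hi, size=(n_global, len(lo)))
            best = xs[np.argmin([f(x) for x in xs])]
            # Stage 2: intensive local search with differential evolution
            # inside a shrunken box around the best global sample.
            span = shrink * (hi - lo)
            local = list(zip(np.maximum(lo, best - span),
                             np.minimum(hi, best + span)))
            return differential_evolution(f, local, seed=seed)

        # e.g. eagle_strategy(lambda x: float((x ** 2).sum()), [(-5, 5)] * 4)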

  8. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    Full Text Available High-dimensional feature selection, in which an optimal feature subset must be found among a very large number of features, is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique that utilizes regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as a criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN and Bayesian techniques. Various high-dimensional datasets are used to evaluate the usefulness of the proposed approach. Results show that our approach compares favourably with other conventional feature selection algorithms.
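
    A generic binary-PSO wrapper with a KNN fitness is sketched below, assuming NumPy and scikit-learn and a dataset X, y; the sigmoid velocity-to-bit rule and the coefficients are standard BPSO choices, not the authors' exact settings.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import cross_val_score

        def bpso_select(X, y, n_particles=20, n_iter=40, seed=0):
            rng = np.random.default_rng(seed)
            d = X.shape[1]

            def fitness(bits):
                mask = bits.astype(bool)
                if not mask.any():
                    return 0.0
                knn = KNeighborsClassifier(n_neighbors=5)
                return cross_val_score(knn, X[:, mask], y, cv=5).mean()

            pos = (rng.random((n_particles, d)) < 0.5).astype(float)  # bit vectors
            vel = rng.normal(scale=0.1, size=(n_particles, d))
            pbest = pos.copy()
            pbest_val = np.array([fitness(p) for p in pos])
            gbest = pbest[pbest_val.argmax()].copy()
            for _ in range(n_iter):
                r1, r2 = rng.random((2, n_particles, d))
                vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
                # Sigmoid of velocity gives the probability of each bit being 1.
                pos = (rng.random((n_particles, d)) < 1 / (1 + np.exp(-vel))).astype(float)
                vals = np.array([fitness(p) for p in pos])
                improved = vals > pbest_val
                pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
                gbest = pbest[pbest_val.argmax()].copy()
            return gbest.astype(bool), pbest_val.max()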

  9. Feature selection and validated predictive performance in the domain of Legionella pneumophila: A comparative study

    NARCIS (Netherlands)

    T. van der Ploeg (Tjeerd); E.W. Steyerberg (Ewout)

    2016-01-01

    textabstractBackground: Genetic comparisons of clinical and environmental Legionella strains form an essential part of outbreak investigations. DNA microarrays often comprise many DNA markers (features). Feature selection and the development of prediction models are particularly challenging in this

  10. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    Science.gov (United States)

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution to the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.

  11. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral image classification. Using unlabeled samples, which are often available in practically unlimited numbers, unsupervised and semisupervised feature extraction methods show better performance when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods and proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification, and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled sample selection in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  12. Electrophysiological correlates of early attentional feature selection and distractor filtering

    NARCIS (Netherlands)

    Akyürek, Elkan G.; Schubö, Anna

    Using electrophysiology, the attentional functions of target selection and distractor filtering were investigated during visual search. Observers searched for multiple tilted line segments amidst vertical distractors. In different conditions, observers were either looking for a specific line

  13. Feature and Model Selection in Feedforward Neural Networks

    Science.gov (United States)

    1994-06-01

    smaller than those experienced with the derivative-based saliencies. However, a minimal number of nodes were used to analyze the FLUIR problem. [The remainder of this record excerpt is garbled; it references Table 15, "FLUIR Problem: Saliency Metric Loadings after Varimax Rotation", whose contents are not recoverable.]

  14. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional featur...

  15. Two-stage perceptual learning to break visual crowding.

    Science.gov (United States)

    Zhu, Ziyun; Fan, Zhenzhi; Fang, Fang

    2016-01-01

    When a target is presented with nearby flankers in the peripheral visual field, it becomes harder to identify, which is referred to as crowding. Crowding sets a fundamental limit on object recognition in peripheral vision, preventing us from fully appreciating cluttered visual scenes. We trained adult human subjects on a crowded orientation discrimination task and investigated whether crowding could be completely eliminated by training. We discovered a two-stage learning process with this training task. In the early stage, when the target and flankers were separated beyond a certain distance, subjects acquired a relatively general ability to break crowding, as evidenced by the fact that the breaking of crowding could transfer to another crowded orientation, even a crowded motion stimulus, although the transfer to the opposite visual hemi-field was weak. In the late stage, like many classical perceptual learning effects, subjects' performance gradually improved and showed specificity to the trained orientation. We also found that, when the target and flankers were spaced too finely, training could only reduce, rather than completely eliminate, the crowding effect. This two-stage learning process illustrates a learning strategy our brain uses to deal with the notoriously difficult problem of identifying peripheral objects in clutter. The brain first learned to solve the "easy and general" part of the problem (i.e., improving the processing resolution and segmenting the target and flankers) and then to tackle the "difficult and specific" part (i.e., refining the representation of the target).

  16. Runway Operations Planning: A Two-Stage Heuristic Algorithm

    Science.gov (United States)

    Anagnostakis, Ioannis; Clarke, John-Paul

    2003-01-01

    The airport runway is a scarce resource that must be shared by different runway operations (arrivals, departures and runway crossings). Given the possible sequences of runway events, careful Runway Operations Planning (ROP) is required if runway utilization is to be maximized. From the perspective of departures, ROP solutions are aircraft departure schedules developed by optimally allocating runway time for departures given the time required for arrivals and crossings. In addition to the obvious objective of maximizing throughput, other objectives, such as guaranteeing fairness and minimizing environmental impact, can also be incorporated into the ROP solution subject to constraints introduced by Air Traffic Control (ATC) procedures. This paper introduces a two-stage heuristic algorithm for solving the Runway Operations Planning (ROP) problem. In the first stage, sequences of departure class slots and runway crossings slots are generated and ranked based on departure runway throughput under stochastic conditions. In the second stage, the departure class slots are populated with specific flights from the pool of available aircraft, by solving an integer program with a Branch & Bound algorithm implementation. Preliminary results from this implementation of the two-stage algorithm on real-world traffic data are presented.

  17. Two-Stage Heuristic Algorithm for Aircraft Recovery Problem

    Directory of Open Access Journals (Sweden)

    Cheng Zhang

    2017-01-01

    Full Text Available This study focuses on the aircraft recovery problem (ARP). In real-life operations, disruptions always cause schedule failures and make airlines suffer great losses. Therefore, the main objective of the aircraft recovery problem is to minimize the total recovery cost and solve the problem within reasonable runtimes. An aircraft recovery model (ARM) is proposed herein to formulate the ARP, using feasible lines of flights as the basic variables in the model. We define a feasible line of flights (LOF) as a sequence of flights flown by an aircraft within one day. The number of LOFs grows exponentially with the number of flights. Hence, a two-stage heuristic is proposed to reduce the problem scale. The algorithm integrates a heuristic scoring procedure with an aggregated aircraft recovery model (AARM) to preselect LOFs. The approach is tested on five real-life test scenarios. The computational results show that the proposed model provides a good formulation of the problem and can be solved within reasonable runtimes with the proposed methodology. The two-stage heuristic significantly reduces the number of LOFs after each stage and finally reduces the number of variables and constraints in the aircraft recovery model.

  18. Two-Stage Orthogonal Least Squares Methods for Neural Network Construction.

    Science.gov (United States)

    Zhang, Long; Li, Kang; Bai, Er-Wei; Irwin, George W

    2015-08-01

    A number of neural networks can be formulated as the linear-in-the-parameters models. Training such networks can be transformed to a model selection problem where a compact model is selected from all the candidates using subset selection algorithms. Forward selection methods are popular fast subset selection approaches. However, they may only produce suboptimal models and can be trapped into a local minimum. More recently, a two-stage fast recursive algorithm (TSFRA) combining forward selection and backward model refinement has been proposed to improve the compactness and generalization performance of the model. This paper proposes unified two-stage orthogonal least squares methods instead of the fast recursive-based methods. In contrast to the TSFRA, this paper derives a new simplified relationship between the forward and the backward stages to avoid repetitive computations using the inherent orthogonal properties of the least squares methods. Furthermore, a new term exchanging scheme for backward model refinement is introduced to reduce computational demand. Finally, given the error reduction ratio criterion, effective and efficient forward and backward subset selection procedures are proposed. Extensive examples are presented to demonstrate the improved model compactness constructed by the proposed technique in comparison with some popular methods.
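
    The forward stage of such an orthogonal least squares selection can be sketched in a few lines of Python; this shows only the classical Gram-Schmidt forward pass with the error reduction ratio criterion, while the paper's backward refinement and term-exchange steps are omitted.

        # Forward orthogonal least squares with the error reduction ratio (ERR).
        import numpy as np

        def forward_ols(P, y, n_terms):
            """Select n_terms columns of the regressor matrix P that explain
            the most output energy, via classical Gram-Schmidt OLS."""
            P, y = P.astype(float), y.astype(float)
            selected, W = [], []                 # chosen indices, orthogonal basis
            for _ in range(n_terms):
                best_err, best_j, best_w = -1.0, None, None
                for j in range(P.shape[1]):
                    if j in selected:
                        continue
                    w = P[:, j].copy()
                    for u in W:                  # orthogonalize against the basis
                        w -= (u @ P[:, j]) / (u @ u) * u
                    if w @ w < 1e-12:            # numerically dependent column
                        continue
                    err = (w @ y) ** 2 / ((w @ w) * (y @ y))  # error reduction ratio
                    if err > best_err:
                        best_err, best_j, best_w = err, j, w
                selected.append(best_j)
                W.append(best_w)
            return selected

        rng = np.random.default_rng(1)
        X = rng.normal(size=(200, 10))
        y = 3 * X[:, 2] - 2 * X[:, 7] + 0.1 * rng.normal(size=200)
        print(forward_ols(X, y, 2))              # expect columns 2 and 7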

  19. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Science.gov (United States)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier, and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, including the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, results show that feature selection improved the accuracy of all three classifiers, and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The results of this study are comparable with the other studies on Wisconsin breast cancer datasets. PMID:27403253

  20. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Directory of Open Access Journals (Sweden)

    Shokoufeh Aalaei

    2016-05-01

    Full Text Available Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier, and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, including the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, results show that feature selection improved the accuracy of all three classifiers, and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The results of this study are comparable with the other studies on Wisconsin breast cancer datasets.
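
    A generic GA wrapper along these lines can be sketched as follows; the PS-classifier used in the study is not a standard library component, so Gaussian Naive Bayes stands in as the fitness evaluator, and the scikit-learn dataset loader stands in for the WBC data.

        # Generic GA wrapper for feature selection (illustrative sketch; the
        # study's PS-classifier is replaced by Gaussian Naive Bayes).
        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(0)
        X, y = load_breast_cancer(return_X_y=True)   # stand-in for WBC data
        pop_size, n_gen, n_feat = 20, 12, X.shape[1]

        def fitness(mask):
            if not mask.any():
                return 0.0
            return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

        pop = rng.random((pop_size, n_feat)) < 0.5   # random initial population
        for _ in range(n_gen):
            fit = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(fit)[::-1][: pop_size // 2]]  # truncation selection
            cuts = rng.integers(1, n_feat, size=pop_size // 2)
            kids = np.array([np.concatenate((parents[i % len(parents)][:c],
                                             parents[(i + 1) % len(parents)][c:]))
                             for i, c in enumerate(cuts)])         # one-point crossover
            kids ^= rng.random(kids.shape) < 0.02                  # bit-flip mutation
            pop = np.vstack((parents, kids))

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print("selected", int(best.sum()), "of", n_feat, "features")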

  1. Feature selection based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    杨胜; 顾钧

    2004-01-01

    Mutual information is an important information measure for feature subsets. In this paper, a hashing mechanism is proposed to calculate the mutual information of a feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy of features in expressing the class feature, is defined via mutual information. The information maximization rule is applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Our experimental results showed the good performance of the new feature selection method.
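
    In the same spirit, a greedy relevance-minus-redundancy ranking (the standard mRMR-style surrogate; the paper's redundancy-synergy coefficient itself is not reproduced here) can be sketched with scikit-learn's mutual-information utilities.

        # Greedy mutual-information feature selection with a redundancy penalty.
        import numpy as np
        from sklearn.datasets import load_wine
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.metrics import mutual_info_score

        X, y = load_wine(return_X_y=True)
        # Discretize each column into quartile bins for pairwise MI estimates.
        Xd = np.column_stack([np.digitize(X[:, j],
                                          np.quantile(X[:, j], [0.25, 0.5, 0.75]))
                              for j in range(X.shape[1])])

        relevance = mutual_info_classif(X, y, random_state=0)
        first = int(np.argmax(relevance))
        selected, remaining = [first], set(range(X.shape[1])) - {first}

        while len(selected) < 5:
            def score(j):
                redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, s])
                                      for s in selected])
                return relevance[j] - redundancy   # relevance minus redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)

        print("selected feature indices:", selected)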

  2. Our Selections and Decisions: Inherent Features of the Nervous System?

    Science.gov (United States)

    Rösler, Frank

    The chapter summarizes findings on the neuronal bases of decision-making. Taking the phenomenon of selection, it explains that systems built only from excitatory and inhibitory neuron populations have the emergent property of selecting between different alternatives. These considerations suggest that there exists a hierarchical architecture with central selection switches. However, in such a system, the functions of selection and decision-making are not localized, but rather emerge from an interaction of several participating networks. These are, on the one hand, networks that process specific input and output representations and, on the other hand, networks that regulate the relative activation/inhibition of the specific input and output networks. These ideas are supported by recent empirical evidence. Moreover, other studies show that rather complex psychological variables, like subjective probability estimates, expected gains and losses, prediction errors, etc., do have biological correlates, i.e., they can be localized in time and space as activation states of neural networks and single cells. These findings suggest that selections and decisions are consequences of an architecture which, seen from a biological perspective, is fully deterministic. However, a transposition of such nomothetic functional principles into the idiographic domain, i.e., using them as elements for comprehensive 'mechanistic' explanations of individual decisions, seems not to be possible because of principled limitations. Therefore, individual decisions will remain predictable only by means of probabilistic models.

  3. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    Science.gov (United States)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.
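
    The plain (non-kernel) F-score underlying this approach is easy to state and compute; the sketch below, in Python with NumPy, uses synthetic binary-labelled data, and the kernel-space mapping of the paper is omitted.

        # Plain F-score ranking for a binary classification problem: the ratio
        # of between-class separation to within-class variance, per feature.
        import numpy as np

        def f_score(X, y):
            """F-score of each column of X for a binary label vector y."""
            pos, neg = X[y == 1], X[y == 0]
            num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
            den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
            return num / np.maximum(den, 1e-12)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 6))
        y = (X[:, 3] + 0.3 * rng.normal(size=100) > 0).astype(int)
        print(np.argsort(f_score(X, y))[::-1])   # feature 3 should rank first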

  4. Multi-Objective Feature Subset Selection using Non-dominated Sorting Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    A. Khan

    2015-02-01

    Full Text Available This paper presents an evolutionary-algorithm-based technique to solve the multi-objective feature subset selection problem. The data used for classification contain a large number of features, called attributes. Some of these attributes are not relevant and need to be eliminated. In the classification procedure, each feature affects the accuracy, cost and learning time of the classifier. So, there is a strong requirement to select a subset of the features before building the classifier. The proposed technique treats feature subset selection as a multi-objective optimization problem. This research uses one of the latest multi-objective genetic algorithms (NSGA-II). The fitness value of a particular feature subset is measured using ID3. The testing accuracy acquired is then assigned to the fitness value. The technique is tested on several datasets taken from the UCI machine learning repository. The experiments demonstrate the feasibility of using NSGA-II for feature subset selection.

  5. Neighbourhood search feature selection method for content-based mammogram retrieval.

    Science.gov (United States)

    Chandy, D Abraham; Christinal, A Hepzibah; Theodore, Alwyn John; Selvan, S Easter

    2017-03-01

    Content-based image retrieval plays an increasing role in the clinical process for supporting diagnosis. This paper proposes a neighbourhood search method to select the near-optimal feature subsets for the retrieval of mammograms from the Mammographic Image Analysis Society (MIAS) database. The features based on grey level cooccurrence matrix, Daubechies-4 wavelet, Gabor, Cohen-Daubechies-Feauveau 9/7 wavelet and Zernike moments are extracted from mammograms available in the MIAS database to form the combined or fused feature set for testing various feature selection methods. The performance of feature selection methods is evaluated using precision, storage requirement and retrieval time measures. Using the proposed method, a significant improvement is achieved in mean precision rate and feature dimension. The results show that the proposed method outperforms the state-of-the-art feature selection methods.

  6. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    Full Text Available Currently, with the rapidly increasing data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time remains unsatisfactory owing to the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed to generate candidate subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while preserving classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.

  7. PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines.

    Science.gov (United States)

    Chatterjee, Piyali; Basu, Subhadip; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz

    2011-09-01

    Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physico-chemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers like Predator, DSC, and SIMPA96 for randomly selected proteins from the CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are available freely in the public domain at: http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo/
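
    The cascade structure, first-stage class confidences feeding a second-stage classifier, can be sketched generically; this is not the PSP_MCSVM code, and synthetic data stands in for the PSSM and physicochemical features.

        # Schematic two-stage SVM cascade: stage-1 confidence values become
        # stage-2 inputs (the "structure-to-structure" refinement step).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=600, n_features=25, n_classes=3,
                                   n_informative=8, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        stage1 = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
        conf_tr = stage1.predict_proba(X_tr)     # per-class confidence values

        # In the published method the second stage also sees the confidences of
        # neighbouring residues; here the confidence vectors alone are refined.
        stage2 = SVC(probability=True, random_state=0).fit(conf_tr, y_tr)

        conf_te = stage1.predict_proba(X_te)
        print("stage-1 accuracy:", stage1.score(X_te, y_te))
        print("stage-2 accuracy:", stage2.score(conf_te, y_te))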

  8. A feature selection method based on multiple kernel learning with expression profiles of different types.

    Science.gov (United States)

    Du, Wei; Cao, Zhongbo; Song, Tianci; Li, Ying; Liang, Yanchun

    2017-01-01

    With the development of high-throughput technology, researchers can acquire large numbers of expression datasets of different types from several public databases. Because most of these datasets have a small number of samples and hundreds or thousands of features, how to extract informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, a large number of feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data by the accuracy or error rate of classification. In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. Firstly, the relevance between features and classifying samples is measured using the optimizing function of MKL. In this step, an iterative gradient descent process is used to perform the optimization of both the parameters of the Support Vector Machine (SVM) and the kernel confidence. Then, a set of relevant features is selected by sorting the optimizing function of each feature. Furthermore, we apply an embedded scheme of forward selection to detect compact feature subsets from the relevant feature set. We not only compare the classification accuracy with other methods, but also compare the stability, similarity and consistency of different algorithms. The proposed method has a satisfactory capability of feature selection for analysing expression datasets of different types using different performance measurements.

  9. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Thanh-Tung Nguyen

    2015-01-01

    Full Text Available Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  10. Unbiased feature selection in learning random forests for high-dimensional data.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.
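
    The first, debiasing step of this idea, dropping features whose univariate p-values suggest no association before growing the forest, can be approximated with off-the-shelf components; the feature weighting and two-subset sampling of xRF are not reproduced here.

        # p-value filtering followed by a random forest (simplified sketch).
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import f_classif
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=300, n_features=500,
                                   n_informative=10, random_state=0)
        _, pvals = f_classif(X, y)               # univariate ANOVA F-test p-values
        keep = pvals < 0.05                      # discard uninformative features
        print("kept", keep.sum(), "of", X.shape[1], "features")

        rf_all = RandomForestClassifier(n_estimators=200, random_state=0)
        rf_sel = RandomForestClassifier(n_estimators=200, random_state=0)
        print("all features :", cross_val_score(rf_all, X, y, cv=5).mean())
        print("filtered     :", cross_val_score(rf_sel, X[:, keep], y, cv=5).mean())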

  11. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    Science.gov (United States)

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  12. Feature Selection Based on the SVM Weight Vector for Classification of Dementia.

    Science.gov (United States)

    Bron, Esther E; Smits, Marion; Niessen, Wiro J; Klein, Stefan

    2015-09-01

    Computer-aided diagnosis of dementia using a support vector machine (SVM) can be improved with feature selection. The relevance of individual features can be quantified from the SVM weights as a significance map (p-map). Although these p-maps previously showed clusters of relevant voxels in dementia-related brain regions, they have not yet been used for feature selection. Therefore, we introduce two novel feature selection methods based on p-maps using a direct approach (filter) and an iterative approach (wrapper). To evaluate these p-map feature selection methods, we compared them with methods based on the SVM weight vector directly, t-statistics, and expert knowledge. We used MRI data from the Alzheimer's disease neuroimaging initiative classifying Alzheimer's disease (AD) patients, mild cognitive impairment (MCI) patients who converted to AD (MCIc), MCI patients who did not convert to AD (MCInc), and cognitively normal controls (CN). Features for each voxel were derived from gray matter morphometry. Feature selection based on the SVM weights gave better results than t-statistics and expert knowledge. The p-map methods performed slightly better than those using the weight vector. The wrapper method scored better than the filter method. Recursive feature elimination based on the p-map improved most for AD-CN: the area under the receiver-operating-characteristic curve (AUC) significantly increased from 90.3% without feature selection to 92.0% when selecting 1.5%-3% of the features. This feature selection method also improved the other classifications: AD-MCI 0.1% improvement in AUC (not significant), MCI-CN 0.7%, and MCIc-MCInc 0.1% (not significant). Although the performance improvement due to feature selection was limited, the methods based on the p-map generally had the best performance, and were therefore better in estimating the relevance of individual features.
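
    The SVM-weight-driven recursive elimination used as a baseline here is available directly in scikit-learn; a minimal sketch follows (the p-map significance testing itself is not part of this example).

        # Recursive feature elimination driven by linear-SVM weights.
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import LinearSVC

        X, y = load_breast_cancer(return_X_y=True)
        svm = LinearSVC(C=0.1, dual=False, max_iter=5000)
        rfe = RFE(svm, n_features_to_select=10, step=1).fit(X, y)
        print("selected feature indices:", list(rfe.get_support(indices=True)))
        print("CV accuracy on subset   :",
              cross_val_score(svm, X[:, rfe.support_], y, cv=5).mean())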

  13. A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

    Directory of Open Access Journals (Sweden)

    Jingyuan Zhao

    2012-01-01

    Full Text Available We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and Jeffreys prior penalty (Firth, 1993), a sequence of nested candidate models is formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project of the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al., 2005) and the LASSO-patternsearch algorithm (Shi et al., 2007).
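
    A rough Python analogue of the screening stage, using scikit-learn's L1-penalized logistic regression; the SCAD and Jeffreys-prior ranking and the extended-BIC model assessment of stage two are only indicated here by a coefficient-magnitude ordering.

        # Stage 1: L1-penalized logistic screening (sketch, not the paper's code).
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=400, n_features=1000,
                                   n_informative=8, random_state=0)
        screen = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        screen.fit(X, y)
        retained = np.flatnonzero(screen.coef_[0])   # features surviving stage 1
        print("retained", retained.size, "of", X.shape[1], "features")

        # Stage 2 (placeholder): rank retained features by |coefficient| to form
        # nested candidate models, which the paper then scores with extended BIC.
        ranking = retained[np.argsort(-np.abs(screen.coef_[0][retained]))]
        print("nested-model order:", ranking[:10])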

  14. Space Station Freedom carbon dioxide removal assembly two-stage rotary sliding vane pump

    Science.gov (United States)

    Matteau, Dennis

    1992-07-01

    The design and development of a positive displacement pump selected to operate as an essential part of the carbon dioxide removal assembly (CDRA) are described. An oilless two-stage rotary sliding vane pump was selected as the optimum concept to meet the CDRA application requirements. This positive displacement pump is characterized by low weight and small envelope per unit flow, ability to pump saturated gases and moderate amount of liquid, small clearance volumes, and low vibration. It is easily modified to accommodate several stages on a single shaft optimizing space and weight, which makes the concept ideal for a range of demanding space applications.

  15. Industrial demonstration plant for the gasification of herb residue by fluidized bed two-stage process.

    Science.gov (United States)

    Zeng, Xi; Shao, Ruyi; Wang, Fang; Dong, Pengwei; Yu, Jian; Xu, Guangwen

    2016-04-01

    A fluidized bed two-stage gasification process, consisting of a fluidized-bed (FB) pyrolyzer and a transport fluidized bed (TFB) gasifier, has been proposed to gasify biomass for fuel gas production with low tar content. On the basis of our previous fundamental study, an autothermal two-stage gasifier has been designed and built to gasify a Chinese herb residue with a treating capacity of 600 kg/h. The testing data from the stable operational stage of the industrial demonstration plant showed that, when keeping the reaction temperatures of the pyrolyzer and gasifier at about 700 °C and 850 °C, respectively, the heating value of the fuel gas can reach 1200 kcal/Nm3, and the tar content in the produced fuel gas was about 0.4 g/Nm3. The results from this pilot industrial demonstration plant fully verified the feasibility and technical features of the proposed FB two-stage gasification process.

  16. An Approach for Optimal Feature Subset Selection using a New Term Weighting Scheme and Mutual Information

    Directory of Open Access Journals (Sweden)

    Shine N Das

    2011-01-01

    Full Text Available With the development of the web, large numbers of documents are available on the Internet and their number is growing drastically day by day. Hence automatic text categorization becomes more and more important for dealing with massive data. A major problem in document categorization, however, is the high dimensionality of the feature space. Techniques that decrease the feature dimension without degrading recognition performance are known as optimal feature extraction or selection. Dealing with a reduced, relevant feature set can be more efficient and effective. The objective of feature selection is to find a subset of features that has all the characteristics of the full feature set. Dependency among features is also important for classification. During past years, various metrics have been proposed to measure the dependency among different features. A popular approach to realize dependency is maximal-relevance feature selection: selecting the features with the highest relevance to the target class. The new term weighting scheme we propose achieves substantial improvements in the dimensionality reduction of the feature space. The experimental results clearly show that this integrated method works far better than the others.

  17. Feature Subset Selection by Estimation of Distribution Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E

    2002-01-17

    This paper describes the application of four evolutionary algorithms to the identification of feature subsets for classification problems. Besides a simple GA, the paper considers three estimation of distribution algorithms (EDAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the EDAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. In contrast with previous studies, we did not find evidence to support or reject the use of EDAs for this problem.

  18. A stratified two-stage sampling design for digital soil mapping in a Mediterranean basin

    Science.gov (United States)

    Blaschek, Michael; Duttmann, Rainer

    2015-04-01

    The quality of environmental modelling results often depends on reliable soil information. In order to obtain soil data in an efficient manner, several sampling strategies are at hand, depending on the level of prior knowledge and the overall objective of the planned survey. This study focuses on the collection of soil samples considering available continuous secondary information in an undulating, 16 km²-sized river catchment near Ussana in southern Sardinia (Italy). A design-based, stratified, two-stage sampling design has been applied, aiming at the spatial prediction of soil property values at individual locations. The stratification was based on quantiles from density functions of two land-surface parameters - topographic wetness index and potential incoming solar radiation - derived from a digital elevation model. Combined with four main geological units, the applied procedure led to 30 different classes in the given test site. Up to six polygons of each available class were selected randomly, excluding areas smaller than 1 ha to avoid incorrect location of the points in the field. Further exclusion rules were applied before polygon selection, masking out roads and buildings using a 20 m buffer. The selection procedure was repeated ten times and the set of polygons with the best geographical spread was chosen. Finally, exact point locations were selected randomly from inside the chosen polygon features. A second selection based on the same stratification and following the same methodology (selecting one polygon instead of six) was made in order to create an appropriate validation set. Supplementary samples were obtained during a second survey focusing on polygons that had either not been considered during the first phase at all or were not adequately represented with respect to feature size. In total, both field campaigns produced an interpolation set of 156 samples and a validation set of 41 points. The selection of sample point locations has been done using

  19. A two-stage method for inverse medium scattering

    KAUST Repository

    Ito, Kazufumi

    2013-03-01

    We present a novel numerical method to the time-harmonic inverse medium scattering problem of recovering the refractive index from noisy near-field scattered data. The approach consists of two stages, one pruning step of detecting the scatterer support, and one resolution enhancing step with nonsmooth mixed regularization. The first step is strictly direct and of sampling type, and it faithfully detects the scatterer support. The second step is an innovative application of nonsmooth mixed regularization, and it accurately resolves the scatterer size as well as intensities. The nonsmooth model can be efficiently solved by a semi-smooth Newton-type method. Numerical results for two- and three-dimensional examples indicate that the new approach is accurate, computationally efficient, and robust with respect to data noise. © 2012 Elsevier Inc.

  20. Laparoscopic management of a two-staged gall bladder torsion

    Institute of Scientific and Technical Information of China (English)

    2015-01-01

    Gall bladder torsion (GBT) is a relatively uncommon entity and is rarely diagnosed preoperatively. A constant factor in all occurrences of GBT is a freely mobile gall bladder due to congenital or acquired anomalies. GBT is commonly observed in elderly white females. We report a 77-year-old Caucasian lady who was originally diagnosed with gall bladder perforation but was eventually found to have a two-staged torsion of the gall bladder with twisting of the Riedel's lobe (part of a tongue-like projection of liver segment 4A). Together, this has not been reported in the literature, to the best of our knowledge. We performed laparoscopic cholecystectomy and she had an uneventful postoperative period. GBT may create a diagnostic dilemma in the context of acute cholecystitis. Timely diagnosis and intervention are necessary, with extra care while operating as the anatomy is generally distorted. The fundus-first approach can be useful due to the altered anatomy in the region of Calot's triangle. Laparoscopic cholecystectomy has the benefit of early recovery.

  1. Lightweight Concrete Produced Using a Two-Stage Casting Process

    Directory of Open Access Journals (Sweden)

    Jin Young Yoon

    2015-03-01

    Full Text Available The type of lightweight aggregate and its volume fraction in a mix determine the density of lightweight concrete. Minimizing the density obviously requires a higher volume fraction, but this usually causes aggregate segregation in a conventional mixing process. This paper proposes a two-stage casting process to produce lightweight concrete. The process involves placing lightweight aggregates in a frame and then filling the remaining interstitial voids with cementitious grout. This casting process yields the lowest density of lightweight concrete, which consequently has low compressive strength. The irregularly shaped aggregates compensate for this weakness in terms of strength, while the round-shaped aggregates provide a strength of 20 MPa. Therefore, the proposed casting process can be applied to the manufacture of non-structural elements and structural composites requiring a very low density and a strength of at most 20 MPa.

  2. The hybrid two stage anticlockwise cycle for ecological energy conversion

    Directory of Open Access Journals (Sweden)

    Cyklis Piotr

    2016-01-01

    Full Text Available The anticlockwise cycle is commonly used for refrigeration, air conditioning and heat pump applications. The application of a refrigerant in the compression cycle is limited to temperatures between the triple point and the critical point. New refrigerants such as 1234yf or 1234ze have many disadvantages, therefore the application of natural refrigerants is favourable. Carbon dioxide and water can be applied only in a hybrid two-stage cycle. The possibilities of this solution are shown for refrigerating applications, and some experimental results of the adsorption-compression two-stage cycle, powered with solar collectors, are shown. The adsorption system is applied as the high-temperature cycle; the low-temperature cycle is a compression stage with carbon dioxide as the working fluid. This allows a relatively high COP to be achieved for the low-temperature cycle and for the whole system.

  3. Selecting Testlet Features With Predictive Value for the Testlet Effect

    Directory of Open Access Journals (Sweden)

    Muirne C. S. Paap

    2015-04-01

    Full Text Available High-stakes tests often consist of sets of questions (i.e., items) grouped around a common stimulus. Such groupings of items are often called testlets. A basic assumption of item response theory (IRT), the mathematical model commonly used in the analysis of test data, is that individual items are independent of one another. The potential dependency among items within a testlet is often ignored in practice. In this study, a technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data for the Analytical Reasoning section of a high-stakes test. Relevant features identified included Percentage of "If" Clauses, Number of Entities, Theme/Topic, and Predicate Propositional Density; the testlet effect was smallest for stimuli that contained 31% or fewer "if" clauses, contained 9.8% or fewer verbs, and had Media or Animals as the main theme. This study illustrates the merits of TBR in the analysis of test data.

  4. Machine Learning Feature Selection for Tuning Memory Page Swapping

    Science.gov (United States)

    2013-09-01

    erroneous and generally results in useful pages being paged out too early, only to be paged back in shortly thereafter. [1] The first in/first out (FIFO) ... the tail of the queue are selected. This algorithm has been shown to have significant shortcomings. When using a FIFO PRA, it is possible to encounter a ... page which was just paged out. FIFO is, therefore, a sub-optimal page replacement algorithm. Least recently used (LRU) is incredibly simple in concept

  5. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Full Text Available Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack enough scalability to cope with datasets of millions of instances and extract successful results in a delimited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances to learn from them in the map phase; then, the reduce phase merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been managed, showing that this is a suitable framework to perform evolutionary feature selection, improving both the classification accuracy and the runtime when dealing with big data problems.
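
    The map/reduce decomposition can be imitated on a single machine with Python's multiprocessing; in the sketch below each map task runs a (here drastically simplified, random-search) evolutionary pass on its block of instances, and the reduce step merges the resulting masks into feature weights that a threshold converts into the final subset.

        # Map/reduce-style feature selection mimicked with multiprocessing.
        import numpy as np
        from multiprocessing import Pool
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        X, y = make_classification(n_samples=2000, n_features=40,
                                   n_informative=6, random_state=0)

        def map_task(args):
            """Evolve a feature mask on one block (random search stand-in)."""
            Xb, yb, seed = args
            rng = np.random.default_rng(seed)
            best_mask, best_fit = None, -1.0
            for _ in range(30):
                mask = rng.random(Xb.shape[1]) < 0.5
                if not mask.any():
                    continue
                fit = cross_val_score(GaussianNB(), Xb[:, mask], yb, cv=3).mean()
                if fit > best_fit:
                    best_mask, best_fit = mask, fit
            return best_mask.astype(float)

        if __name__ == "__main__":
            blocks = [(X[i::4], y[i::4], i) for i in range(4)]  # 4 row blocks
            with Pool(4) as pool:
                weights = np.mean(pool.map(map_task, blocks), axis=0)  # reduce
            selected = weights >= 0.5            # threshold on merged weights
            print("selected", int(selected.sum()), "of", X.shape[1], "features")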

  6. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    Science.gov (United States)

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were used to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  7. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, T

    2015-09-01

    Full Text Available A feature selection algorithm that is novel in the context of anomaly–based network intrusion detection is proposed in this paper. The distinguishing factor of the proposed feature selection algorithm is its complete lack of dependency on labelled...

  8. SQL/JavaScript Hybrid Worms As Two-stage Quines

    CERN Document Server

    Orlicki, José I

    2009-01-01

    Delving into present trends and anticipating future malware trends, a hybrid, SQL on the server-side, JavaScript on the client-side, self-replicating worm based on two-stage quines was designed and implemented on an ad-hoc scenario instantiating a very common software pattern. The proof-of-concept code combines techniques seen in the wild, in the form of SQL injections leading to cross-site-scripting JavaScript inclusion, and seen in the laboratory, in the form of SQL quines propagated via RFIDs, resulting in a hybrid code injection. General features of hybrid worms are also discussed.

  9. Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

    Directory of Open Access Journals (Sweden)

    S. Mala

    2014-01-01

    Full Text Available Activity recognition is needed in different applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. For selecting a subset of features, an efficient evolutionary algorithm, Differential Evolution (DE), a very efficient optimizer, is used to find informative features from eye movements recorded using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness-based features, minimum-redundancy maximum-relevance features, and Differential Evolution based features. This work concentrates on the feature selection algorithm based on DE in order to improve classification for faultless activity recognition.
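
    SciPy ships a Differential Evolution optimizer that can drive such a wrapper directly; in the sketch below, real-valued weights are thresholded into a feature mask, and synthetic data replaces the EOG features used in the study.

        # DE-driven wrapper feature selection with scipy.optimize.
        import numpy as np
        from scipy.optimize import differential_evolution
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        X, y = make_classification(n_samples=300, n_features=20,
                                   n_informative=5, random_state=0)

        def objective(w):
            mask = w > 0.5                       # threshold weights into a mask
            if not mask.any():
                return 1.0
            acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y,
                                  cv=3).mean()
            return 1.0 - acc                     # DE minimizes, so use error

        result = differential_evolution(objective, bounds=[(0, 1)] * X.shape[1],
                                        maxiter=10, popsize=8, seed=0,
                                        polish=False)  # objective is non-smooth
        print("selected features:", np.flatnonzero(result.x > 0.5))
        print("best CV accuracy :", 1.0 - result.fun)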

  10. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    Energy Technology Data Exchange (ETDEWEB)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)

    2015-10-15

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time-efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreement was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± Standard Error) utilizing

  11. Performance analysis of RDF gasification in a two stage fluidized bed-plasma process.

    Science.gov (United States)

    Materazzi, M; Lettieri, P; Taylor, R; Chapman, C

    2016-01-01

    The major technical problems faced by stand-alone fluidized bed gasifiers (FBG) for waste-to-gas applications are intrinsically related to the composition and physical properties of waste materials such as RDF. The high quantity of ash and volatile material in RDF can decrease thermal output, create severe ash clinkering, and increase emissions of tars and CO2, thus affecting the operability of clean syngas generation at industrial scale. By contrast, a two-stage process which separates primary gasification from selective tar and ash conversion would be inherently more forgiving and stable. This can be achieved with the use of a separate plasma converter, which has been successfully used in conjunction with conventional thermal treatment units for its ability to 'polish' the producer gas of organic contaminants and collect the inorganic fraction in a molten (and inert) state. This research focused on the performance analysis of a two-stage fluid bed gasification-plasma process to transform solid waste into clean syngas. A thermodynamic assessment using the two-stage equilibrium method was carried out to determine optimum conditions for the gasification of RDF and to understand the limitations and influence of the second stage on the process performance (gas heating value, cold gas efficiency, carbon conversion efficiency), along with other parameters. A comparison with a different thermal refining stage, i.e. thermal cracking (via partial oxidation), was also performed. The analysis is supported by experimental data from a pilot plant.

  12. Effect of feature-selective attention on neuronal responses in macaque area MT.

    Science.gov (United States)

    Chen, X; Hoffmann, K-P; Albright, T D; Thiele, A

    2012-03-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color).

  13. A robust and accurate method for feature selection and prioritization from multi-class OMICs data.

    Directory of Open Access Journals (Sweden)

    Vittorio Fortino

    Selecting relevant features is a common task in most OMICs data analyses, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) and the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared to other Random Forest-based feature selection methods.
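
    As a rough illustration of the prioritization phase: conditional inference forests are an R construct (e.g. party::cforest), so the sketch below substitutes scikit-learn's RandomForestClassifier and adds the kind of resampling-based selection count that the abstract's stability argument relies on. All names and parameters are illustrative.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.utils import resample

        def prioritize(X, y, n_runs=20, top_k=50, seed=0):
            rng = np.random.RandomState(seed)
            counts = np.zeros(X.shape[1])
            for _ in range(n_runs):
                Xb, yb = resample(X, y, random_state=rng)   # bootstrap the samples
                rf = RandomForestClassifier(n_estimators=500,
                                            random_state=rng.randint(1 << 30))
                rf.fit(Xb, yb)
                top = np.argsort(rf.feature_importances_)[::-1][:top_k]
                counts[top] += 1                            # how often each feature ranks high
            # Features selected in many independent runs are the more stable candidates.
            return np.argsort(counts)[::-1], counts / n_runs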

  14. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been undertaken to develop feature selection techniques for it. In addition, most applications involving feature selection focus on classification accuracy rather than cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds a cost-based evaluation function to a filter feature selection method using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale network security dataset, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The results of the experimental research show that the approach is efficient and able to effectively improve classification accuracy and decrease classification time. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.
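
    A sketch of the kind of cost-based evaluation function CSFSG optimizes, trading misclassification cost against feature-acquiring (test) cost; the classifier, cost structures and lambda weight are illustrative assumptions, and y is assumed integer-encoded:

        import numpy as np
        from sklearn.model_selection import cross_val_predict
        from sklearn.neighbors import KNeighborsClassifier

        def total_cost(mask, X, y, test_costs, cost_matrix, lam=1.0):
            """mask: boolean vector over features (one GA chromosome)."""
            if not mask.any():
                return np.inf
            y_hat = cross_val_predict(KNeighborsClassifier(), X[:, mask], y, cv=5)
            mis_cost = cost_matrix[y, y_hat].sum()   # misclassification cost
            acq_cost = test_costs[mask].sum()        # feature-acquiring (test) cost
            return mis_cost + lam * acq_cost         # the GA minimizes this value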

  15. Effect of Silica Fume on two-stage Concrete Strength

    Science.gov (United States)

    Abdelgader, H. S.; El-Baden, A. S.

    2015-11-01

    Two-stage concrete (TSC) is an innovative concrete that does not require vibration for placing and compaction. TSC is a simple concept; it is made using the same basic constituents as traditional concrete: cement, coarse aggregate, sand and water, as well as mineral and chemical admixtures. As its name suggests, it is produced through a two-stage process. First, washed coarse aggregate is placed into the formwork in situ. A specifically designed self-compacting grout is then introduced into the form from the lowest point under gravity pressure to fill the voids, cementing the aggregate into a monolith. The hardened concrete is dense, homogeneous and in general has improved engineering properties and durability. This paper presents the results of research work attempting to study the effect of silica fume (SF) and superplasticizer admixtures (SP) on the compressive and tensile strength of TSC using various combinations of water-to-cement ratio (w/c) and cement-to-sand ratio (c/s). Thirty-six concrete mixes with different grout constituents were tested. From each mix, twenty-four standard cylinder samples of size 150 mm × 300 mm of concrete containing crushed aggregate were produced. The tested samples were made from combinations of w/c equal to 0.45, 0.55 and 0.85, and three c/s values: 0.5, 1 and 1.5. Silica fume was added at a dosage of 6% of the weight of cement, while superplasticizer was added at a dosage of 2% of cement weight. Results indicated that both the tensile and compressive strength of TSC can be statistically derived as a function of w/c and c/s with good correlation coefficients. The basic principle of traditional concrete, that an increase in water/cement ratio leads to a reduction in compressive strength, was shown to hold true for the TSC specimens tested. Using a combination of both silica fume and superplasticizers caused a significant increase in strength relative to control mixes.

  16. COMPUTATIONALLY INEXPENSIVE SEQUENTIAL FORWARD FLOATING SELECTION FOR ACQUIRING SIGNIFICANT FEATURES FOR AUTHORSHIP INVARIANCENESS IN WRITER IDENTIFICATION

    OpenAIRE

    Satrya Fajri Pratama; Azah Kamilah Muda; Yun-Huoy Choo; Noor Azilah Muda

    2011-01-01

    Handwriting is individualistic. The uniqueness of the shape and style of handwriting can be used to identify the significant features in authenticating the author of a piece of writing. Acquiring these significant features leads to important research in the Writer Identification domain, where the aim is to find the unique features of an individual, also known as the Individuality of Handwriting. This paper proposes an improved Sequential Forward Floating Selection method, along with an exploration of significant features for...
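
    The abstract is truncated, but SFFS itself is a standard wrapper method; a minimal sketch assuming the mlxtend implementation, where floating=True adds the conditional-exclusion step that distinguishes SFFS from plain sequential forward selection:

        from mlxtend.feature_selection import SequentialFeatureSelector
        from sklearn.neighbors import KNeighborsClassifier

        sffs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                         k_features=10,   # target subset size (assumed)
                                         forward=True,
                                         floating=True,   # enables conditional exclusion
                                         scoring='accuracy',
                                         cv=5)
        # sffs = sffs.fit(X, y); selected = sffs.k_feature_idx_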

  17. New evolutions in TRNSYS : a selection of version 16 features

    Energy Technology Data Exchange (ETDEWEB)

    Bradley, D. [Thermal Energy System Specialists, Madison, WI (United States)]; Kummert, M. [Wisconsin Univ., Madison, WI (United States). Solar Energy Laboratory]

    2005-07-01

    TRNSYS is a transient energy simulation package that has undergone continuous improvement since its development in 1975. TRNSYS was initially developed for the simulation of solar thermal processes, but has since expanded into a total energy modeling package. It models each component of an energy system as an individual black-box component. Simulating a system involves connecting the inputs and outputs of the components to one another. If certain models are missing, they are quickly developed and added to the package by the international group of developers and users, which includes the Solar Energy Laboratory at the University of Wisconsin in Madison, United States, the Centre Scientifique et Technique du Batiment in Nice, France, and Transsolar Energietechnik GmbH in Stuttgart, Germany. This paper presented some of the issues faced by the users in updating the TRNSYS simulation tool to meet the challenges posed by new technologies, to make use of better algorithms, and to exploit updated computing resources. In particular, it focused on adding new component models to the program, on increasing the ease of use of the program, and on continuing the trend of moving TRNSYS from an academic research tool to a manageable commercial tool. A subset of the features added to the sixteenth version of the simulation package in November 2004 is presented. These include modeling the energy transfer between a conditioned building and the surrounding ground, implementing ASHRAE's effective heat flow method in TRNSYS, and implementing combined thermal/air flow simulations using a software link between TRNSYS and COMIS or CONTAM for the air flow simulation. A brief description of the hydrogen system components which model hydrogen power systems is also included, along with graphical interface enhancements and a description of simulation engine modifications such as starting time and drop-in dynamic link libraries (DLL). 12 refs., 5 figs.

  18. Evaluation of Meta-Heuristic Algorithms for Stable Feature Selection

    Directory of Open Access Journals (Sweden)

    Maysam Toghraee

    2016-07-01

    Nowadays, with the development of science, technology and technological tools, the ability to review and store important data has become available. Knowledge is needed to search these data and arrive at useful results. Data mining is the automatic search of large data sources to find patterns and dependencies that are not revealed by simple statistical analysis. The scope of this work is to study the predictive role and usage domain of data mining in medical science and to suggest a framework for creating, assessing and exploiting data mining patterns in this field. As previous research has found that assessment methods cannot be used to specify the discrepancies in the data, our suggestion is a new approach for assessing data similarities, in order to find the relations between variation in the data and stability in selection. We therefore chose meta-heuristic methods so as to be able to choose the best and most stable algorithms among a set of algorithms.
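
    One simple way to quantify the "stability in selection" referred to above is the average pairwise Jaccard similarity of the feature subsets an algorithm returns across resampled runs; this is a common measure, not necessarily the one used in the paper:

        from itertools import combinations

        def jaccard(a, b):
            a, b = set(a), set(b)
            return len(a & b) / len(a | b) if a | b else 1.0

        def stability(subsets):
            """subsets: one collection of selected feature indices per run."""
            pairs = list(combinations(subsets, 2))
            return sum(jaccard(a, b) for a, b in pairs) / len(pairs)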

  19. Feature Selection for Bayesian Evaluation of Trauma Death Risk

    CERN Document Server

    Jakaite, L

    2008-01-01

    In the last year, more than 70,000 people have been brought to UK hospitals with serious injuries. Each time, a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are ambiguously interpreted and the information about the severity of the injury is misleading? A mistake in a decision can be fatal: using a mild treatment can put a patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...

  20. Adaptive feature selection using v-shaped binary particle swarm optimization

    Science.gov (United States)

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
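
    The abstract does not state which V-shaped transfer function is used; |tanh(v)| is one common choice. The defining property, sketched below, is that the velocity maps to a probability of flipping a bit, rather than of setting it to 1 as the classic S-shaped (sigmoid) rule does:

        import numpy as np

        def v_transfer(v):
            return np.abs(np.tanh(v))   # one of several V-shaped choices

        def update_bits(bits, velocity, rng):
            flip = rng.random(bits.shape) < v_transfer(velocity)
            return np.where(flip, 1 - bits, bits)   # bit == 1 means "feature selected"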

  1. Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex.

    Science.gov (United States)

    Downer, Joshua D; Rapone, Brittany; Verhein, Jessica; O'Connor, Kevin N; Sutter, Mitchell L

    2017-05-24

    Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations (rnoise) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on rnoise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in rnoise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT: Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations (rnoise) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on rnoise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning to the

  2. Characterization of component interactions in two-stage axial turbine

    Directory of Open Access Journals (Sweden)

    Adel Ghenaiet

    2016-08-01

    This study concerns the characterization of both the steady and unsteady flows and the analysis of stator/rotor interactions of a two-stage axial turbine. The predicted aerodynamic performances show noticeable differences when simulating the turbine stages simultaneously or separately. By considering the multi-blade per row and the scaling technique, the Computational Fluid Dynamics (CFD) produced better results concerning the effect of pitchwise positions between vanes and blades. The recorded pressure fluctuations exhibit a high unsteadiness characterized by a space-time periodicity described by a double Fourier decomposition. The Fast Fourier Transform (FFT) analysis of the static pressure fluctuations recorded at different interfaces reveals the existence of principal harmonics and their multiples, and each lobed pressure-wave structure corresponds to the vane/blade count. The potential effect is seen to propagate both upstream and downstream of each blade row and becomes accentuated at low mass flow rates. Between vanes and blades, the potential effect is seen to dominate almost the entire blade span, while downstream of the blades this effect seems to dominate from hub to mid span. Near the shroud, the prevailing effect is rather linked to the blade tip flow structure.

  3. A continuous two stage solar coal gasification system

    Science.gov (United States)

    Mathur, V. K.; Breault, R. W.; Lakshmanan, S.; Manasse, F. K.; Venkataramanan, V.

    The characteristics of a two-stage fluidized-bed hybrid coal gasification system to produce syngas from coal, lignite, and peat are described. Heat for devolatilization at 823 K is supplied by recirculating gas heated by a solar receiver/coal heater. A second-stage gasifier maintained at 1227 K serves to crack remaining tar and light oil to yield a product free from tar and other condensables, and sulfur can be removed by hot clean-up processes. CO is minimized because the coal is not burned with oxygen, and the product gas contains 50% H2. Bench-scale reactors consist of a stage I unit 0.1 m in diameter, which is fed coal 200 microns in size. The stage II reactor has an inner diameter of 0.36 m and serves to gasify the char from stage I. A solar power source of 10 kWt is required for the bench model and will be obtained from a central receiver with quartz or heat pipe configurations for heat transfer.

  4. Characterization of component interactions in two-stage axial turbine

    Institute of Scientific and Technical Information of China (English)

    Adel Ghenaiet; Kaddour Touil

    2016-01-01

    This study concerns the characterization of both the steady and unsteady flows and the analysis of stator/rotor interactions of a two-stage axial turbine. The predicted aerodynamic performances show noticeable differences when simulating the turbine stages simultaneously or separately. By considering the multi-blade per row and the scaling technique, the Computational Fluid Dynamics (CFD) produced better results concerning the effect of pitchwise positions between vanes and blades. The recorded pressure fluctuations exhibit a high unsteadiness characterized by a space-time periodicity described by a double Fourier decomposition. The Fast Fourier Transform (FFT) analysis of the static pressure fluctuations recorded at different interfaces reveals the existence of principal harmonics and their multiples, and each lobed pressure-wave structure corresponds to the vane/blade count. The potential effect is seen to propagate both upstream and downstream of each blade row and becomes accentuated at low mass flow rates. Between vanes and blades, the potential effect is seen to dominate almost the entire blade span, while downstream of the blades this effect seems to dominate from hub to mid span. Near the shroud, the prevailing effect is rather linked to the blade tip flow structure.

  5. Two stages kinetics of municipal solid waste inoculation composting processes

    Institute of Scientific and Technical Information of China (English)

    XI Bei-dou; HUANG Guo-he; QIN Xiao-sheng; LIU Hong-liang

    2004-01-01

    In order to understand the key mechanisms of the composting processes, the municipal solid waste (MSW) composting processes were divided into two stages, and the characteristics of typical experimental scenarios were analyzed from the viewpoint of microbial kinetics. Through experimentation with an advanced composting reactor under controlled composting conditions, several equations were worked out to simulate the degradation rate of the substrate. The equations showed that the degradation rate was controlled by the concentration of microbes in the first stage. The substrate degradation rates of the inoculation Runs A, B and C and the Control composting system were 13.61 g/(kg·h), 13.08 g/(kg·h), 15.671 g/(kg·h), and 10.5 g/(kg·h), respectively. The value for Run C is around 1.5 times that of the Control system. The decomposition rate of the second stage is controlled by the concentration of substrate. Although the organic matter decomposition rates were similar for all Runs, inoculation could reduce the values of the half velocity coefficient and make the composting stabilize more efficiently. In particular, for Run C the decomposition rate is high in the first stage and low in the second stage. The results indicated that the inoculation was efficient for the composting processes.
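
    The "half velocity coefficient" mentioned above points to a Monod-type rate law; a generic two-stage form consistent with the abstract (an assumption, since the paper's exact equations are not reproduced here) is:

        % Stage 1: degradation rate controlled by microbe concentration X
        \frac{dS}{dt} = -k\,X
        % Stage 2: rate controlled by substrate concentration S (Monod form,
        % with K_s the half velocity coefficient)
        \frac{dS}{dt} = -\frac{\mu_{max}\,X\,S}{K_s + S}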

  6. Gas loading system for LANL two-stage gas guns

    Science.gov (United States)

    Gibson, Lee; Bartram, Brian; Dattelbaum, Dana; Lang, John; Morris, John

    2015-06-01

    A novel gas loading system was designed for the specific application of remotely loading high purity gases into targets for gas-gun driven plate impact experiments. The high purity gases are loaded into well-defined target configurations to obtain Hugoniot states in the gas phase at greater than ambient pressures. The small volume of the gas samples is challenging, as slight changes in ambient temperature result in measurable pressure changes. Therefore, the ability to load a gas gun target and continually monitor the sample pressure prior to firing provides the most stable and reliable target fielding approach. We present the design and evaluation of a gas loading system built for the LANL 50 mm bore two-stage light gas gun. Targets for the gun are made of 6061 Al or OFHC Cu, and assembled to form a gas containment cell with a volume of approximately 1.38 cc. The compatibility of materials was a major consideration in the design of the system, particularly for its use with corrosive gases. Piping and valves are stainless steel, with wetted seals made from Kalrez and Teflon. Preliminary testing was completed to ensure a proper flow rate and that the proper safety controls were in place. The system has been used to successfully load Ar, Kr, Xe, and anhydrous ammonia with purities of up to 99.999 percent. The design of the system and example data from the plate impact experiments will be shown. LA-UR-15-20521

  7. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: (1) select more informative image locations upon which to fixate their eyes, or (2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: the number of fixations required to find an object decreased by 64% across the three sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  8. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando A. Auat Cheein

    2010-12-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman Filter) SLAM. The feature selection criteria are applied at the correction stage of the SLAM algorithm, restricting it to correct the SLAM estimate with the most significant features. This restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a particular type of feature.
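
    A sketch of the covariance-based ranking this describes: for each candidate feature, score how much a linearized EKF correction with that feature alone would shrink the state covariance, then correct only with the top-ranked features. H and R are the usual EKF measurement quantities; scoring by trace reduction is one reasonable reading of "most significant from a convergence perspective", not necessarily the paper's exact criterion.

        import numpy as np

        def trace_reduction(P, H, R):
            S = H @ P @ H.T + R              # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
            return np.trace(K @ H @ P)       # decrease in trace(P) after the update

        def select_features(P, Hs, Rs, k):
            scores = [trace_reduction(P, H, R) for H, R in zip(Hs, Rs)]
            return np.argsort(scores)[::-1][:k]   # indices of the most significant features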

  9. Analysis of different feature selection criteria based on a covariance convergence perspective for a SLAM algorithm.

    Science.gov (United States)

    Auat Cheein, Fernando A; Carelli, Ricardo

    2011-01-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman Filter) SLAM. The feature selection criteria are applied at the correction stage of the SLAM algorithm, restricting it to correct the SLAM estimate with the most significant features. This restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a particular type of feature.

  10. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    Science.gov (United States)

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.
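
    The Shapley index used in the second phase has a standard Monte Carlo approximation. In the sketch below, cross-validated accuracy stands in for the paper's QMI-based payoff (an assumption), and the classifier and permutation count are illustrative:

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        def payoff(X, y, subset):
            if len(subset) == 0:
                return 0.0
            return cross_val_score(GaussianNB(), X[:, list(subset)], y, cv=3).mean()

        def shapley(X, y, n_perm=50, seed=0):
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            phi = np.zeros(d)
            for _ in range(n_perm):
                order = rng.permutation(d)
                coalition, prev = [], 0.0
                for f in order:
                    coalition.append(f)
                    score = payoff(X, y, coalition)
                    phi[f] += score - prev   # marginal contribution of feature f
                    prev = score
            return phi / n_perm              # approximate "power" of each feature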

  11. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to the high dimension of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability, and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  12. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    Science.gov (United States)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect in the field of machine learning. It entails the search for an optimal subset from a very large data set with a high dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNN model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
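
    A sketch of the first two steps described above: multi-level discrete wavelet decomposition of each EEG segment (PyWavelets), followed by PCA as the filter stage. The wavelet family, decomposition level and component count are assumptions, and the harmony search wrapper and WNN classifier are omitted:

        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def dwt_features(segments, wavelet='db4', level=4):
            rows = []
            for s in segments:   # one EEG segment per row
                coeffs = pywt.wavedec(s, wavelet, level=level)
                rows.append(np.concatenate(coeffs))
            return np.asarray(rows)

        # X = dwt_features(eeg_segments)
        # X_filtered = PCA(n_components=20).fit_transform(X)   # filter step before the wrapper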

  13. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    Science.gov (United States)

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select only one of several correlated features, essentially at random. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can
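
    The random-selection behavior of Lasso among correlated features, which motivates Tree-Lasso, is easy to reproduce; a small synthetic illustration (all numbers arbitrary) in which two nearly identical predictors are each selected only part of the time across bootstrap resamples:

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.utils import resample

        rng = np.random.default_rng(0)
        n = 200
        z = rng.normal(size=n)
        X = np.column_stack([z + 0.01 * rng.normal(size=n),   # feature 0
                             z + 0.01 * rng.normal(size=n),   # feature 1 (≈ feature 0)
                             rng.normal(size=n)])             # irrelevant feature
        y = z + 0.1 * rng.normal(size=n)

        picks = np.zeros(3)
        for i in range(100):
            Xb, yb = resample(X, y, random_state=i)
            coef = Lasso(alpha=0.05).fit(Xb, yb).coef_
            picks += np.abs(coef) > 1e-6   # which features got nonzero coefficients
        print(picks / 100)                 # selection frequency per feature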

  14. Modeling and Implementing Two-Stage AdaBoost for Real-Time Vehicle License Plate Detection

    Directory of Open Access Journals (Sweden)

    Moon Kyou Song

    2014-01-01

    License plate (LP) detection is the most imperative part of an automatic LP recognition system. In previous years, different methods, techniques and algorithms have been developed for LP detection (LPD) systems. This paper proposes the automatic detection of car LPs via image processing techniques based on classifiers or machine learning algorithms. We propose a real-time and robust method for LPD systems using the two-stage adaptive boosting (AdaBoost) algorithm combined with different image preprocessing techniques. Haar-like features are used to compute and select features from LP images. The AdaBoost algorithm is used to classify parts of an image within a search window, by a trained strong classifier, as either LP or non-LP. Adaptive thresholding is used as the image preprocessing method for those images that are of insufficient quality for LPD. This method is faster and more accurate than most existing LPD methods. Experimental results demonstrate that the average LPD rate is 98.38% and the computational time is approximately 49 ms.

  15. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters.

    Science.gov (United States)

    Li, Yifeng; Chen, Chih-Yu; Wasserman, Wyeth W

    2016-05-01

    Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) an inability to model nonlinearity of features, (2) an inability to learn high-level features, and (3) unnatural extensions for selecting features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representations of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful for understanding the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantage of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of the size of the discriminative feature subset and classification accuracy.
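
    A minimal sketch of the core DFS idea: a one-to-one (elementwise) weight layer at the input of a deep network, trained with an l1 penalty so that entire input features are switched off. PyTorch is an assumption here, and the paper's exact architecture and penalty schedule are not reproduced:

        import torch
        import torch.nn as nn

        class DeepFeatureSelector(nn.Module):
            def __init__(self, d_in, d_hidden, n_classes):
                super().__init__()
                self.gate = nn.Parameter(torch.ones(d_in))   # one weight per input feature
                self.net = nn.Sequential(
                    nn.Linear(d_in, d_hidden), nn.ReLU(),
                    nn.Linear(d_hidden, n_classes))

            def forward(self, x):
                return self.net(x * self.gate)

            def l1_penalty(self):
                return self.gate.abs().sum()   # add lambda * l1_penalty() to the loss

        # Features whose |gate| is near zero after training are treated as de-selected.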

  16. Use of two-stage membrane countercurrent cascade for natural gas purification from carbon dioxide

    Science.gov (United States)

    Kurchatov, I. M.; Laguntsov, N. I.; Karaseva, M. D.

    2016-09-01

    A membrane technology scheme, structured as a two-stage countercurrent recirculating cascade, is proposed to solve the problem of natural gas dehydration and purification from CO2. The first stage is a single divider and the second stage is a recirculating two-module divider. This scheme allows natural gas to be cleaned of impurities with any desired degree of methane extraction. In this paper, the optimal values of the basic parameters of the selected technological scheme are determined. An estimation of energy efficiency was carried out, taking into account the energy consumption of the interstage compressor and methane losses in energy units.

  17. Accuracy of the One-Stage and Two-Stage Impression Techniques: A Comparative Analysis

    OpenAIRE

    Ladan Jamshidy; Hamid Reza Mozaffari; Payam Faraji; Roohollah Sharifi

    2016-01-01

    Introduction. One of the main steps of impression-taking is the selection and preparation of an appropriate tray. Hence, the present study aimed to analyze and compare the accuracy of the one- and two-stage impression techniques. Materials and Methods. A resin laboratory-made model, representing the first molar, was prepared by a standard method for full crowns, with a processed preparation finish line of 1 mm depth and a convergence angle of 3-4°. Impressions were made 20 times with the one-stage technique and 20 times with ...

  18. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. Although this technology has ushered in a new era of molecular classification, interpreting gene expression data remains a difficult problem and an active research area due to its native "high dimensional, low sample size" nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types, leading to a better understanding of genetic signatures as well as improved treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three classification methods, including support vector machines, k-nearest neighbor and random forest, and eight feature selection methods, including information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments were performed to compare the performance of the classification methods with and without feature selection. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship of features selected in
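
    A skeleton of this comparison protocol in scikit-learn terms, with univariate f_classif standing in for the simpler statistical filters and k chosen arbitrarily; each (filter, classifier) pair is evaluated with five-fold cross validation:

        from sklearn.pipeline import Pipeline
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.ensemble import RandomForestClassifier

        classifiers = {'svm': SVC(), 'knn': KNeighborsClassifier(),
                       'rf': RandomForestClassifier(n_estimators=500)}
        # for name, clf in classifiers.items():
        #     pipe = Pipeline([('select', SelectKBest(f_classif, k=50)),  # k assumed
        #                      ('clf', clf)])
        #     print(name, cross_val_score(pipe, X, y, cv=5).mean())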

  19. Regression-Based Feature Selection on Large Scale Human Activity Recognition

    Directory of Open Access Journals (Sweden)

    Hussein Mazaar

    2016-02-01

    In this paper, we present an approach to regression-based feature selection in human activity recognition. Due to the high dimensionality of features in human activity recognition, the model may overfit and fail to learn its parameters well. Moreover, many of the features are redundant or irrelevant. The goal is to select important discriminating features to recognize human activities in videos. The R-squared regression criterion can identify the best features based on the ability of a feature to explain the variations in the target class. The features are significantly reduced, by nearly 99.33%, resulting in better classification accuracy. A Support Vector Machine with a linear kernel is used to classify the activities. The experiments are tested on the UCF50 dataset. The results show that the proposed model significantly outperforms state-of-the-art methods.

  20. A Two-Stage State Recognition Method for Asynchronous SSVEP-Based Brain-Computer Interface System

    Institute of Scientific and Technical Information of China (English)

    ZHANG Zimu; DENG Zhidong

    2013-01-01

    A two-stage state recognition method is proposed for an asynchronous SSVEP (steady-state visual evoked potential) based brain-computer interface (SBCI) system. The two-stage method is composed of idle state (IS) detection and control state (CS) discrimination modules. Based on blind source separation and continuous wavelet transform techniques, the proposed method integrates the functions of multi-electrode spatial filtering and feature extraction. In the IS detection module, a method using the ensemble IS feature is proposed. In the CS discrimination module, the ensemble CS feature is designed as the feature vector for control intent classification. Further, performance comparisons are investigated between our IS detection module and other existing ones. The experimental results also validate the satisfactory performance of our CS discrimination module.

  1. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis of textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders, to reduce the number of features extracted from large images. First, we randomly collect image patches from an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between the different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain, and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  2. Eigenvalue-weighting and feature selection for computer-aided polyp detection in CT colonography

    Science.gov (United States)

    Zhu, Hongbin; Wang, Su; Fan, Yi; Lu, Hongbing; Liang, Zhengrong

    2010-03-01

    With the development of computer-aided polyp detection towards virtual colonoscopy screening, the trade-off between detection sensitivity and specificity has gained increasing attention. An optimum detection, with the least number of false positives and the highest true positive rate, is desirable and involves interdisciplinary knowledge, such as feature extraction, feature selection and machine learning. Toward that goal, various geometrical and textural features, associated with each suspicious polyp candidate, have been individually extracted and stacked together as a feature vector. However, directly inputting these high-dimensional feature vectors into a learning machine, e.g., a neural network, for polyp detection may introduce redundant information due to feature correlation and induce the curse of dimensionality. In this paper, we explored an indispensable building block of computer-aided polyp detection, i.e., principal component analysis (PCA)-weighted feature selection for a neural network classifier of true and false positives. The major concepts proposed in this paper include (1) the use of PCA to reduce the feature correlation, (2) a scheme for adaptively weighting each principal component (PC) by the associated eigenvalue, and (3) the selection of feature combinations via the genetic algorithm. As such, the eigenvalue is also taken as part of the characterizing feature, and the necessary number of features can be exposed to mitigate the curse of dimensionality. Trained and tested with a radial basis neural network, the proposed computer-aided polyp detection achieved 95% sensitivity at a cost of, on average, 2.99 false positives per polyp.
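
    A minimal sketch of the eigenvalue-weighting step, assuming scikit-learn; the genetic-algorithm search over feature combinations and the radial basis network classifier are omitted:

        import numpy as np
        from sklearn.decomposition import PCA

        def eigenvalue_weighted(X, n_components):
            pca = PCA(n_components=n_components).fit(X)
            Z = pca.transform(X)                  # decorrelated principal components
            return Z * pca.explained_variance_    # scale each PC by its eigenvalue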

  3. Artificial immune system based on adaptive clonal selection for feature selection and parameters optimisation of support vector machines

    Science.gov (United States)

    Sadat Hashemipour, Maryam; Soleimani, Seyed Ali

    2016-01-01

    An artificial immune system (AIS) algorithm based on the clonal selection method can be defined as a soft computing method, inspired by the theoretical immune system, for solving science and engineering problems. The support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly impacts the classification accuracy rate. In this study, an AIS based on an Adaptive Clonal Selection (AISACS) algorithm has been used to optimize the SVM parameters and the feature subset selection without degrading the SVM classification accuracy. Several public datasets from the University of California Irvine machine learning (UCI) repository are employed to calculate the classification accuracy rate in order to evaluate the AISACS approach, which was then compared with the grid search algorithm and a Genetic Algorithm (GA) approach. The experimental results show that the feature reduction rate and running time of the AISACS approach are better than those of the GA approach.

  4. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    The paper presents an automatic speaker recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on the feature selection of a speech signal using a genetic algorithm which takes into account the synergy of features. The results of optimizing selected elements of the classifier are also shown, including the number of Gaussian distributions used to model each of the voices. In addition, a universal voice model has been used for creating the voice models. Keywords: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  5. PERFORMANCE STUDY OF A TWO STAGE SOLAR ADSORPTION REFRIGERATION SYSTEM

    Directory of Open Access Journals (Sweden)

    BAIJU. V

    2011-07-01

    The present study deals with the performance of a two-stage solar adsorption refrigeration system with an activated carbon-methanol pair, investigated experimentally. Such a system was fabricated and tested under the conditions of the National Institute of Technology Calicut, Kerala, India. The system consists of a parabolic solar concentrator, two water tanks, two adsorbent beds, a condenser, an expansion device, an evaporator and an accumulator. In this particular system, the second water tank acts as a sensible heat storage device so that the system can also be used during the night. The system has been designed for heating 50 litres of water from 25°C to 90°C as well as cooling 10 litres of water from 30°C to 10°C within one hour. The performance parameters studied are the specific cooling power (SCP), coefficient of performance (COP), solar COP and exergetic efficiency. The dependency of the exergetic efficiency and cycle COP on the driving heat source temperature is also studied. The optimum heat source temperature for this system is determined to be 72.4°C. The results show that the system performs better during the night than during the day. The system has a mean cycle COP of 0.196 during the day and 0.335 at night. The mean SCP values during the day and night are 47.83 and 68.2, respectively. The experimental results also demonstrate that the refrigerator has a cooling capacity of 47 to 78 W during the day and 57.6 W to 104.4 W at night.

  6. AlPOs Synthetic Factor Analysis Based on Maximum Weight and Minimum Redundancy Feature Selection

    Directory of Open Access Journals (Sweden)

    Yinghua Lv

    2013-11-01

    The relationship between synthetic factors and the resulting structures is critical for the rational synthesis of zeolites and related microporous materials. In this paper, we develop a new feature selection method for the synthetic factor analysis of (6,12)-ring-containing microporous aluminophosphates (AlPOs). The proposed method is based on a maximum weight and minimum redundancy criterion. With the proposed method, we can select the feature subset in which the features are most relevant to the synthetic structure while the redundancy among the selected features is minimal. Based on the database of AlPO synthesis, we use (6,12)-ring-containing AlPOs as the target class and incorporate 21 synthetic factors, including gel composition, solvent and organic template, to predict the formation of (6,12)-ring-containing AlPOs. From these 21 features, 12 selected features are deemed the optimal subset for distinguishing (6,12)-ring-containing AlPOs from other AlPOs without such rings. The prediction model achieves a classification accuracy rate of 91.12% using the optimal feature subset. Comprehensive experiments demonstrate the effectiveness of the proposed algorithm, and a deep analysis is given of the synthetic factors selected by the proposed method.
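
    A greedy sketch of a maximum-weight/minimum-redundancy criterion of this general kind (closely related to mRMR). Mutual information serves as both the relevance and redundancy measure; the synthesis factors are assumed discrete or pre-discretized, and beta is an assumed trade-off weight:

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.metrics import mutual_info_score

        def max_weight_min_redundancy(X, y, k, beta=1.0):
            relevance = mutual_info_classif(X, y)   # weight of each feature w.r.t. the class
            selected, remaining = [], list(range(X.shape[1]))
            while len(selected) < k:
                def score(f):
                    red = np.mean([mutual_info_score(X[:, f], X[:, s])
                                   for s in selected]) if selected else 0.0
                    return relevance[f] - beta * red   # weight minus redundancy
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            return selected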

  7. Using genetic algorithms to select and create features for pattern classification. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Chang, E.I.; Lippmann, R.P.

    1991-03-11

    Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. On a 15-feature machine-vision inspection task, it was found that genetic algorithms performed no better than conventional approaches to feature selection but required much more computation. For a speech recognition task, genetic algorithms required no more computation time than traditional approaches but reduced the number of features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) that reduced classification error rates from 10 to almost 0 percent. Neural net and nearest-neighbor classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns and to select the value of k for a k-nearest-neighbor classifier. On a 338 training pattern vowel recognition problem with 10 classes, genetic algorithms simultaneously reduced the number of stored exemplars from 338 to 63 and selected k without significantly decreasing classification accuracy. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by an exhaustive search. Run times were long but not unreasonable. These results suggest that genetic algorithms may soon be practical for pattern classification problems as faster serial and parallel computers are developed.
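
    A bare-bones genetic algorithm of the kind used in this report for feature selection: chromosomes are feature bitmasks and fitness is cross-validated nearest-neighbor accuracy; the population size, rates and generation count are illustrative:

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        def fitness(mask, X, y):
            if not mask.any():
                return 0.0
            return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

        def ga_select(X, y, pop=30, gens=40, p_mut=0.02, seed=0):
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            P = rng.random((pop, d)) < 0.5                    # random initial population
            for _ in range(gens):
                scores = np.array([fitness(m, X, y) for m in P])
                P = P[np.argsort(scores)[::-1]]               # sort best-first (elitist)
                children = []
                while len(children) < pop // 2:
                    a = P[rng.integers(pop // 2)]             # parents from the top half
                    b = P[rng.integers(pop // 2)]
                    cut = rng.integers(1, d)                  # one-point crossover
                    child = np.concatenate([a[:cut], b[cut:]])
                    child ^= rng.random(d) < p_mut            # bit-flip mutation
                    children.append(child)
                P = np.vstack([P[:pop - len(children)], np.array(children)])
            return P[0]   # elite mask from the last evaluated generation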

  8. Feature selection for appearance-based vehicle tracking in geospatial video

    Science.gov (United States)

    Poostchi, Mahdieh; Bunyak, Filiz; Palaniappan, Kannappan; Seetharaman, Guna

    2013-05-01

    Current video tracking systems often employ a rich set of intensity, edge, texture, shape and object level features combined with descriptors for appearance modeling. This approach increases tracker robustness but is computationally expensive for realtime applications, and localization accuracy can be adversely affected by including distracting features in the feature fusion or object classification processes. This paper explores offline feature subset selection using a filter-based evaluation approach for video tracking to reduce the dimensionality of the feature space and to discover relevant representative lower dimensional subspaces for online tracking. We compare the performance of the exhaustive FOCUS algorithm to the sequential heuristic SFFS, SFS and RELIEF feature selection methods. Experiments show that using offline feature selection reduces computational complexity, improves feature fusion and is expected to translate into better online tracking performance. Overall SFFS and SFS perform very well, close to the optimum determined by FOCUS, but RELIEF does not work as well for feature selection in the context of appearance-based object tracking.

  9. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-01-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals. PMID:28358032

  10. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-03-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals.

  11. Selection of LiDAR geometric features with adaptive neighborhood size for urban land cover classification

    Science.gov (United States)

    Dong, Weihua; Lan, Jianhang; Liang, Shunlin; Yao, Wei; Zhan, Zhicheng

    2017-08-01

    LiDAR has been an effective technology for acquiring urban land cover data in recent decades. Previous studies indicate that geometric features have a strong impact on land cover classification. Here, we analyzed an urban LiDAR dataset to explore the optimal feature subset from 25 geometric features incorporating 25 scales under 6 neighborhood definitions for urban land cover classification. We performed a feature selection strategy to remove irrelevant or redundant features, based on the correlation coefficients between features and the classification accuracy of each feature. The neighborhood scales were divided into small (0.5-1.5 m), medium (1.5-6 m) and large (>6 m) scales. Combining features with low mutual correlation and good individual classification performance improves classification accuracy. Features depicting the homogeneity or heterogeneity of points are best calculated at a small scale, features that smooth points at a medium scale, and features based on height differences at a large scale. As to the neighborhood definition, the cuboid and cylinder are recommended. This study can guide the selection of optimal geometric features with adaptive neighborhood scales for urban land cover classification.

  12. Diagnosis of Hepatocellular Carcinoma Spectroscopy Based on the Feature Selection Approach of the Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Shao-qing Wang

    2013-06-01

    Full Text Available This paper studies the application of medical imaging technology combined with artificial intelligence to improve the diagnostic accuracy rate for hepatocellular carcinoma. A recognition method based on a genetic algorithm (GA) and neural networks is presented. The GA was used to select 20 optimal features from the 401 initial features. A back-propagation neural network (BP) and a probabilistic neural network (PNN) were used to classify test samples based on these optimized features, and results based on the 20 optimal features were compared with those based on all 401 features. The experimental results show that the method can improve the recognition rate.

  13. Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.

    Science.gov (United States)

    Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng

    2016-09-12

    In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a certain dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, and such concatenation does not efficiently explore the complementary properties among different features, which would help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional feature representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., a simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

  14. Fuzzy-Rough Feature Selection With {\Pi}-Membership Function For Mammogram Classification

    CERN Document Server

    Thangavel, K

    2012-01-01

    Breast cancer is the second leading cause of death among women, and it is diagnosed with the help of mammograms. Oncologists often fail to identify microcalcifications at an early stage from visual inspection of the mammogram alone. In order to improve the performance of breast cancer screening, most researchers have proposed computer-aided diagnosis using image processing. In this study, mammograms are preprocessed and features are extracted, and the abnormality is then identified through classification. If all the extracted features are used, most cases are misidentified; hence a feature selection procedure is sought. In this paper, fuzzy-rough feature selection with a {\pi} membership function is proposed. The selected features are used to classify the abnormalities with the help of the Ant-Miner and Weka tools. The experimental analysis shows that the proposed method improves mammogram classification accuracy.

  15. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we present a feature selection strategy consisting of channel selection by Fisher ratio analysis in the frequency domain and time segment selection by visual inspection in the time domain. The proposed strategy achieves an absolute improvement of 7.5% in the misclassification rate as compared with the baseline.
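
    A minimal sketch of the Fisher-ratio channel ranking step, assuming band-power features per EEG channel (all values below are synthetic placeholders, not real recordings):

        # Rank channels by Fisher ratio: between-class separation over
        # within-class spread, computed per channel on band-power features.
        import numpy as np

        rng = np.random.default_rng(1)
        power = rng.normal(size=(200, 22))     # trials x channels (band power)
        labels = rng.integers(0, 2, size=200)  # two motor imagery classes

        c0, c1 = power[labels == 0], power[labels == 1]
        fisher = (c0.mean(0) - c1.mean(0)) ** 2 / (c0.var(0) + c1.var(0) + 1e-12)
        print("channels ranked by Fisher ratio:", np.argsort(fisher)[::-1][:8])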

  16. Computing visual target distinctness through selective filtering, statistical features, and visual patterns

    NARCIS (Netherlands)

    Fdez-Vidal, X.R.; Toet, A.; Garcia, J.A.; Fdez-Valdivia, J.

    2000-01-01

    This paper presents three computational visual distinctness measures, computed from image representational models based on selective filtering, statistical features, and visual patterns, respectively. They are applied to quantify the visual distinctness of targets in complex natural scenes.

  17. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

    Science.gov (United States)

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-02-06

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localizations may assist in drug development and in understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features from the CSP-based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method in identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function predictions.

  18. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

    Directory of Open Access Journals (Sweden)

    Runtao Yang

    2016-02-01

    Full Text Available The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localizations may assist in drug development and in understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features from the CSP-based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method in identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function predictions.
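
    The SMOTE and RF-RFE stages described above can be sketched with the imbalanced-learn and scikit-learn libraries; the random matrix below merely stands in for the CSP-based and g-gap dipeptide features:

        # Oversample the minority class, then recursively eliminate features
        # using random forest importances. Data are synthetic placeholders.
        import numpy as np
        from imblearn.over_sampling import SMOTE   # pip install imbalanced-learn
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFE

        rng = np.random.default_rng(2)
        X = rng.normal(size=(300, 50))
        y = np.r_[np.zeros(250, dtype=int), np.ones(50, dtype=int)]  # imbalanced labels

        X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
        rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
                  n_features_to_select=15)
        rfe.fit(X_bal, y_bal)
        print("kept features:", np.flatnonzero(rfe.support_))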

  19. A two-stage approach for improved prediction of residue contact maps

    Directory of Open Access Journals (Sweden)

    Pollastri Gianluca

    2006-03-01

    Full Text Available Abstract Background: Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although improvements have occurred over the last years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps. In order to alleviate these problems and achieve improved contact map predictions, in this paper we split the task into two stages: the prediction of a map's principal eigenvector (PE) from the primary sequence, and the reconstruction of the contact map from the PE and primary sequence. Predicting the PE from the primary sequence consists in mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices, since the size of the problem is drastically reduced and so is the scale length of interactions that need to be learned. Results: We develop architectures composed of ensembles of two-layered bidirectional recurrent neural networks to classify the components of the PE in 2, 3 and 4 classes from protein primary sequence, predicted secondary structure, and hydrophobicity interaction scales. Our predictor, tested on a non-redundant set of 2171 proteins, achieves classification performances of up to 72.6%, 16% above a baseline statistical predictor. We design a system for the prediction of contact maps from the predicted PE. Our results show that predicting maps through the PE yields sizeable gains, especially for long-range contacts, which are particularly critical for accurate protein 3D reconstruction. The final predictor's accuracy on a non-redundant set of 327 targets is 35
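
    As a toy numpy illustration of the dimensionality reduction at the heart of this two-stage scheme: the principal eigenvector (PE) of a symmetric contact map is a one-dimensional summary of the two-dimensional target (the map below is random, not a real protein):

        # Compute the PE of a synthetic symmetric 0/1 contact map.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 60
        upper = np.triu((rng.random((n, n)) < 0.08).astype(float), 1)
        contacts = upper + upper.T                # symmetric contact map

        eigvals, eigvecs = np.linalg.eigh(contacts)
        pe = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
        print("PE components (to be binned into 2-4 classes):", np.round(pe[:10], 3))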

  20. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps to choose the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its native "high dimensional, low sample size" nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three classification methods: support vector machines, k-nearest neighbor and random forest, and eight feature selection methods: information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross-validation was used to evaluate classification performance. Two publicly available gene expression datasets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, classification accuracy can be significantly boosted using a small number of genes. The relationship of the features selected by different feature selection methods is investigated, and the most frequently selected features in each fold among all methods for both datasets are evaluated.
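
    One cell of such a comparison can be sketched as a scikit-learn pipeline: a univariate filter (here the ANOVA F-score, a t-statistic-style criterion) feeding each classifier under five-fold cross-validation. The expression matrix below is a synthetic stand-in for the glioma data.

        # Filter-then-classify under 5-fold CV; selection runs inside the
        # pipeline so each fold selects genes on its own training split.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC

        rng = np.random.default_rng(4)
        X = rng.normal(size=(60, 2000))    # "high dimensional, low sample size"
        y = rng.integers(0, 2, size=60)

        for clf in (SVC(), KNeighborsClassifier(), RandomForestClassifier()):
            pipe = make_pipeline(SelectKBest(f_classif, k=50), clf)
            print(type(clf).__name__, cross_val_score(pipe, X, y, cv=5).mean().round(3))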

  1. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer, more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine the features that are more suitable for classification. Moreover, this method is built on the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of bearing fault diagnosis. In detail, (1) 70-dimensional waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm, and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these groups of 10-dimensional features and comparing the corresponding recognition rates. The results prove that the features with larger MIVAR values lead to higher recognition rates.
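
    A rough sketch of a mean-impact style ranking, assuming (as a simplified reading of MIVAR, not the authors' exact formulation) that each input of a trained network is perturbed by ±10% and the resulting change in the network's outputs is recorded:

        # Perturb each feature and measure the mean absolute change in the
        # network's class probabilities; the variance of this change would
        # give a MIVAR-like score. Data are synthetic placeholders.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(5)
        X = rng.normal(size=(400, 70))    # 70 waveform features
        y = rng.integers(0, 4, size=400)  # four operating states

        net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)

        impact = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            up, down = X.copy(), X.copy()
            up[:, j] *= 1.1
            down[:, j] *= 0.9
            impact[j] = np.abs(net.predict_proba(up) - net.predict_proba(down)).mean()
        print("top-ranked features:", np.argsort(impact)[::-1][:10])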

  2. Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credit Scoring

    Directory of Open Access Journals (Sweden)

    Raghavendra B. K

    2010-11-01

    Full Text Available A credit-risk evaluation decision involves processing huge volumes of raw data, and hence requires powerful data mining tools. Several techniques developed in machine learning have been used for financial credit-risk evaluation decisions. Data mining is the process of finding patterns and relations in large databases. Neural networks are one of the popular tools for building predictive models in data mining. The major drawback of neural networks is the curse of dimensionality, which requires an optimal feature subset. Feature selection is an important research topic in data mining: it is the problem of choosing a small subset of features that is necessary and sufficient to describe the target concept. In this research, an attempt has been made to investigate a preprocessing framework for feature selection in credit scoring using neural networks. Feature selection techniques such as best-first search and information gain have been evaluated for the effectiveness of the classification of risk groups on publicly available datasets. In particular, the German, Australian, and Japanese credit rating datasets have been used for evaluation. The results are conclusive about the effectiveness of feature selection for neural networks and validate the hypothesis of the research.

  3. A FEATURE SELECTION ALGORITHM DESIGN AND ITS IMPLEMENTATION IN INTRUSION DETECTION SYSTEM

    Institute of Scientific and Technical Information of China (English)

    杨向荣; 沈钧毅

    2003-01-01

    Objective: To present a new feature selection algorithm. Methods: The algorithm is based on rule induction and field knowledge. Results: The algorithm can be applied to capturing dataflow when detecting network intrusions: only the sub-dataset including discriminating features is captured, so the time spent in subsequent behavior pattern mining is reduced and the patterns mined are more precise. Conclusion: The experimental results show that the feature subset captured by this algorithm is more informative and the dataset's size is reduced significantly.

  4. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction

    Directory of Open Access Journals (Sweden)

    Hui Liu

    2015-01-01

    Full Text Available It is crucial to understand the specificity of HIV-1 protease for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determine its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. Results of feature selection prove that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods are used in this paper, combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, proving that feature selection combined with decision fusion is an effective and useful method for the task of HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance and help in designing HIV-1 protease inhibitors in the future.

  5. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    Science.gov (United States)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem that is difficult to detect in dermatology; moreover, it is a major cause of skin cancer. Data mining implementation in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on a multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), combines the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as a filter method to remove redundant features, and GA is used as a wrapper method to select the ideal feature subset, with SVM as the classifier. Experiments were performed with 10-fold cross-validation on the erythemato-squamous diseases dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed multiclass SVM model with Chi Square and GA yields an optimal feature subset of 18 features with 99.18% accuracy.

  6. Entropy based unsupervised Feature Selection in digital mammogram image using rough set theory.

    Science.gov (United States)

    Velayutham, C; Thangavel, K

    2012-01-01

    Feature selection (FS) is a process which attempts to select the features that are more informative. Supervised FS methods evaluate various feature subsets using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. However, for many data mining applications, decision class labels are often unknown or incomplete, which indicates the significance of unsupervised FS, where decision class labels are not provided. The problem is that not all features are important: some of the features may be redundant, and others may be irrelevant and noisy. In this paper, a novel unsupervised FS method for mammogram images, using rough set-based entropy measures, is proposed. A typical mammogram image processing system generally consists of mammogram image acquisition, pre-processing, segmentation, and feature extraction from the segmented mammogram image. The proposed method is used to select features from the dataset; it is compared with existing rough set-based supervised FS methods, and the classification performance of both is recorded, demonstrating the efficiency of the method.

  7. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

    Directory of Open Access Journals (Sweden)

    Liogienė Tatjana

    2016-07-01

    Full Text Available The intensive research on speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems have been proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12-42% for different emotion sets. The multi-stage scheme has shown higher robustness to the growth of the emotion set: the decrease in recognition rate with an increasing number of emotions was lower by 10-20% for the multi-stage scheme than for the single-stage case. Differences between SFS and SFFS for feature selection were negligible.

  8. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory.

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J D

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  9. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J. D.

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations. PMID:27582701

  10. Different cortical mechanisms for spatial vs. feature-based attentional selection in visual working memory

    Directory of Open Access Journals (Sweden)

    Anna Heuer

    2016-08-01

    Full Text Available The limited capacity of visual working memory necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial versus feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  11. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem, as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as in other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experimental results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.
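
    The wrapper idea — swarm positions encode feature masks, and a classifier's cross-validated accuracy serves as the fitness — can be sketched with a small binary particle swarm; this is one illustrative metaheuristic instantiation under invented parameters, not the paper's exact Swarm Search algorithm.

        # Binary PSO wrapper: each particle is a 0/1 feature mask scored by
        # 3-fold CV accuracy of a k-NN classifier on synthetic data.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(6)
        X = rng.normal(size=(200, 30))
        y = rng.integers(0, 2, size=200)

        def fitness(mask):
            if not mask.any():
                return 0.0
            return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

        n_particles, dim = 12, X.shape[1]
        pos = rng.random((n_particles, dim)) < 0.5
        vel = rng.normal(scale=0.1, size=(n_particles, dim))
        pbest = pos.copy()
        pbest_fit = np.array([fitness(p) for p in pos])
        gbest = pbest[pbest_fit.argmax()].copy()

        for _ in range(20):
            r1, r2 = rng.random((2, n_particles, dim))
            vel = (0.7 * vel
                   + 1.5 * r1 * (pbest.astype(float) - pos.astype(float))
                   + 1.5 * r2 * (gbest.astype(float) - pos.astype(float)))
            pos = rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-vel))  # sigmoid
            fit = np.array([fitness(p) for p in pos])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = pos[better], fit[better]
            gbest = pbest[pbest_fit.argmax()].copy()

        print("best subset size:", int(gbest.sum()), "CV accuracy:", round(pbest_fit.max(), 3))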

  12. A two-stage model for blog feed search

    NARCIS (Netherlands)

    Weerkamp, W.; Balog, K.; de Rijke, M.

    2010-01-01

    We consider blog feed search: identifying relevant blogs for a given topic. An individual's search behavior often involves a combination of exploratory behavior triggered by salient features of the information objects being examined plus goal-directed in-depth information seeking behavior.

  13. A two-stage model for blog feed search

    NARCIS (Netherlands)

    Weerkamp, W.; Balog, K.; de Rijke, M.

    2010-01-01

    We consider blog feed search: identifying relevant blogs for a given topic. An individual's search behavior often involves a combination of exploratory behavior triggered by salient features of the information objects being examined plus goal-directed in-depth information seeking behavior.

  14. Right Axillary Sweating After Left Thoracoscopic Sympathectomy in Two-Stage Surgery

    Directory of Open Access Journals (Sweden)

    Berkant Ozpolat

    2013-06-01

    Full Text Available One-stage bilateral or two-stage unilateral video-assisted thoracoscopic sympathectomy can be performed in the treatment of primary focal hyperhidrosis. Here we present a case of compensatory sweating of the contralateral side after a two-stage operation.

  15. The Two-stage Constrained Equal Awards and Losses Rules for Multi-Issue Allocation Situation

    NARCIS (Netherlands)

    Lorenzo-Freire, S.; Casas-Mendez, B.; Hendrickx, R.L.P.

    2005-01-01

    This paper considers two-stage solutions for multi-issue allocation situations. Characterisations are provided for the two-stage constrained equal awards and constrained equal losses rules, based on the properties of composition and path independence.

  16. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    Science.gov (United States)

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. We applied this method to develop algorithms identifying rheumatoid arthritis (RA) cases and, among those with rheumatoid arthritis, coronary artery disease (CAD) cases, from a large multi-institutional EHR. The areas under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features, and the majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
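
    The final modeling step lends itself to a brief sketch: a penalized (L1) logistic regression over concept-count features, which drives uninformative coefficients to exactly zero. The feature matrix below is a random placeholder for NLP-derived EHR concept counts, not real patient data.

        # L1-penalized logistic regression as an embedded feature selector.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(7)
        X = rng.poisson(0.3, size=(1000, 200)).astype(float)  # concept/code counts
        y = rng.integers(0, 2, size=1000)                     # phenotype labels

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
        print("features kept by the penalty:", int(np.count_nonzero(clf.coef_)))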

  17. Performance Evaluation of Content Based Image Retrieval on Feature Optimization and Selection Using Swarm Intelligence

    Directory of Open Access Journals (Sweden)

    Kirti Jain

    2016-03-01

    Full Text Available The diversity and applicability of swarm intelligence is increasing every day in the fields of science and engineering. Swarm intelligence offers a dynamic approach to feature optimization. We have used swarm intelligence for the process of feature optimization and feature selection in content-based image retrieval. The performance of content-based image retrieval is evaluated by precision and recall, whose values depend on the retrieval capacity of the image features. The basic raw image content has visual features such as color, texture, shape and size. The partial feature extraction technique is based on a geometric invariant function. Three swarm intelligence algorithms were used for the optimization of features: ant colony optimization, particle swarm optimization (PSO), and the glowworm optimization algorithm. The Corel image dataset and MATLAB were used for evaluating performance.

  18. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System

    Directory of Open Access Journals (Sweden)

    Jingbo Xia

    2014-01-01

    Full Text Available Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES) is the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, and it relies heavily on high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying a greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With an updated feature set, a new system is obtained with enhanced performance, achieving an F-score of 53.27%, up from 51.21%, for Task 1 under strict evaluation criteria, and 57.24% according to the approximate span and recursive criterion.

  19. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    Sun Liang; Han Chongzhao

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method based on generalized attribute reduction is presented. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets, and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that classification performance can be improved by using the selected feature subsets.

  20. Biomass waste gasification - can be the two stage process suitable for tar reduction and power generation?

    Science.gov (United States)

    Sulc, Jindřich; Stojdl, Jiří; Richter, Miroslav; Popelka, Jan; Svoboda, Karel; Smetana, Jiří; Vacek, Jiří; Skoblja, Siarhei; Buryan, Petr

    2012-04-01

    A pilot-scale gasification unit with a novel co-current updraft arrangement in the first stage and counter-current downdraft in the second stage was developed and used to study the effects of two-stage gasification of biomass (wood pellets), in comparison with one-stage gasification, on fuel gas composition and attainable gas purity. Significant producer gas parameters (gas composition, heating value, content of tar compounds, content of inorganic gas impurities) were compared for the two-stage and the one-stage gasification arrangement with only the upward moving bed (co-current updraft). The main novel features of the gasifier conception include a grate-less reactor, an upward moving bed of biomass particles (e.g. pellets) by means of a screw elevator with changeable rotational speed, and a gradually expanding diameter of the cylindrical reactor in the part above the upper end of the screw. The gasifier concept and arrangement are considered convenient for the thermal power range 100-350 kW(th). The second stage of the gasifier served mainly for tar compound destruction/reforming at increased temperature (around 950°C) and for the gasification reaction of the fuel gas with char. The second stage used additional combustion of the fuel gas by preheated secondary air to attain higher temperature and faster gasification of the remaining char from the first stage. The measurements of gas composition and tar compound contents confirmed the superiority of the two-stage gasification system, with a drastic decrease of aromatic compounds with two or more benzene rings by 1-2 orders of magnitude. On the other hand, the two-stage gasification (with overall ER=0.71) led to a substantial reduction of the gas heating value (LHV=3.15 MJ/Nm(3)), an increase in gas volume, and an increase of the nitrogen content in the fuel gas. The increased temperature (>950°C) at the entrance to the char bed also caused a substantial decrease of the ammonia content in the fuel gas.

  1. A two-stage broadcast message propagation model in social networks

    Science.gov (United States)

    Wang, Dan; Cheng, Shun-Jun

    2016-11-01

    Message propagation in social networks is becoming a popular topic in complex networks. One type of message in social networks is the broadcast message: a message which has a unique destination that is unknown to the publisher, such as a 'lost and found' notice. Its propagation always has two stages. Due to this feature, rumor propagation models and epidemic propagation models have difficulty describing this message's propagation accurately. In this paper, an improved two-stage susceptible-infected-removed model is proposed, built around the concepts of a first forwarding probability and a second forwarding probability. Another part of our work quantifies how the chance of successful message transmission at each level is influenced by multiple factors, including the topology of the network, the receiving probability, the first-stage forwarding probability, the second-stage forwarding probability, and the length of the shortest path between the publisher and the relevant destination. The proposed model has been simulated on real networks and the results prove the model's effectiveness.
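
    A toy simulation of the two-stage idea, under invented parameters: nodes forward the broadcast with a first-stage probability until the message reaches its unique destination, after which a lower second-stage probability applies. The graph and probabilities are assumptions for illustration, not the paper's model.

        # Two-stage broadcast spread on a scale-free graph (networkx).
        import random
        import networkx as nx

        random.seed(0)
        G = nx.barabasi_albert_graph(500, 3)
        publisher, destination = 0, 250
        p_receive, p1, p2 = 0.9, 0.6, 0.3   # receive, stage-1 and stage-2 forwarding

        informed, frontier, stage_two = {publisher}, [publisher], False
        while frontier:
            nxt = []
            for u in frontier:
                for v in G.neighbors(u):
                    if v in informed or random.random() > p_receive:
                        continue
                    informed.add(v)
                    if v == destination:
                        stage_two = True    # the message found its unique target
                    if random.random() < (p2 if stage_two else p1):
                        nxt.append(v)
            frontier = nxt
        print("destination reached:", destination in informed, "| informed:", len(informed))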

  2. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shahrbanoo Goli

    2016-01-01

    Full Text Available The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVRs with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features best associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than the other models, and SVR performs similarly to or better than Cox when all features are included in the model.
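
    A compact sketch of the linear-SVR setup after a univariate filter; the covariates and (uncensored) survival times below are simulated, standing in for clinical variables such as receptor status:

        # Univariate filter + linear SVR for survival-time regression.
        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_regression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVR

        rng = np.random.default_rng(8)
        X = rng.normal(size=(573, 20))    # 573 patients, 20 covariates
        t = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=573)

        pipe = make_pipeline(SelectKBest(f_regression, k=3), SVR(kernel="linear"))
        print("CV R^2:", cross_val_score(pipe, X, t, cv=5).mean().round(3))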

  3. A feature selection approach towards progressive vector transmission over the Internet

    Science.gov (United States)

    Miao, Ru; Song, Jia; Feng, Min

    2017-09-01

    WebGIS has been widely applied for visualizing and sharing geospatial information over the Internet. In order to improve the efficiency of client applications, a web-based progressive vector transmission approach is proposed. Important features should be selected and transferred first, so methods for measuring the importance of features need to be considered in progressive transmission. However, studies on progressive transmission of large-volume vector data have mostly focused on map generalization in the field of cartography, and have rarely discussed the quantitative selection of geographic features. This paper applies information theory to measuring the feature importance of vector maps. A measurement model for the amount of information of vector features is defined to deal with feature selection issues; it involves a geometry factor, a spatial distribution factor and a thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is presented to improve the transmission of vector data. To demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment on progressive selection and transmission of vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.

  4. Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    CERN Document Server

    Nongmeikapam, Kishorjit; 10.5121/ijcsit.2011.350

    2011-01-01

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications like Machine Translation, Part-of-Speech tagging, Information Retrieval, and Question Answering. Feature selection is an important factor in the recognition of Manipuri MWEs using Conditional Random Fields (CRF). The disadvantage of manually selecting and choosing appropriate features for running the CRF motivates us to consider a Genetic Algorithm (GA). Using the GA we are able to find the optimal features to run the CRF. We ran fifty generations of feature selection with three-fold cross-validation as the fitness function. This model demonstrated a Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, showing an improvement over CRF-based Manipuri MWE identification without the GA.

  5. TOPSIS Based Multi-Criteria Decision Making of Feature Selection Techniques for Network Traffic Dataset

    Directory of Open Access Journals (Sweden)

    Raman Singh

    2014-01-01

    Full Text Available Intrusion detection systems (IDS) have to process millions of packets with many features, which delays the detection of anomalies. Sampling and feature selection may be used to reduce computation time and hence minimize intrusion detection time. This paper aims to suggest feature selection algorithms on the basis of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). TOPSIS is used to suggest one or more choices among several alternatives having many attributes. A total of ten feature selection techniques have been used for the analysis of the KDD network dataset. Three classifiers, namely Naïve Bayes, J48 and PART, have been considered for this experiment using the Weka data mining tool. Rankings of the techniques using TOPSIS have been calculated using MATLAB. Among these techniques, Filtered Subset Evaluation has been found suitable for intrusion detection in terms of very low computational time with acceptable accuracy.
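
    TOPSIS itself reduces to a few numpy lines; the decision matrix below is a made-up example in which rows are feature selection techniques and the criteria are accuracy (a benefit criterion) and detection time (a cost criterion):

        # Rank alternatives by closeness to the ideal solution (TOPSIS).
        import numpy as np

        M = np.array([[0.92, 41.0],    # technique A: accuracy, seconds
                      [0.90, 12.0],    # technique B
                      [0.88,  6.0]])   # technique C
        weights = np.array([0.6, 0.4])
        benefit = np.array([True, False])   # maximize accuracy, minimize time

        V = weights * M / np.linalg.norm(M, axis=0)   # weighted, vector-normalized
        ideal = np.where(benefit, V.max(0), V.min(0))
        worst = np.where(benefit, V.min(0), V.max(0))
        d_best = np.linalg.norm(V - ideal, axis=1)
        d_worst = np.linalg.norm(V - worst, axis=1)
        closeness = d_worst / (d_best + d_worst)
        print("TOPSIS ranking, best first:", np.argsort(closeness)[::-1])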

  6. Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis

    CERN Document Server

    Ephzibah, E P

    2011-01-01

    A way to enhance the performance of a model that combines genetic algorithms and fuzzy logic for feature selection and classification is proposed. Early diagnosis of any disease at less cost is preferable, and diabetes is one such disease. Diabetes has become the fourth leading cause of death in developed countries, and there is substantial evidence that it is reaching epidemic proportions in many developing and newly industrialized nations. In medical diagnosis, patterns consist of observable symptoms along with the results of diagnostic tests, which have various associated costs and risks. In the automated design of pattern classification, the proposed system solves the feature subset selection problem: the task of identifying and selecting a useful subset of pattern-representing features from a larger set of features. Using a fuzzy rule-based classification system, the proposed approach is shown to improve classification accuracy.

  7. Two-Stage Exams Improve Student Learning in an Introductory Geology Course: Logistics, Attendance, and Grades

    Science.gov (United States)

    Knierim, Katherine; Turner, Henry; Davis, Ralph K.

    2015-01-01

    Two-stage exams--where students complete part one of an exam closed book and independently and part two is completed open book and independently (two-stage independent, or TS-I) or collaboratively (two-stage collaborative, or TS-C)--provide a means to include collaborative learning in summative assessments. Collaborative learning has been shown to…

  8. Heuristic for Critical Machine Based a Lot Streaming for Two-Stage Hybrid Production Environment

    Science.gov (United States)

    Vivek, P.; Saravanan, R.; Chandrasekaran, M.; Pugazhenthi, R.

    2017-03-01

    Lot streaming in a hybrid flowshop (HFS) is encountered in many real-world problems. This paper deals with a heuristic approach for lot streaming based on critical machine consideration for a two-stage hybrid flowshop. The first stage has two identical parallel machines and the second stage has only one machine; the second-stage machine is considered critical for valid reasons, and such problems are known to be NP-hard. A mathematical model was developed for the selected problem. Simulation modelling and analysis were carried out in Extend V6 software. A heuristic was developed for obtaining an optimal lot streaming schedule, and eleven cases of lot streaming were considered. The proposed heuristic was verified and validated by real-time simulation experiments. All possible lot streaming strategies, and all possible sequences under each strategy, were simulated and examined. The heuristic yielded optimal schedules consistently in all eleven cases. A procedure for identifying the best lot streaming strategy was suggested.

  9. A Multistage Feature Selection Model for Document Classification Using Information Gain and Rough Set

    Directory of Open Access Journals (Sweden)

    Mrs. Leena. H. Patil

    2014-11-01

    Full Text Available The number of documents is increasing rapidly; therefore, organizing them in digitized form makes text categorization a challenging issue. A major issue for text categorization is the large number of features, most of which are noisy, irrelevant and redundant and may mislead the classifier. Hence, it is most important to reduce the dimensionality of the data to obtain a smaller subset that provides the most gain in information. Feature selection techniques reduce the dimensionality of the feature space and improve overall accuracy and performance. Hence, to overcome the issues of text categorization, feature selection is considered an efficient technique, and we therefore propose a multistage feature selection model to improve the overall accuracy and performance of classification. In the first stage, document preprocessing is performed. Secondly, each term within the documents is ranked according to its importance for classification using information gain. Thirdly, a rough set technique is applied to the highly ranked terms and feature reduction is carried out. Finally, document classification is performed on the core features using Naive Bayes and KNN classifiers. Experiments are carried out on three UCI datasets: Reuters 21578, Classic 04, and Newsgroup 20. Results show the better accuracy and performance of the proposed model.

  10. Biometric hashing for handwriting: entropy-based feature selection and semantic fusion

    Science.gov (United States)

    Scheidat, Tobias; Vielhauer, Claus

    2008-02-01

    Some biometric algorithms suffer from the problem of using a great number of features extracted from the raw data. This often results in feature vectors of high dimensionality and thus high computational complexity. However, in many cases subsets of features contribute little or nothing to the correct classification of biometric algorithms. The process of choosing more discriminative features from a given set is commonly referred to as feature selection. In this paper we present a study on feature selection for an existing biometric hash generation algorithm for the handwriting modality, based on the strategy of entropy analysis of single components of biometric hash vectors, in order to identify and suppress elements carrying little information. To evaluate the impact of our feature selection scheme on the authentication performance of our biometric algorithm, we present an experimental study based on data from 86 users. Besides discussing common biometric error rates such as Equal Error Rates, we suggest a novel measurement to determine the reproduction rate probability for biometric hashes. Our experiments show that, while the feature set size may be significantly reduced by 45% using our scheme, there are only marginal changes both in the results of the verification process and in the reproducibility of biometric hashes. Since multi-biometrics is a recent topic, we additionally carry out a first study on pairwise multi-semantic fusion based on reduced hashes and analyze it using the introduced reproducibility measure.
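
    The entropy analysis can be sketched in a few lines: estimate the Shannon entropy of each hash component across users and suppress near-constant components. The hash values and threshold here are invented for illustration.

        # Keep only hash components whose empirical entropy exceeds a threshold.
        import numpy as np

        rng = np.random.default_rng(11)
        hashes = rng.integers(0, 16, size=(86, 120))   # 86 users x 120 components
        hashes[:, :40] = 3                             # a constant, zero-entropy block

        def entropy(col):
            _, counts = np.unique(col, return_counts=True)
            p = counts / counts.sum()
            return float(-(p * np.log2(p)).sum())

        H = np.apply_along_axis(entropy, 0, hashes)
        keep = H > 0.5                                 # suppress low-entropy slots
        print(f"kept {keep.sum()} of {hashes.shape[1]} components")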

  11. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive models, helping to identify irrelevant features in high-dimensional datasets. In the case of this water contamination detection dataset, a standard wrapper algorithm alone cannot be applied because of the complexity. To overcome this computational complexity problem and make the process lighter, a filter-wrapper based algorithm is proposed, in which reducing the feature space is a significant component of detecting water contamination. The main findings are as follows: (1) to speed up the feature selection process, the proposed filter-based feature pre-selection is applied, which guarantees that useful data are unlikely to be discarded in the initial stage, as discussed briefly in this paper; (2) the resulting features are filtered again using a Genetic Algorithm coupled with a Support Vector Machine, which narrows the subset down to features giving high accuracy at decreased expense. Experimental results show that the proposed methods trim down redundant features effectively and achieve better classification accuracy.

  12. Feature selection and classification methodology for the detection of knee-joint disorders.

    Science.gov (United States)

    Nalband, Saif; Sundar, Aditya; Prince, A Amalin; Agarwal, Anita

    2016-04-01

    Vibroarthrographic (VAG) signals emitted from knee joint disorders provide an early diagnostic tool. The nonstationary and nonlinear nature of the VAG signal makes feature extraction an important aspect. In this work, we investigate VAG signals by proposing a wavelet-based decomposition, in which the VAG signals are decomposed into sub-band signals of different frequencies. Nonlinear measures such as recurrence quantification analysis (RQA), approximate entropy (ApEn) and sample entropy (SampEn) are extracted as features of the VAG signal; a total of twenty-four features form a vector to characterize a VAG signal. Two feature selection (FS) techniques, the apriori algorithm and a genetic algorithm (GA), select six and four features, respectively, as the most significant features. Least squares support vector machines (LS-SVM) and random forests are proposed as classifiers to evaluate the performance of the FS techniques. Results indicate that classification accuracy was more prominent with the features selected by the FS algorithms, and that LS-SVM using the apriori algorithm gives the highest accuracy of 94.31% with a false discovery rate (FDR) of 0.0892. The proposed work also provides better classification accuracy than previous studies, which reported an accuracy of 88%. This work can enhance the performance of existing technology for accurately distinguishing normal and abnormal VAG signals, and the proposed methodology could provide an effective non-invasive diagnostic tool for knee joint disorders.
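
    A hedged sketch of the feature pipeline — wavelet sub-band decomposition followed by sample entropy per sub-band — using PyWavelets and a plain O(n²) SampEn reference implementation; the signal is synthetic, not a real VAG recording.

        # Decompose a signal into wavelet sub-bands and compute SampEn of each.
        import numpy as np
        import pywt  # pip install PyWavelets

        def sample_entropy(x, m=2, r=None):
            x = np.asarray(x, dtype=float)
            r = 0.2 * x.std() if r is None else r
            def matches(mm):                 # pairs of length-mm templates within r
                emb = np.lib.stride_tricks.sliding_window_view(x, mm)
                d = np.abs(emb[:, None, :] - emb[None, :, :]).max(-1)
                return (d[np.triu_indices(len(emb), 1)] <= r).sum()
            b, a = matches(m), matches(m + 1)
            return -np.log(a / b) if a > 0 and b > 0 else np.inf

        rng = np.random.default_rng(9)
        vag = rng.normal(size=1024)                 # placeholder VAG signal
        coeffs = pywt.wavedec(vag, "db4", level=4)  # [cA4, cD4, cD3, cD2, cD1]
        print([round(sample_entropy(c), 3) for c in coeffs])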

  13. An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

    Directory of Open Access Journals (Sweden)

    Ali Ahmed

    2011-01-01

    Problem statement: Similarity-based Virtual Screening (VS) deals with a large amount of data containing irrelevant and/or redundant fragments or features. The recent use of Bayesian networks as an alternative to existing tools for similarity-based VS has received noticeable attention from researchers in the field of chemoinformatics. Approach: To this end, different models of Bayesian networks have been developed. In this study, we enhance the Bayesian Inference Network (BIN) using a subset of selected molecular features. Results: In this approach, a few features were filtered from the molecular fingerprint features based on a feature selection approach. Conclusion: Simulated virtual screening experiments with MDL Drug Data Report (MDDR) datasets showed that the proposed method provides simple ways of enhancing the cost effectiveness of ligand-based virtual screening searches, especially for higher-diversity datasets.

  14. Using genetic algorithm feature selection in neural classification systems for image pattern recognition

    Directory of Open Access Journals (Sweden)

    Margarita R. Gamarra A.

    2012-09-01

    Pattern recognition performance depends on variations during the extraction, selection and classification stages. This paper presents an approach to feature selection using genetic algorithms, with regard to digital image recognition and quality control. Error rate and the kappa coefficient were used for evaluating the genetic algorithm approach. Neural networks were used for classification, involving the features selected by the genetic algorithms. The neural network approach was compared to a K-nearest neighbor classifier. The proposed approach performed better than the other methods.

  15. Improving Image steganalysis performance using a graph-based feature selection method

    Directory of Open Access Journals (Sweden)

    Amir Nouri

    2016-05-01

    Steganalysis is the skill of discovering the use of steganography algorithms within an image with little or no information regarding the steganography algorithm and/or its parameters. The high dimensionality of image data combined with a small number of samples presents a difficult challenge for the steganalysis task. Several methods have been presented to improve steganalysis performance through feature selection. Feature selection, also known as variable selection, is one of the fundamental problems in the fields of machine learning, pattern recognition and statistics. The aim of feature selection is to reduce the dimensionality of image data in order to enhance the accuracy of the steganalysis task. In this paper, we propose a new graph-based blind steganalysis method for detecting stego images among cover images in the JPEG format, using a feature selection technique based on community detection. The experimental results show that the proposed approach is easy to employ for steganalysis purposes. Moreover, the performance of the proposed method is better than several recent and well-known feature selection-based image steganalysis methods.

  16. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.

    Science.gov (United States)

    Radovic, Milos; Ghalwash, Mohamed; Filipovic, Nenad; Obradovic, Zoran

    2017-01-03

    Feature selection, which aims to identify a subset of features relevant for predicting a response among a possibly large set of features, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including the potentially temporal character of the data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without prior data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without prior data flattening. In the proposed approach we compute the relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes using a dynamic time warping approach. The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. The obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved an improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, calculated as an average F-statistic value across different time steps, with redundancy, calculated with a dynamic time warping approach. As evident in our experiments, incorporating temporal information into the feature selection process leads to the selection of more discriminative features.
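
    A minimal Python sketch of the TMRMR idea follows, under assumed conventions: the relevance of a gene is its F-statistic averaged over time steps, and redundancy between two genes is a similarity derived from the dynamic time warping (DTW) distance between their mean temporal profiles, a small distance meaning high redundancy. The data and the exact redundancy transform are illustrative.

        # Sketch of greedy mRMR-style selection with temporal relevance and DTW redundancy.
        import numpy as np
        from scipy.stats import f_oneway

        def dtw(a, b):
            """Textbook O(nm) dynamic time warping distance."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        rng = np.random.default_rng(2)
        data = rng.standard_normal((30, 40, 8))   # data[subject, gene, time step]
        y = np.repeat([0, 1], 15)                 # class label per subject

        # Relevance: per-gene F-statistic at each time step, averaged over time.
        rel = np.array([np.mean([f_oneway(data[y == 0, g, t], data[y == 1, g, t])[0]
                                 for t in range(data.shape[2])])
                        for g in range(data.shape[1])])
        profile = data.mean(axis=0)               # mean temporal profile of each gene

        selected = [int(np.argmax(rel))]
        while len(selected) < 10:
            best, best_score = None, -np.inf
            for g in range(data.shape[1]):
                if g in selected:
                    continue
                red = np.mean([1.0 / (1.0 + dtw(profile[g], profile[s])) for s in selected])
                if rel[g] - red > best_score:     # mRMR difference criterion
                    best, best_score = g, rel[g] - red
            selected.append(best)
        print("selected genes:", selected)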

  17. Two-stage earth-to-orbit vehicles with dual-fuel propulsion in the Orbiter

    Science.gov (United States)

    Martin, J. A.

    1982-01-01

    Earth-to-orbit vehicle studies of future replacements for the Space Shuttle are needed to guide technology development. Previous studies that have examined single-stage vehicles have shown advantages for dual-fuel propulsion. Previous two-stage system studies have assumed all-hydrogen fuel for the Orbiters. The present study examined dual-fuel Orbiters and found that the system dry mass could be reduced with this concept. The possibility of staging the booster at a staging velocity low enough to allow coast-back to the launch site is shown to be beneficial, particularly in combination with a dual-fuel Orbiter. An engine evaluation indicated the same ranking of engines as did a previous single-stage study. Propane and RP-1 fuels result in lower vehicle dry mass than methane, and staged-combustion engines are preferred over gas-generator engines. The sensitivity to the engine selection is less for two-stage systems than for single-stage systems.

  18. A New Two-Stage Approach to Short Term Electrical Load Forecasting

    Directory of Open Access Journals (Sweden)

    Dragan Tasić

    2013-04-01

    In the deregulated energy market, the accuracy of load forecasting has a significant effect on the planning and operational decision making of utility companies. Electric load is a random non-stationary process influenced by a number of factors, which makes it difficult to model. To achieve better forecasting accuracy, a wide variety of models have been proposed. These models are based on different mathematical methods and offer different features. This paper presents a new two-stage approach for short-term electrical load forecasting based on least-squares support vector machines. With the aim of improving forecasting accuracy, one more feature was added to the model feature set: the next-day average load demand. As this feature is unknown one day ahead, it is forecast in the first stage and then used in the second-stage model for next-day hourly load forecasting. The effectiveness of the presented model is shown on real data from the ISO New England electricity market. The obtained results confirm the validity and advantage of the proposed approach.

  19. Two-stage high frequency pulse tube cooler for refrigeration at 25 K

    CERN Document Server

    Dietrich, M

    2009-01-01

    A two-stage Stirling-type U-shape pulse tube cryocooler driven by a 10 kW-class linear compressor was designed, built and tested. A special feature of the cold head is the absence of a heat exchanger at the cold end of the first stage, since the intended application requires no cooling power at an intermediate temperature. Simulations were done using Sage software to find optimum operating conditions and cold head geometry. Flow-impedance matching was required to connect the compressor, designed for 60 Hz operation, to the 40 Hz cold head. A cooling power of 12.9 W at 25 K with an electrical input power of 4.6 kW has been achieved so far. The lowest temperature reached is 13.7 K.

  20. Fast Image Segmentation Based on a Two-Stage Geometrical Active Contour

    Institute of Scientific and Technical Information of China (English)

    肖昌炎; 张素; 陈亚珠

    2005-01-01

    A fast two-stage geometric active contour algorithm for image segmentation is developed. First, the Eikonal equation problem is quickly solved using an improved fast sweeping method, and a criterion of local minimum of area gradient (LMAG) is presented to extract the optimal arrival time. Then, the final time function is passed as an initial state to an area- and length-minimizing flow model, which adjusts the interface more accurately and prevents it from leaking. For objects with complete and salient edges, using the first stage alone is able to obtain an ideal result, with a time complexity of O(M), where M is the number of points in each coordinate direction. Both stages are needed for convoluted shapes, but the computation cost can still be drastically reduced. The efficiency of the algorithm is verified in segmentation experiments on real images with different features.

  1. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are a few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary-class datasets where the differentially expressed genes between the two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level, BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 and the number of genes in a biological condition increases beyond 50 …

  2. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    Science.gov (United States)

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using the electrocardiogram (ECG) signal for human identification has been investigated, and some methods have been suggested. In this research, a new effective intelligent method for selecting features from ECG signals has been proposed. This method is developed in such a way that it is able to select the important features that are necessary for identification by analyzing the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing features were extracted and then compressed using the cosine transform. The features most effective for identification, among the characterizing features, are selected using a combination of a genetic algorithm and artificial neural networks. The proposed method was tested on three public ECG databases, namely the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the European ST-T Database, in order to evaluate the proposed subject identification method on normal ECG signals as well as ECG signals with arrhythmias. Identification rates of 99.89%, 99.84% and 99.99% were obtained for these databases, respectively. The proposed algorithm exhibits remarkable identification accuracy not only with normal ECG signals, but also in the presence of various arrhythmias. Simulation results showed that the proposed method, despite the low number of selected features, has a high performance in the identification task. PMID:25709939

  3. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features in order to improve the runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features with respect to the well-known information gain and chi-square feature selection methods.
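
    A toy Python version of ACO-driven feature selection in the spirit of the paper is sketched below: pheromone levels bias each ant's choice of features, subsets are scored with a k-NN classifier, and pheromone is reinforced on the best subsets. All parameter values and the synthetic dataset are illustrative assumptions.

        # Toy ant colony optimization (ACO) feature selection sketch.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(3)
        X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=3)
        n_feat, n_ants, n_iter, subset_size = X.shape[1], 15, 20, 8
        tau = np.ones(n_feat)                         # pheromone per feature

        def score(mask):
            return cross_val_score(KNeighborsClassifier(5), X[:, mask], y, cv=3).mean()

        best_mask, best_acc = None, -1.0
        for _ in range(n_iter):
            trails = []
            for _ant in range(n_ants):
                p = tau / tau.sum()                   # pheromone-biased sampling
                mask = rng.choice(n_feat, size=subset_size, replace=False, p=p)
                acc = score(mask)
                trails.append((acc, mask))
                if acc > best_acc:
                    best_acc, best_mask = acc, mask
            tau *= 0.9                                # evaporation
            for acc, mask in sorted(trails, key=lambda t: t[0])[-3:]:
                tau[mask] += acc                      # reinforce the top ants

        print("best subset:", sorted(best_mask.tolist()), "cv accuracy:", round(best_acc, 3))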

  4. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    Directory of Open Access Journals (Sweden)

    Esra Saraç

    2014-01-01

    The increased popularity of the web has caused the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features in order to improve the runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features with respect to the well-known information gain and chi-square feature selection methods.

  5. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    Directory of Open Access Journals (Sweden)

    Abdelniser Moomen

    2016-04-01

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on both metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating the sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.

  6. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection.

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M

    2016-04-19

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on both metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating the sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.
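
    The frequency-reduction idea can be sketched in a few lines of Python: treat each sweeping frequency's response reading as a feature and rank frequencies by mutual information with the crack label, keeping only the top few. The data below are synthetic stand-ins, not real waveguide measurements.

        # Sketch: rank sweeping frequencies (features) by mutual information.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(4)
        n_samples, n_freqs = 120, 201            # e.g. a 201-point frequency sweep
        y = rng.integers(0, 2, n_samples)        # crack / no-crack labels
        s11 = rng.standard_normal((n_samples, n_freqs))
        s11[:, 40:45] += y[:, None] * 1.5        # pretend a few frequencies carry the signal

        mi = mutual_info_classif(s11, y, random_state=4)
        top = np.argsort(mi)[-10:][::-1]         # the 10 most influential frequencies
        print("keep frequency indices:", top)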

  7. Exploitation of Intra-Spectral Band Correlation for Rapid Feature Selection, and Target Identification in Hyperspectral Imagery

    Science.gov (United States)

    2009-03-01

    Only fragments of this record's abstract survive. They refer to a thesis proposal entitled “Improved Feature Extraction, Feature Selection, and Identification Techniques that Create a Fast Unsupervised Hyperspectral Target Detection...”, and concern target or non-target classifications and the integration of this type of autonomous target detection algorithm with hyperspectral imaging sensors.

  8. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase the classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
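
    A compact Python sketch of this pipeline, under stated assumptions: scikit-learn's RFE with a linear SVM ranks and prunes features, and a plain grid search over C and γ stands in for the Taguchi design; the Iris data are a placeholder for the Dermatology and Zoo databases.

        # Sketch: SVM-RFE feature pruning followed by (C, gamma) tuning.
        from sklearn.datasets import load_iris
        from sklearn.feature_selection import RFE
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)

        # RFE needs a linear SVM so that feature weights are available for ranking.
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X, y)
        X_sel = X[:, rfe.support_]

        # Grid search over C and gamma in place of the paper's Taguchi design.
        grid = GridSearchCV(SVC(kernel="rbf"),
                            {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                            cv=5).fit(X_sel, y)
        print("kept features:", rfe.support_, "best params:", grid.best_params_)
        print("cv accuracy: %.3f" % grid.best_score_)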

  9. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Ranking search results is essential for information retrieval and web search. Search engines need not only to return highly relevant results, but also to be fast to satisfy users. As a result, not all available features can be used for ranking; in fact, only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within a generalization model for document ranking. We propose an approach to relevance feedback using the Expectation Maximization method and evaluate the algorithm on the TREC collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on the standard TREC-9 part of the OHSUMED collection, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as the Markov Random Field model, Correlation Coefficient and Count Difference methods.
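
    The knapsack view of feature selection can be made concrete with a short Python sketch: each feature has a relevance value and a latency cost, and dynamic programming picks the subset maximizing total relevance under a latency budget. The values below are made-up examples, not taken from the paper.

        # Sketch: 0/1 knapsack over features with relevance values and latency costs.
        def knapsack_select(values, costs, budget):
            # best[c] = (best total value, chosen feature indices) with total cost <= c
            best = [(0.0, [])] * (budget + 1)
            for i in range(len(values)):
                new = best[:]
                for c in range(costs[i], budget + 1):
                    cand = best[c - costs[i]][0] + values[i]
                    if cand > new[c][0]:
                        new[c] = (cand, best[c - costs[i]][1] + [i])
                best = new
            return best[budget]

        relevance = [0.30, 0.25, 0.20, 0.15, 0.40, 0.10]  # per-feature relevance scores
        latency = [3, 2, 4, 1, 5, 2]                      # per-feature latency costs
        value, chosen = knapsack_select(relevance, latency, budget=8)
        print("chosen features:", chosen, "total relevance:", round(value, 2))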

  10. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase the classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  11. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm, performing the search for relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree, and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data, a dramatic improvement was achieved after its combination with the acoustic information, improving the results achieved by this second modality on its own. The results achieved by the classifiers using the parameters merged at the feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the scheme. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.

  12. Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease.

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao; Shen, Dinggang; Zhang, Daoqiang

    2016-09-01

    Recently, multi-task based feature selection methods have been used in the multi-modality based classification of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, some useful discriminative information among subjects is usually not well exploited for further improving the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce group-sparsity regularization on the weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer's Disease Neuroimaging Initiative (ADNI). The experimental results show that, in comparison with several state-of-the-art methods for multi-modality based AD/MCI classification, our proposed method not only improves the classification performance but also has the potential to discover disease-related biomarkers useful for the diagnosis of disease.

  13. A biological mechanism for Bayesian feature selection: Weight decay and raising the LASSO.

    Science.gov (United States)

    Connor, Patrick; Hollensen, Paul; Krigolson, Olav; Trappenberg, Thomas

    2015-07-01

    Biological systems are capable of learning that certain stimuli are valuable while ignoring the many that are not, and thus perform feature selection. In machine learning, one effective feature selection approach is the least absolute shrinkage and selection operator (LASSO) form of regularization, which is equivalent to assuming a Laplacian prior distribution on the parameters. We review how such Bayesian priors can be implemented in gradient descent as a form of weight decay, which is a biologically plausible mechanism for Bayesian feature selection. In particular, we describe a new prior that offsets or "raises" the Laplacian prior distribution. We evaluate this alongside the Gaussian and Cauchy priors in gradient descent using a generic regression task where there are few relevant and many irrelevant features. We find that raising the Laplacian leads to less prediction error because it is a better model of the underlying distribution. We also consider two biologically relevant online learning tasks, one synthetic and one modeled after the perceptual expertise task of Krigolson et al. (2009). Here, raising the Laplacian prior avoids the fast erosion of relevant parameters over the period following training because it only allows small weights to decay. This better matches the limited loss of association seen between days in the human data of the perceptual expertise task. Raising the Laplacian prior thus results in a biologically plausible form of Bayesian feature selection that is effective in biologically relevant contexts.
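
    The weight-decay reading of these priors can be sketched in Python: a Laplacian prior yields a constant-magnitude (L1-style) decay, while the "raised" Laplacian is emulated by a decay that fades for large weights so that learned associations persist. The exact raised-Laplacian decay form used here is an assumption for illustration only.

        # Sketch: priors as weight decay in gradient descent on a linear regression.
        import numpy as np

        rng = np.random.default_rng(5)
        n, d = 200, 50
        w_true = np.zeros(d)
        w_true[:5] = 2 * rng.standard_normal(5)      # few relevant, many irrelevant features
        X = rng.standard_normal((n, d))
        y = X @ w_true + 0.1 * rng.standard_normal(n)

        def train(decay_fn, lr=0.01, lam=0.05, epochs=300):
            w = np.zeros(d)
            for _ in range(epochs):
                grad = X.T @ (X @ w - y) / n          # squared-error gradient
                w -= lr * (grad + lam * decay_fn(w))  # the prior enters as weight decay
            return w

        laplacian = lambda w: np.sign(w)                      # LASSO-style constant decay
        raised = lambda w: np.sign(w) * np.exp(-np.abs(w))    # decay fades for large |w|

        for name, fn in [("laplacian", laplacian), ("raised", raised)]:
            w = train(fn)
            print(name, "weights above 0.05:", int(np.sum(np.abs(w) > 0.05)))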

  14. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Feature selection is an essential process in data mining applications since it reduces a model's complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem for numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from existing models mainly by the error boundaries. Second, a covering-based rough set model with normally distributed measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of cost-sensitive learning.

  15. On the selection of optimal feature region set for robust digital image watermarking.

    Science.gov (United States)

    Tsai, Jen-Sheng; Huang, Win-Bin; Kuo, Yau-Hwang

    2011-03-01

    A novel feature region selection method for robust digital image watermarking is proposed in this paper. This method aims to select a nonoverlapping feature region set which has the greatest robustness against various attacks and can preserve image quality as much as possible after watermarking. It first performs a simulated attacking procedure using some predefined attacks to evaluate the robustness of every candidate feature region. According to the evaluation results, it then adopts a track-with-pruning procedure to search for a minimal primary feature set which can resist the most predefined attacks. In order to enhance its resistance to undefined attacks under the constraint of preserving image quality, the primary feature set is then extended by adding some auxiliary feature regions. This work is formulated as a multidimensional knapsack problem and solved by a genetic algorithm based approach. The experimental results for StirMark attacks on some benchmark images support our expectation that the primary feature set can resist all the predefined attacks and that its extension enhances the robustness against undefined attacks. Compared with some well-known feature-based methods, the proposed method exhibits better performance in robust digital watermarking.

  16. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. A smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation examines and evaluates simultaneously a large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  17. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach, applied in major fields such as biomedicine, bioinformatics, robotics, natural language processing and social networking. In the feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing the data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information-based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.

  18. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    CERN Document Server

    Tangaro, Sabina; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Inglese, Paolo; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer's disease and other neurological and psychiatric diseases. However, this requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods usually take a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) an embedded method based on the Random Forest classifier, on a set of 10 T1-weighted brain MRIs, tested on an independent set of 25 subjects...

  19. Application of Fisher Score and mRMR Techniques for Feature Selection in Compressed Medical Images

    Directory of Open Access Journals (Sweden)

    Vamsidhar Enireddy

    2015-12-01

    Nowadays there is a large increase in digital medical images, and different medical imaging equipment is available for diagnosis; medical professionals are increasingly relying on computer-aided techniques both for indexing these images and for retrieving similar images from large repositories. Developing systems that are computationally less intensive without compromising accuracy from the high-dimensional feature space is always challenging. In this paper an investigation is made into the retrieval of compressed medical images. Images are compressed using a visually lossless compression technique. Shape and texture features are extracted, and the best features are selected using the Fisher technique and mRMR. Using these selected features, an RNN with BPTT was utilized for classification of the compressed images.
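
    For reference, a small Python sketch of Fisher-score feature ranking as used above: for each feature, the ratio of between-class scatter to within-class scatter, with higher scores indicating better class separation. The data are synthetic.

        # Sketch: Fisher score per feature, then rank features by it.
        import numpy as np

        def fisher_score(X, y):
            mu = X.mean(axis=0)
            num = np.zeros(X.shape[1])   # between-class scatter
            den = np.zeros(X.shape[1])   # within-class scatter
            for c in np.unique(y):
                Xc = X[y == c]
                num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
                den += len(Xc) * Xc.var(axis=0)
            return num / (den + 1e-12)

        rng = np.random.default_rng(6)
        X = rng.standard_normal((100, 20))
        y = rng.integers(0, 2, 100)
        X[:, 3] += 2 * y                              # make feature 3 discriminative
        ranking = np.argsort(fisher_score(X, y))[::-1]
        print("top features:", ranking[:5])           # feature 3 should rank first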

  20. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. As a dimensionality reduction method, feature selection based on a self-adaptive GA was considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: to study text classification for natural language call routing with different term weighting methods and classification algorithms, and to investigate the feature selection method based on the self-adaptive GA. The numerical results showed that the most effective term weighting is TRR and the most effective classification algorithm is ANN. Feature selection with the self-adaptive GA provides improvement of classification effectiveness and significant dimensionality reduction with all term weighting methods and with all classification algorithms.

  1. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Science.gov (United States)

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system in terms of its accuracy and efficiency. PMID:26346654

  2. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

    Science.gov (United States)

    Zhou, Hongfang; Guo, Jie; Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In our proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are performed with a kNN classifier. The corresponding results on the 20 NewsGroups and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.
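
    The Python fragment below gives one simplified reading of how such factors might be combined; the actual IIRCT formula is the paper's, and this scoring rule, the toy counts, and the smoothing constant are all assumptions. A term scores high if it is frequent, concentrated in one class (interclass contribution), and spread over many documents within that class (intraclass contribution).

        # Sketch: toy term scoring from term frequency plus interclass and
        # intraclass relative contributions (simplified stand-in for IIRCT).
        import numpy as np

        # tf[c, t]: frequency of term t in class c; df[c, t]: documents of class c containing t
        tf = np.array([[30.0, 2.0, 10.0], [1.0, 25.0, 9.0]])
        df = np.array([[12.0, 1.0, 5.0], [1.0, 11.0, 6.0]])
        docs_per_class = np.array([15.0, 14.0])

        inter = tf.max(axis=0) / (tf.sum(axis=0) + 1e-12)   # concentration in one class
        intra = (df / docs_per_class[:, None]).max(axis=0)  # coverage within that class
        score = np.log1p(tf.sum(axis=0)) * inter * intra
        print("term scores:", np.round(score, 3))           # the third term scores lowest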

  3. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Directory of Open Access Journals (Sweden)

    Pavol Partila

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system in terms of its accuracy and efficiency.

  4. Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image

    Science.gov (United States)

    Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI

    2017-01-01

    This paper describes an effective gait recognition approach based on Gabor features of the gait energy image. Kernel Fisher analysis combined with a kernel matrix is proposed to select dominant features. A nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The proposed approach is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state-of-the-art gait recognition approaches in terms of recognition accuracy and robustness.

  5. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory

    Directory of Open Access Journals (Sweden)

    Sachiko Takahama

    2014-06-01

    Information about an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding, determined by reanalyzing the data from Takahama et al. (2010), demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.

  6. Feature Selection Applying Statistical and Neurofuzzy Methods to EEG-Based BCI.

    Science.gov (United States)

    Martinez-Leon, Juan-Antonio; Cano-Izquierdo, Jose-Manuel; Ibarrola, Julio

    2015-01-01

    This paper presents an investigation aimed at drastically reducing the processing burden required by motor imagery brain-computer interface (BCI) systems based on electroencephalography (EEG). In this research, the focus has moved from the channel to the feature paradigm, and a 96% reduction of the number of features required in the process has been achieved while maintaining and even improving the classification success rate. This way, it is possible to build cheaper, quicker, and more portable BCI systems. The data set used was provided within the framework of BCI Competition III, which allows the presented results to be compared with the classification accuracy achieved in the contest. Furthermore, a new three-step methodology has been developed which includes a feature discriminant character calculation stage; a score, order, and selection phase; and a final feature selection step. For the first stage, both statistical methods and fuzzy criteria are used. The fuzzy criteria are based on the S-dFasArt classification algorithm, which has shown excellent performance in previous papers undertaking the BCI multiclass motor imagery problem. The score, order, and selection stage is used to sort the features according to their discriminant nature. Finally, both order selection and Group Method of Data Handling (GMDH) approaches are used to choose the most discriminant ones.

  7. Explore Interregional EEG Correlations Changed by Sport Training Using Feature Selection

    Directory of Open Access Journals (Sweden)

    Jia Gao

    2016-01-01

    This paper investigated the interregional correlations changed by sport training through electroencephalography (EEG) signals, using the techniques of classification and feature selection. The EEG data were obtained from students with long-time professional sport training and from normal students without sport training as a baseline. Every channel of the 19-channel EEG signals is considered as a node in the brain network, and Pearson correlation coefficients are calculated between every two nodes as the new features of the EEG signals. Then, Partial Least Squares (PLS) is used to select the top 10 most varied features, and the Pearson correlation coefficients of the selected features are compared to show the difference between the two groups. Results show that the classification accuracy for the two groups is improved from 88.13%, by the method using the measurement of EEG overall energy, to 97.19%, by the method using the EEG correlation measurement. Furthermore, the selected features reveal that the most important interregional EEG correlation changed by training is the correlation between the left inferior frontal and left middle temporal regions, with a decreased value.
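
    The feature construction is easy to reproduce in Python: each of the 19 channels is a node, and the Pearson correlation of every channel pair becomes one feature, giving 19*18/2 = 171 features per recording. The recording below is random noise standing in for real EEG.

        # Sketch: pairwise channel correlations as brain-network features.
        import numpy as np

        def correlation_features(eeg):
            """eeg: array (n_channels, n_samples) -> upper-triangle correlations."""
            r = np.corrcoef(eeg)                  # (19, 19) correlation matrix
            iu = np.triu_indices_from(r, k=1)     # indices above the diagonal
            return r[iu]

        rng = np.random.default_rng(7)
        recording = rng.standard_normal((19, 5000))  # 19 channels, 5000 samples
        feats = correlation_features(recording)
        print(feats.shape)                           # (171,)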

  8. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh Lashkari

    2016-06-01

    Selecting optimal features based on the nature of the phenomenon and on high discriminant ability is very important in data classification problems. Since Recurrence Quantification Analysis (RQA) does not require any assumption about stationarity or about the size of the signal and the noise, it may be useful for epileptic seizure detection. In this study, RQA was used to discriminate ictal EEG from normal EEG, where optimal features were selected by a combination of a genetic algorithm and a Bayesian classifier. Recurrence plots of a hundred samples in each of the two categories were obtained with five distance norms in this study: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose the optimal threshold for each norm, ten thresholds of ε were generated, and then the best feature space was selected by the genetic algorithm in combination with the Bayesian classifier. The results show that the proposed method is capable of discriminating ictal EEG from normal EEG, where for the Minimum norm and 0.1˂ε˂1 the accuracy was 100%. In addition, the sensitivity of the proposed framework to the ε and distance norm parameters was low. The optimal feature presented in this study is Trans, which was selected in most feature spaces with high accuracy.

  9. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    Science.gov (United States)

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-08-01

    Despite its intrinsic advantages, the translation of laser-induced breakdown spectroscopy to material identification has often been impeded by the lack of robustness of the developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts to establish the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based on feature selection through prerequisite knowledge of the sample composition and through a genetic algorithm, respectively. While the full spectral input results in a classification rate of ca. 92%, selection of only the carbon-to-hydrogen spectral window results in near-identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter-based detector, thereby enabling a cheaper and more compact system more amenable to field operations.

  10. Texture feature selection with relevance learning to classify interstitial lung disease patterns

    Science.gov (United States)

    Huber, Markus B.; Bunte, Kerstin; Nagarajan, Mahesh B.; Biehl, Michael; Ray, Lawrence A.; Wismueller, Axel

    2011-03-01

    Generalized Matrix Learning Vector Quantization (GMLVQ) is used to estimate the relevance of texture features in their ability to classify interstitial lung disease patterns in high-resolution computed tomography (HRCT) images. After a stochastic gradient descent, the GMLVQ algorithm provides a discriminative distance measure of relevance factors, which can account for pairwise correlations between different texture features and their importance for the classification of healthy and diseased patterns. Texture features were extracted from gray-level co-occurrence matrices (GLCMs), and were ranked and selected according to their relevance obtained by GMLVQ and, for comparison, by a mutual information (MI) criterion. A k-nearest-neighbor (kNN) classifier and a support vector machine with a radial basis function kernel (SVMrbf) were optimized in a 10-fold cross-validation for different texture feature sets. In our experiment with real-world data, the feature sets selected by the GMLVQ approach had a significantly better classification performance compared with the feature sets selected by MI ranking.

  11. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  12. A ROC-based feature selection method for computer-aided detection and diagnosis

    Science.gov (United States)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic, aiming to assist physicians in detecting lesions and distinguishing benign from malignant ones. However, the datasets fed into a classifier usually suffer from a small number of samples, as well as significantly fewer samples available in one class (having a disease) than in the other, resulting in suboptimal classifier performance. Identifying the most characterizing features of the observed data for lesion detection is critical to improving the sensitivity and minimizing the false positives of a CAD system. In this study, we propose a novel feature selection method, mR-FAST, that combines the minimal-redundancy-maximal-relevance (mRMR) framework with the selection metric FAST (feature assessment by sliding thresholds), which is based on the area under a ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection, with higher AUC, enabling a compact subset of superior features to be found at low cost.
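
    A Python sketch of AUC-based single-feature scoring in the spirit of FAST follows; scoring each feature alone by the area under its ROC curve makes the ranking far less sensitive to class imbalance than plain accuracy would be. The imbalanced dataset is synthetic, and taking max(AUC, 1 - AUC) to handle reversed features is our convention.

        # Sketch: score each feature by the AUC it achieves as a lone discriminant.
        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(8)
        n = 200
        y = (rng.random(n) < 0.1).astype(int)    # imbalanced: roughly 10% positives
        X = rng.standard_normal((n, 15))
        X[:, 4] += 1.8 * y                       # one genuinely useful feature

        aucs = np.array([max(roc_auc_score(y, X[:, j]), 1 - roc_auc_score(y, X[:, j]))
                         for j in range(X.shape[1])])
        print("best feature by AUC:", int(np.argmax(aucs)), "AUC:", round(aucs.max(), 3))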

  13. A wavelet-based two-stage near-lossless coder.

    Science.gov (United States)

    Yea, Sehoon; Pearlman, William A

    2006-11-01

    In this paper, we present a two-stage near-lossless compression scheme. It belongs to the class of "lossy plus residual coding" and consists of a wavelet-based lossy layer followed by arithmetic coding of the quantized residual to guarantee a given L∞ error bound in the pixel domain. We focus on the selection of the optimum bit rate for the lossy layer to achieve the minimum total bit rate. Unlike other similar lossy plus lossless approaches using a wavelet-based lossy layer, the proposed method does not require iteration of decoding and inverse discrete wavelet transform in succession to locate the optimum bit rate. We propose a simple method to estimate the optimal bit rate, with a theoretical justification based on the critical rate argument from rate-distortion theory and the independence of the residual error.
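
    A minimal numpy sketch of the underlying "lossy plus residual" guarantee, assuming integer pixel values: quantizing the pixel-domain residual with step 2δ+1 bounds the reconstruction error by δ. The wavelet lossy layer is stubbed out by coarse quantization, and entropy coding of the residual is omitted.

```python
import numpy as np

def lossy_layer(img, step=16):
    """Stand-in for the wavelet-based lossy layer: coarse uniform quantization."""
    return np.round(img / step) * step

def near_lossless(img, delta=2):
    approx = lossy_layer(img)
    residual = img - approx                   # integer-valued for integer images
    q = np.round(residual / (2 * delta + 1))  # quantized residual (to be entropy coded)
    recon = approx + q * (2 * delta + 1)
    return recon, q

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
recon, q = near_lossless(img, delta=2)
# For integer pixels the reconstruction error never exceeds delta.
assert np.max(np.abs(img - recon)) <= 2
```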

  14. Prey-Predator Model with Two-Stage Infection in Prey: Concerning Pest Control

    Directory of Open Access Journals (Sweden)

    Swapan Kumar Nandi

    2015-01-01

    Full Text Available A prey-predator model system is developed; specifically, a disease is introduced into the prey population. Here the prey population is taken as the pest, and the predators consume the selected pest. Moreover, we assume that the prey species is infected with a viral disease, dividing it into a susceptible class and two infected stages, with the early stage of infected prey being more vulnerable to predation by the predator. It is also assumed that the later stage of infected pests is not eaten by the predator. Different equilibria of the system are investigated, and their stability analysis and the Hopf bifurcation of the system around the interior equilibrium are discussed. A modified model has been constructed by considering an alternative source of food for the predator population, and the dynamical behavior of the modified model has been investigated. We demonstrate the analytical results by numerical analysis with a simulated set of parameter values.
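
    For illustration only, the following integration uses plausible stand-in equations of our own (the abstract does not reproduce the paper's model): susceptible pests S, early-stage infected I1 (heavily predated), late-stage infected I2 (not eaten), and predators P, with hypothetical parameter values.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, x, r=1.0, K=100.0, beta=0.02, sigma=0.3,
        a0=0.01, a1=0.05, e=0.4, d1=0.1, d2=0.2, m=0.3):
    S, I1, I2, P = x
    infection = beta * S * (I1 + I2)
    dS = r * S * (1 - (S + I1 + I2) / K) - infection - a0 * S * P
    dI1 = infection - sigma * I1 - d1 * I1 - a1 * I1 * P  # early stage: heavily predated
    dI2 = sigma * I1 - d2 * I2                            # late stage: not predated
    dP = e * (a0 * S + a1 * I1) * P - m * P
    return [dS, dI1, dI2, dP]

sol = solve_ivp(rhs, (0.0, 200.0), [50.0, 5.0, 0.0, 2.0], dense_output=True)
print("final state (S, I1, I2, P):", np.round(sol.y[:, -1], 2))
```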

  15. A Two-Stage Approach for Medical Supplies Intermodal Transportation in Large-Scale Disaster Responses

    Directory of Open Access Journals (Sweden)

    Junhu Ruan

    2014-10-01

    Full Text Available We present a two-stage approach for the “helicopters and vehicles” intermodal transportation of medical supplies in large-scale disaster responses. In the first stage, a fuzzy-based method and its heuristic algorithm are developed to select the locations of temporary distribution centers (TDCs) and assign medical aid points (MAPs) to each TDC. In the second stage, an integer-programming model is developed to determine the delivery routes. Numerical experiments verified the effectiveness of the approach and yielded several findings: (i) more TDCs often increase the efficiency and utility of medical supplies; (ii) it is not always best for vehicles to load as many medical supplies as possible in emergency responses; (iii) the greater the contrast between the traveling speeds of helicopters and vehicles, the more advantageous the intermodal transportation is.
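
    A toy sketch of the flavor of the first stage, using the PuLP modeling library with hypothetical distances: open at most a fixed number of TDCs and assign every MAP to exactly one opened TDC. The paper's fuzzy demand handling and second-stage routing model are not reproduced.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

n_tdc, n_map, max_open = 3, 6, 2
dist = [[4, 7, 3, 8, 6, 5],   # dist[i][j]: candidate TDC i -> MAP j (hypothetical)
        [6, 2, 5, 3, 7, 4],
        [3, 6, 4, 5, 2, 6]]

prob = LpProblem("tdc_assignment", LpMinimize)
open_ = [LpVariable(f"open_{i}", cat=LpBinary) for i in range(n_tdc)]
assign = [[LpVariable(f"x_{i}_{j}", cat=LpBinary) for j in range(n_map)]
          for i in range(n_tdc)]

# Minimize total TDC-to-MAP distance.
prob += lpSum(dist[i][j] * assign[i][j] for i in range(n_tdc) for j in range(n_map))
for j in range(n_map):                    # every MAP served by exactly one TDC
    prob += lpSum(assign[i][j] for i in range(n_tdc)) == 1
for i in range(n_tdc):
    for j in range(n_map):                # only opened TDCs may serve MAPs
        prob += assign[i][j] <= open_[i]
prob += lpSum(open_) <= max_open          # budget on the number of TDCs

prob.solve()
print("opened TDCs:", [int(v.value()) for v in open_])
```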

  16. A Two-Stage LGSM for Three-Point BVPs of Second-Order ODEs

    Directory of Open Access Journals (Sweden)

    Chein-Shan Liu

    2008-08-01

    Full Text Available The study in this paper is a numerical integration of second-order three-point boundary value problems under two imposed nonlocal boundary conditions at t=t0, t=ξ, and t=t1 in a general setting, where t0<ξ<t1. We construct a two-stage Lie-group shooting method for finding unknown initial conditions, which are obtained through an iterative solution of derived algebraic equations in terms of a weighting factor r∈(0,1). The best r is selected by matching the target with a minimal discrepancy. Numerical examples are examined to confirm that the new approach has high efficiency and accuracy with a fast speed of convergence. Even for multiple solutions, the present method is also effective in finding them.

  17. A Two-Stage LGSM for Three-Point BVPs of Second-Order ODEs

    Directory of Open Access Journals (Sweden)

    Liu Chein-Shan

    2008-01-01

    Full Text Available The study in this paper is a numerical integration of second-order three-point boundary value problems under two imposed nonlocal boundary conditions at t=t0, t=ξ, and t=t1 in a general setting, where t0<ξ<t1. We construct a two-stage Lie-group shooting method for finding unknown initial conditions, which are obtained through an iterative solution of derived algebraic equations in terms of a weighting factor r∈(0,1). The best r is selected by matching the target with a minimal discrepancy. Numerical examples are examined to confirm that the new approach has high efficiency and accuracy with a fast speed of convergence. Even for multiple solutions, the present method is also effective in finding them.
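
    For orientation, the sketch below solves a three-point BVP of this kind with a plain single-stage shooting method (scipy integration plus one-dimensional root finding); it is not the authors' Lie-group construction, and the ODE and nonlocal condition are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

t0, xi, t1, b = 0.0, 0.4, 1.0, 1.0

def integrate(s):
    """Integrate the IVP y'' = -y + t with y(t0) = 0, y'(t0) = s."""
    return solve_ivp(lambda t, u: [u[1], -u[0] + t], (t0, t1), [0.0, s],
                     dense_output=True, rtol=1e-9, atol=1e-12)

def residual(s):
    sol = integrate(s)
    return sol.sol(xi)[0] + sol.sol(t1)[0] - b  # nonlocal condition y(xi) + y(t1) = b

s_star = brentq(residual, -10.0, 10.0)          # initial slope satisfying the BVP
print("initial slope:", s_star)
```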

  18. Alignment and characterization of the two-stage time delay compensating XUV monochromator

    CERN Document Server

    Eckstein, Martin; Kubin, Markus; Yang, Chung-Hsin; Frassetto, Fabio; Poletto, Luca; Vrakking, Marc J J; Kornilov, Oleg

    2016-01-01

    We present the design, implementation and alignment procedure for a two-stage time delay compensating monochromator. The setup spectrally filters the radiation of a high-order harmonic generation source, providing wavelength-selected XUV pulses with a bandwidth of 300 to 600 meV in the photon energy range of 3 to 50 eV. XUV pulses as short as 12±3 fs are demonstrated. Transmission of the 400 nm (3.1 eV) light facilitates precise alignment of the monochromator. This alignment strategy, together with the stable mechanical design of the motorized beamline components, enables us to automatically scan the XUV photon energy in pump-probe experiments that require XUV beam pointing stability. The performance of the beamline is demonstrated by the generation of IR-assisted sidebands in XUV photoionization of argon atoms.

  19. Analysis and Selection of Features for Gesture Recognition Based on a Micro Wearable Device

    Directory of Open Access Journals (Sweden)

    Yinghui Zhou

    2012-01-01

    Full Text Available More and more researchers are concerned with designing health-support systems for the elderly that are lightweight, unobtrusive to the user, and of low computational complexity. In this paper, we introduce a micro wearable device based on a tri-axis accelerometer, which can detect acceleration changes of the human body depending on where the device is placed. Considering the flexibility of the human finger, we put it on a finger to detect finger gestures. Twelve kinds of one-stroke finger gestures are defined according to the sensing characteristics of the accelerometer. Features are a paramount factor in the recognition task, since they directly determine recognition accuracy; accordingly, gesture features in both the time domain and the frequency domain are described. The feature generation method and the selection process are analyzed in detail to obtain the optimal feature subset from the candidate feature set. Experimental results indicate that the selected subset of 12 features achieves a satisfactory classification accuracy of 90.08%, balancing recognition accuracy against the dimension of the feature set.
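
    A hedged sketch of generic time- and frequency-domain features for one accelerometer axis; the paper's exact feature list is not given in the abstract, so mean, variance, zero-crossing rate, dominant frequency, and spectral energy serve as representative examples.

```python
import numpy as np

def axis_features(sig, fs=100.0):
    """Generic feature vector for one accelerometer axis sampled at fs Hz."""
    spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    crossings = np.diff(np.signbit(sig).astype(int))
    return np.array([
        sig.mean(),                        # static (gravity) component
        sig.var(),                         # motion intensity
        np.mean(np.abs(crossings)),        # zero-crossing rate
        freqs[np.argmax(spectrum)],        # dominant frequency
        np.sum(spectrum ** 2) / sig.size,  # spectral energy
    ])

rng = np.random.default_rng(2)
t = np.arange(128) / 100.0
segment = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)
print(np.round(axis_features(segment), 3))
```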

  20. Textural feature selection for enhanced detection of stationary humans in through-the-wall radar imagery

    Science.gov (United States)

    Chaddad, A.; Ahmad, F.; Amin, M. G.; Sevigny, P.; DiFilippo, D.

    2014-05-01

    Feature-based methods have been recently considered in the literature for the detection of stationary human targets in through-the-wall radar imagery. Specifically, textural features, such as contrast, correlation, energy, entropy, and homogeneity, have been extracted from gray-level co-occurrence matrices (GLCMs) to aid in discriminating the true targets from multipath ghosts and clutter that closely mimic the target in size and intensity. In this paper, we address the task of feature selection to identify the relevant subset of features in the GLCM domain, while discarding those that are either redundant or confusing, thereby improving the performance of the feature-based scheme in distinguishing between targets and ghosts/clutter. We apply a Decision Tree algorithm to find the optimal combination of co-occurrence-based textural features for the problem at hand. We employ a K-Nearest Neighbor classifier to evaluate the performance of the optimal textural feature based scheme in terms of its target and ghost/clutter discrimination capability, and use real data collected with the vehicle-borne multi-channel through-the-wall radar imaging system by Defence Research and Development Canada. For the specific data analyzed, it is shown that the identified dominant features yield a higher classification accuracy, with a lower number of false alarms and missed detections, compared to the full GLCM-based feature set.
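
    The following sketch mirrors the described pipeline on synthetic stand-ins for the GLCM features: a decision tree ranks the features by importance, and a kNN classifier is cross-validated on the full and reduced sets.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 12))        # 12 candidate GLCM features (synthetic)
y = rng.integers(0, 2, size=300)      # target vs ghost/clutter labels
X[y == 1, :3] += 0.8                  # 3 genuinely discriminative features

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
keep = np.argsort(tree.feature_importances_)[::-1][:4]  # dominant features

knn = KNeighborsClassifier(n_neighbors=5)
full = cross_val_score(knn, X, y, cv=10).mean()
reduced = cross_val_score(knn, X[:, keep], y, cv=10).mean()
print(f"kNN accuracy: full set {full:.3f}, selected subset {reduced:.3f}")
```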

  1. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remotely sensed multispectral scanner data.

  3. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    Science.gov (United States)

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only way to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.

  4. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection method (29 genes selected), and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed the other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.
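
    A compact binary-PSO feature-selection sketch with an SVM cross-validation fitness, in the spirit of the Phase Three combination; the swarm size, velocity update constants, and sigmoid transfer rule are common textbook choices rather than the authors' exact configuration, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 40))
y = rng.integers(0, 2, size=120)
X[y == 1, :6] += 1.0                  # a few genuinely informative "genes"

def fitness(mask):
    """Cross-validated SVM accuracy on the features flagged by the 0/1 mask."""
    idx = mask.astype(bool)
    if not idx.any():
        return 0.0
    return cross_val_score(SVC(), X[:, idx], y, cv=5).mean()

n_particles, n_iter, dim = 12, 20, X.shape[1]
pos = (rng.random((n_particles, dim)) < 0.5).astype(float)   # binary positions
vel = rng.normal(scale=0.1, size=(n_particles, dim))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Sigmoid transfer rule: higher velocity -> higher chance the bit is 1.
    pos = (rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
    fits = np.array([fitness(p) for p in pos])
    improved = fits > pbest_fit
    pbest[improved] = pos[improved]
    pbest_fit[improved] = fits[improved]
    gbest = pbest[np.argmax(pbest_fit)].copy()

print("selected features:", np.flatnonzero(gbest))
print("best CV accuracy:", pbest_fit.max())
```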

  5. Early Visual Cortex Dynamics during Top-Down Modulated Shifts of Feature-Selective Attention.

    Science.gov (United States)

    Müller, Matthias M; Trautmann, Mireille; Keitel, Christian

    2016-04-01

    Shifting attention from one color to another color or from color to another feature dimension such as shape or orientation is imperative when searching for a certain object in a cluttered scene. Most attention models that emphasize feature-based selection implicitly assume that all shifts in feature-selective attention underlie identical temporal dynamics. Here, we recorded time courses of behavioral data and steady-state visual evoked potentials (SSVEPs), an objective electrophysiological measure of neural dynamics in early visual cortex to investigate temporal dynamics when participants shifted attention from color or orientation toward color or orientation, respectively. SSVEPs were elicited by four random dot kinematograms that flickered at different frequencies. Each random dot kinematogram was composed of dashes that uniquely combined two features from the dimensions color (red or blue) and orientation (slash or backslash). Participants were cued to attend to one feature (such as color or orientation) and respond to coherent motion targets of the to-be-attended feature. We found that shifts toward color occurred earlier after the shifting cue compared with shifts toward orientation, regardless of the original feature (i.e., color or orientation). This was paralleled in SSVEP amplitude modulations as well as in the time course of behavioral data. Overall, our results suggest different neural dynamics during shifts of attention from color and orientation and the respective shifting destinations, namely, either toward color or toward orientation.

  6. Feature selection from short amino acid sequences in phosphorylation prediction problem

    Science.gov (United States)

    Wecławski, Jakub; Jankowski, Stanisław; Szymański, Zbigniew

    The paper describes a solution to feature selection from amino acid sequences in the phosphorylation prediction problem. We show that even for short sequences, variable selection leads to better classification performance. Moreover, the final simplicity of the models allows for better data understanding and can be used by an expert for further analysis. The feature selection process is divided into two parts: (i) a classification tree is used for finding the most relevant positions in amino acid sequences; (ii) the contrast pattern kernel is then applied for pattern selection. This work summarizes the research on the classification of short amino acid sequences. The results of the research allowed us to propose a general scheme of amino acid sequence analysis.

  7. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Directory of Open Access Journals (Sweden)

    Masoud Ghodrati

    Full Text Available Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and the further one goes up this pathway, the more complex the features that are spotted. It is an important question in the field of visual processing which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical, biologically motivated model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions in an image, which can lead to the extraction of non-discriminating patches and may eventually reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  8. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Science.gov (United States)

    Ghodrati, Masoud; Khaligh-Razavi, Seyed-Mahdi; Ebrahimpour, Reza; Rajaei, Karim; Pooyan, Mohammad

    2012-01-01

    Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and the further one goes up this pathway, the more complex the features that are spotted. It is an important question in the field of visual processing which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical, biologically motivated model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions in an image, which can lead to the extraction of non-discriminating patches and may eventually reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  9. Efficient feature selection and multiclass classification with integrated instance and model based learning.

    Science.gov (United States)

    Liu, Zhenqiu; Bensmail, Halima; Tan, Ming

    2012-01-01

    Multiclass classification and feature (variable) selection are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as the K nearest neighbor (KNN) can naturally extend to multiclass problems and usually perform well with unbalanced data, but suffer from the curse of dimensionality: their performance is degraded when applied to high dimensional data. On the other hand, model-based methods such as logistic regression require the decomposition of the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high dimensional data with L1 or Lp penalized methods, such approaches can only select independent features, and the features selected for different binary problems are usually different. They also produce unbalanced classification problems with the one-vs.-rest scheme even if the original multiclass problem is balanced. By combining instance-based and model-based learning, we propose an efficient learning method with integrated KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. Our proposed method simultaneously minimizes the intra-class distance and maximizes the inter-class distance with fewer estimated parameters. It is very efficient for problems with small sample size and unbalanced classes, a case common in many real applications. In addition, our model-based feature selection method can identify highly correlated features simultaneously, avoiding the multiplicity problem due to multiple tests. The proposed method is evaluated with simulation and real data, including one unbalanced microRNA dataset for leukemia and one multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well with limited computational experiments.

  10. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  11. Research into a Feature Selection Method for Hyperspectral Imagery Using PSO and SVM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Classification and recognition of hyperspectral remote sensing images is not the same as that of conventional multi-spectral remote sensing images. We propose a novel feature selection and classification method for hyperspectral images that combines the global optimization ability of the particle swarm optimization (PSO) algorithm with the superior classification performance of a support vector machine (SVM). The global optimal search performance of PSO is improved by using a chaotic optimization search technique. A granularity-based grid search strategy is used to optimize the SVM model parameters. Parameter optimization and classification of the SVM are addressed using the training data corresponding to the feature subset. A false classification rate is adopted as the fitness function. Tests of feature selection and classification are carried out on a hyperspectral data set. Classification performances are also compared among different feature extraction methods commonly used today. Results indicate that this hybrid method has a higher classification accuracy and can effectively extract optimal bands. A feasible approach is provided for feature selection and classification of hyperspectral image data.

  12. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods.

  13. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  14. A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease

    Science.gov (United States)

    Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin

    2015-02-01

    Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer’s disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested on two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0, healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs, we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new feature selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for MCI and Mild AD data sets with promising results.
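
    A minimal example of the frequency relative power (RP) feature on a synthetic trace: band power from a Welch periodogram normalized by total power. The band edges follow common delta/theta/alpha/beta conventions and are not necessarily the paper's exact ranges.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def relative_power(sig, fs=256.0):
    """Relative power per band from a Welch power spectral density."""
    freqs, psd = welch(sig, fs=fs, nperseg=int(2 * fs))
    broad = (freqs >= 1) & (freqs <= 30)
    total = np.trapz(psd[broad], freqs[broad])
    rp = {}
    for name, (lo, hi) in BANDS.items():
        sel = (freqs >= lo) & (freqs < hi)
        rp[name] = np.trapz(psd[sel], freqs[sel]) / total
    return rp

rng = np.random.default_rng(5)
t = np.arange(0, 10, 1 / 256.0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)  # alpha-dominant toy EEG
print({k: round(v, 3) for k, v in relative_power(eeg).items()})
```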

  15. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task, as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, a binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, the LASSO algorithm is also used for comparison. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find the minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling, for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression trees, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlęk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  16. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we...

  17. The Influence of Selected Personality and Workplace Features on Burnout among Nurse Academics

    Science.gov (United States)

    Kizilci, Sevgi; Erdogan, Vesile; Sozen, Emine

    2012-01-01

    This study aimed to determine the influence of selected individual and situational features on burnout among nurse academics. The Maslach Burnout Inventory was used to assess the burnout levels of academics. The sample population comprised 94 female participants. The emotional exhaustion (EE) score of the nurse academics was 16.43 ± 5.97,…

  18. The Use of Self Organizing Map Method and Feature Selection in Image Database Classification System

    CERN Document Server

    Pratiwi, Dian

    2012-01-01

    This paper presents a technique for classifying images into a desired number of classes or clusters by means of the Self-Organizing Map (SOM) artificial neural network method. A set of 250 color images is classified after some preprocessing, such as RGB-to-grayscale conversion, color histogram computation, and feature vector selection, followed by classification with the SOM. Feature vector selection in this paper uses two methods, namely PCA (Principal Component Analysis) and LSA (Latent Semantic Analysis), each of which reduces the initial 256-element feature vector produced by the color histogram to 50, 100, or 150 components. The selected vectors are then fed into the SOM network, classified into five classes using a learning rate of 0.5, and the accuracy is calculated. Test results showed that the highest accuracy, 88%, was obtained when using PCA with a selection of 100 feature vectors, compared to when using...

  19. Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation.

    Science.gov (United States)

    Wegener, Detlef; Galashan, Fingal Orlando; Aurich, Maike Kathrin; Kreiter, Andreas Kurt

    2014-01-01

    Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time (RT) measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns (RDPs), presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  20. Attentional spreading to task-irrelevant object features: Experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation

    Directory of Open Access Journals (Sweden)

    Detlef eWegener

    2014-06-01

    Full Text Available Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns, presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  1. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    Science.gov (United States)

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    The outcome prediction of patients can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful for assessing patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features, leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features, leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using the 4 other studied methods.
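
    A small genetic-algorithm feature-selection sketch in the spirit of GARF, with random-forest cross-validation accuracy as the fitness; the bit-mask encoding, tournament selection, uniform crossover, and bit-flip mutation are generic GA choices, not the paper's exact design, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 25))
y = rng.integers(0, 2, size=80)
X[y == 1, :5] += 1.0                  # a few genuinely predictive features

def fitness(mask):
    """Cross-validated random-forest accuracy on the selected features."""
    if not mask.any():
        return 0.0
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(rf, X[:, mask], y, cv=3).mean()

pop = rng.random((16, X.shape[1])) < 0.5          # population of bit masks
fits = np.array([fitness(ind) for ind in pop])

def tournament():
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i] if fits[i] >= fits[j] else pop[j]

for _ in range(15):
    children = []
    for _ in range(len(pop)):
        p1, p2 = tournament(), tournament()
        child = np.where(rng.random(X.shape[1]) < 0.5, p1, p2)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.02                  # bit-flip mutation
        children.append(child)
    pop = np.array(children)
    fits = np.array([fitness(ind) for ind in pop])

best = pop[np.argmax(fits)]
print("selected features:", np.flatnonzero(best))
print("best CV accuracy:", fits.max())
```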

  2. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitioning around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes. We applied the purity of each cluster as an evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, taken together as a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.

  3. A survey on filter techniques for feature selection in gene expression microarray analysis.

    Science.gov (United States)

    Lazar, Cosmin; Taminau, Jonatan; Meganck, Stijn; Steenhoff, David; Coletta, Alain; Molter, Colin; de Schaetzen, Virginie; Duque, Robin; Bersini, Hugues; Nowé, Ann

    2012-01-01

    A plenitude of feature selection (FS) methods is available in the literature, most of them rising as a need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

  4. Improving the performance of the Ripper in insurance risk classification: A comparative study using feature selection

    CERN Document Server

    Duma, Mlungisi; Marwala, Tshilidzi

    2011-01-01

    The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that the algorithm struggles with classification performance in the presence of missing data: it fails to classify instances reliably when the quality of the data deteriorates as a result of increasing missing data. In this paper, a feature selection technique is used to help improve the classification performance of the Ripper model. Principal component analysis and evidence automatic relevance determination techniques are used to improve the performance, and a comparison is done to see which technique helps the algorithm improve the most. Training datasets with completely observable data were used to construct the model, and testing datasets with missing values were used for measuring accuracy. The results showed that principal component analysis is the better feature selection technique for improving the Ripper's classification performance.

  5. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the Back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network using the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM.

  6. Feature-based and spatial attentional selection in visual working memory.

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna

    2016-05-01

    The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.

  7. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data has thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR), and F-statistics values. The CS is used to find the informative genes among the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is evaluated and analysed on ten different cancer gene expression datasets. The results show that CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL, and Lung Harvard2 datasets, and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets.

  8. A new ensemble feature selection and its application to pattern classification

    Institute of Scientific and Technical Information of China (English)

    Dongbo ZHANG; Yaonan WANG

    2009-01-01

    A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of the conventional ensemble feature selection algorithm. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensembles, the neural network ensemble with the best generalization ability can be found by search strategies. Finally, classification based on the neural network ensemble is implemented by combining the predictions of the component networks by voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it costs less time, has lower computational complexity, and achieves satisfactory classification accuracy.

  9. Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2014-01-01

    Full Text Available Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome, where such mutations lead to activation of oncogenes or inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for prediction, with features obtained from databases. However, we often do not know which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. In order to obtain the best performance, the rotation forest algorithm was adopted to perform the experiment. On the training dataset, which was collected from the COSMIC and Swiss-Prot databases, we are able to obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features were used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.

  10. Low-Complexity Discriminative Feature Selection From EEG Before and After Short-Term Memory Task.

    Science.gov (United States)

    Behzadfar, Neda; Firoozabadi, S Mohammad P; Badie, Kambiz

    2016-10-01

    A reliable and unobtrusive quantification of changes in cortical activity during a short-term memory task can be used to evaluate the efficacy of interfaces and to provide real-time user-state information. In this article, we investigate changes in electroencephalogram signals during short-term memory with respect to baseline activity. The electroencephalogram signals have been analyzed using 9 linear and nonlinear/dynamic measures. We applied the Wilcoxon statistical test and the Davies-Bouldin criterion to select optimal discriminative features. The results show that, among the features, the permutation entropy significantly increased in the frontal lobe and the occipital second lower alpha band activity decreased during the memory task. These 2 features reflect the same mental task; however, their correlation with the memory task varies over different intervals. In conclusion, it is suggested that the combination of the 2 features would improve the performance of memory-based neurofeedback systems. © EEG and Clinical Neuroscience Society (ECNS) 2016.
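
    As a concrete example of one of the nonlinear measures above, the sketch below computes normalized permutation entropy: the signal is embedded in ordinal patterns of length m, and the Shannon entropy of the pattern distribution is divided by log(m!).

```python
import math
import numpy as np

def permutation_entropy(sig, m=3, tau=1):
    """Normalized permutation entropy in [0, 1] for embedding order m and lag tau."""
    patterns = {}
    for i in range(len(sig) - (m - 1) * tau):
        pattern = tuple(np.argsort(sig[i:i + m * tau:tau]))  # ordinal pattern
        patterns[pattern] = patterns.get(pattern, 0) + 1
    p = np.array(list(patterns.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))

rng = np.random.default_rng(7)
print(round(permutation_entropy(rng.normal(size=1000)), 3))          # ~1 for white noise
print(round(permutation_entropy(np.sin(np.arange(1000) * 0.1)), 3))  # lower for a regular signal
```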

  11. Feature selection and definition for contours classification of thermograms in breast cancer detection

    Science.gov (United States)

    Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł

    2016-09-01

    This contribution introduces a method for detecting cancer pathologies in breast skin temperature distribution images. The use of thermosensitive foils applied to the breast skin allows thermograms to be created, which display the amount of infrared energy emitted by all breast cells. Significant foci of hyperthermia or inflammation are typical of cancer cells. Such foci can be recognized on thermograms as contours, which are areas of higher temperature. Every contour can be converted to a feature set that describes it, using the raw, central, Hu, outline, Fourier, and colour moments of the image pixels. This paper also defines a new way of describing a set of contours through their neighbourhood relations. The contribution moreover introduces a way of ranking and selecting the most relevant features. The authors used a neural network with Gevrey's concept and recursive feature elimination to estimate feature importance.

  12. Sequential feature selection for detecting buried objects using forward looking ground penetrating radar

    Science.gov (United States)

    Shaw, Darren; Stone, Kevin; Ho, K. C.; Keller, James M.; Luke, Robert H.; Burns, Brian P.

    2016-05-01

    Forward looking ground penetrating radar (FLGPR) has the benefit of detecting objects at a significant standoff distance. The FLGPR signal is radiated over a large surface area and the radar signal return is often weak. Improving detection, especially for targets buried in roads, while maintaining an acceptable false alarm rate remains a challenging task. Various kinds of features have been developed over the years to increase FLGPR detection performance. This paper focuses on investigating the use of as many features as possible for detecting buried targets and uses the sequential feature selection technique to automatically choose the features that contribute most to improving performance. Experimental results using data collected at a government test site are presented.
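
    A hedged sketch of sequential forward selection using scikit-learn's SequentialFeatureSelector with a generic linear SVM; the FLGPR features and the classifier used in the paper are replaced by synthetic data and standard components.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)
X[y == 1, :4] += 1.0                  # a few genuinely useful features

# Greedily add the feature that most improves cross-validated accuracy.
sfs = SequentialFeatureSelector(LinearSVC(dual=False), n_features_to_select=5,
                                direction="forward", cv=5).fit(X, y)
keep = np.flatnonzero(sfs.get_support())
print("selected features:", keep)
print("CV accuracy on subset:",
      cross_val_score(LinearSVC(dual=False), X[:, keep], y, cv=5).mean())
```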

  13. Train Stop Scheduling in a High-Speed Rail Network by Utilizing a Two-Stage Approach

    Directory of Open Access Journals (Sweden)

    Huiling Fu

    2012-01-01

    Full Text Available Among the most commonly used methods of scheduling train stops are practical experience and various “one-step” optimal models. These methods face problems of direct transferability and computational complexity when considering a large-scale high-speed rail (HSR network such as the one in China. This paper introduces a two-stage approach for train stop scheduling with a goal of efficiently organizing passenger traffic into a rational train stop pattern combination while retaining features of regularity, connectivity, and rapidity (RCR. Based on a three-level station classification definition, a mixed integer programming model and a train operating tactics descriptive model along with the computing algorithm are developed and presented for the two stages. A real-world numerical example is presented using the Chinese HSR network as the setting. The performance of the train stop schedule and the applicability of the proposed approach are evaluated from the perspective of maintaining RCR.

  14. Improved face representation by nonuniform multilevel selection of Gabor convolution features.

    Science.gov (United States)

    Du, Shan; Ward, Rabab Kreidieh

    2009-12-01

    Gabor wavelets are widely employed in face representation to decompose face images into their spatial-frequency domains. The Gabor wavelet transform, however, introduces very high dimensional data. To reduce this dimensionality, uniform sampling of Gabor features has traditionally been used. Since uniform sampling treats all features equally, it can lead to a loss of important features while retaining trivial ones. In this paper, we propose a new face representation method that employs nonuniform multilevel selection of Gabor features. The proposed method is based on the local statistics of the Gabor features and is implemented using a coarse-to-fine hierarchical strategy. Gabor features that correspond to important face regions are automatically selected and sampled more finely than other features. The nonuniformly extracted Gabor features are then classified using principal component analysis and/or linear discriminant analysis for the purpose of face recognition. To verify the effectiveness of the proposed method, experiments have been conducted on benchmark face image databases where the images vary in illumination, expression, pose, and scale. Compared with the methods that use the original gray-scale image with 4096-dimensional data and uniform sampling with 2560-dimensional data, the proposed method results in a significantly higher recognition rate, with a substantially lower dimension of around 700. The experimental results also show that the proposed method works well not only when multiple sample images are available for training but also when only one sample image is available for each person. The proposed face representation method has the advantages of low complexity, low dimensionality, and high discriminative power.
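
    A minimal sketch of Gabor feature extraction over a small filter bank, using scikit-image; the bank parameters and the test image are illustrative assumptions, and the paper's nonuniform multilevel sampling is not reproduced here:

    ```python
    import numpy as np
    from skimage.data import camera
    from skimage.filters import gabor

    def gabor_feature_vector(image, frequencies=(0.1, 0.2, 0.3),
                             n_orientations=4):
        """Mean magnitude response per (frequency, orientation) pair."""
        feats = []
        for f in frequencies:
            for k in range(n_orientations):
                theta = k * np.pi / n_orientations
                real, imag = gabor(image, frequency=f, theta=theta)
                feats.append(np.mean(np.hypot(real, imag)))
        return np.array(feats)

    face = camera() / 255.0             # stand-in for a face image
    print(gabor_feature_vector(face))   # 12-dimensional Gabor descriptor
    ```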

  15. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    Science.gov (United States)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common causes of bridge failure, reported to be responsible for over 60% of bridge failures nationwide. Scour is a complex process, and is likely an epistatic function of bridge and stream conditions, some stationary and some in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition and typically do not include dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value to current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspections (with over 200 features on more than 300 damaged and 2,000 non-damaged bridges) and the stream geomorphic assessments (with over 300 features on more than 5,000 stream reaches) constitute "Big Data", and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. The potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to search the multivariate feature space for epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design and to better monitor and rate bridge scour vulnerability.
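
    The genetic-algorithm subset search can be sketched with a bitmask encoding and a cross-validated classifier as the fitness function. Everything below (population size, rates, synthetic data, logistic-regression fitness) is an illustrative assumption rather than the study's setup:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                               random_state=0)

    def fitness(mask):
        if not mask.any():
            return 0.0
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    pop = rng.random((20, X.shape[1])) < 0.5          # random bitmask population
    for generation in range(15):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]       # truncation selection
        children = []
        for _ in range(len(pop)):
            a, b = parents[rng.integers(10, size=2)]
            cut = rng.integers(1, X.shape[1])         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(X.shape[1]) < 0.02      # bit-flip mutation
            children.append(child ^ flip)
        pop = np.array(children)

    best = pop[np.argmax([fitness(m) for m in pop])]
    print("selected features:", np.flatnonzero(best))
    ```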

  16. Feature Selection in Detection of Adverse Drug Reactions from the Health Improvement Network (THIN Database

    Directory of Open Access Journals (Sweden)

    Yihui Liu

    2015-02-01

    Full Text Available Adverse drug reactions (ADRs) are a widely recognized public health issue and one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main problem with this method is how to automatically extract the medical events or side effects from the high-throughput medical events collected in day-to-day clinical practice. In this study we propose the novel concept of a feature matrix to detect ADRs. The feature matrix, extracted from the large volume of medical data in The Health Improvement Network (THIN) database, characterizes the medical events for patients who take drugs and builds the foundation for handling the irregular, large-scale medical data. Feature selection methods are then performed on the feature matrix to detect the significant features, and the ADRs are located based on those features. Experiments were carried out on three drugs: atorvastatin, alendronate, and metoclopramide. Major side effects for each drug were detected, and better performance was achieved compared to other computerized methods. The detected ADRs are based on computerized methods, so further investigation is needed.

  17. Selection of Entropy Based Features for Automatic Analysis of Essential Tremor

    Directory of Open Access Journals (Sweden)

    Karmele López-de-Ipiña

    2016-05-01

    Full Text Available Biomedical systems produce biosignals that arise from interaction mechanisms. In general, those mechanisms occur across multiple scales, both spatial and temporal, and contain linear and non-linear information. In this framework, entropy measures are good candidates for providing useful evidence about disorder in the system, lack of information in time series, and/or irregularity of the signals. The most common movement disorder is essential tremor (ET), which occurs 20 times more frequently than Parkinson's disease. Interestingly, about 50%–70% of ET cases have a genetic origin. One of the most widely used standard tests for the clinical diagnosis of ET is Archimedes' spiral drawing. This work focuses on the selection of non-linear biomarkers from such drawings and handwriting, and it is part of a wider cross study on the diagnosis of essential tremor, in which our piece of research presents the selection of entropy features for early ET diagnosis. Classic entropy features are compared with features based on permutation entropy. An automatic analysis system built on several machine learning paradigms is applied, and automatic feature selection is implemented by means of the ANOVA (analysis of variance) test. The obtained results for early detection are promising and appear applicable to real environments.
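
    Permutation entropy, the non-linear feature family compared here, follows the Bandt-Pompe ordinal-pattern construction. A minimal sketch (order, delay, and the synthetic signal are illustrative):

    ```python
    import math
    from collections import Counter

    import numpy as np

    def permutation_entropy(x, order=3, delay=1):
        """Normalized Bandt-Pompe permutation entropy of a 1-D signal."""
        patterns = Counter()
        for i in range(len(x) - delay * (order - 1)):
            window = x[i : i + delay * order : delay]
            patterns[tuple(np.argsort(window))] += 1  # ordinal pattern
        total = sum(patterns.values())
        probs = [c / total for c in patterns.values()]
        H = -sum(p * math.log(p) for p in probs)
        return H / math.log(math.factorial(order))    # normalize to [0, 1]

    # Stand-in for a digitized spiral-drawing trace.
    t = np.linspace(0, 10, 1000)
    drawing_speed = np.sin(2 * np.pi * t) + 0.3 * np.random.randn(len(t))
    print(permutation_entropy(drawing_speed, order=4))
    ```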

  18. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data

    Directory of Open Access Journals (Sweden)

    Rabia Aziz

    2016-06-01

    Full Text Available Feature (gene) selection and classification of microarray data are two of the most interesting machine learning challenges. In the present work two existing feature selection/extraction algorithms, independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), are used in a new combination. The main objective of this paper is to select the independent components of the DNA microarray data using FBFE so as to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers, while keeping the computational expense affordable. To show the validity of the proposed method, it is applied to reduce the number of genes for five DNA microarray datasets: colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. These datasets are then classified using SVM and NB classifiers. Experimental results on the five microarray datasets demonstrate that the genes selected by the proposed approach effectively improve the classification accuracy of the SVM and NB classifiers. We compare our proposed method with principal component analysis (PCA) as a standard extraction algorithm and find that the proposed method obtains better classification accuracy, using SVM and NB classifiers, with a smaller number of selected genes than PCA. The curve of average error rate against number of genes for each dataset indicates the number of genes required for the highest accuracy with our proposed method for both classifiers, and ROC analysis shows the best subset of genes for both classifiers on the different datasets.
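
    The ICA-plus-backward-elimination pipeline can be sketched as follows; note that a crisp (non-fuzzy) backward elimination is used here as a stand-in for FBFE, and the synthetic data and component count are illustrative:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.decomposition import FastICA
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=500, n_informative=15,
                               random_state=0)    # stand-in for microarray data

    # Project the genes onto a small set of independent components.
    S = FastICA(n_components=20, random_state=0).fit_transform(X)

    def score(cols):
        return cross_val_score(SVC(), S[:, cols], y, cv=5).mean()

    # Backward elimination over the independent components.
    kept = list(range(S.shape[1]))
    while len(kept) > 5:
        trial = [(score([c for c in kept if c != d]), d) for d in kept]
        best_score, drop = max(trial)
        if best_score < score(kept):    # stop if every removal hurts
            break
        kept.remove(drop)
    print("retained components:", kept)
    ```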

  19. EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION PREDICTION

    Directory of Open Access Journals (Sweden)

    Noura AlNuaimi

    2015-11-01

    Full Text Available A large amount of heterogeneous medical data is generated every day in various healthcare organizations. Those data could yield insights for improving monitoring and care delivery in the Intensive Care Unit; conversely, they present the challenge of reducing the volume of data without information loss. Dimension reduction is the most popular approach for reducing data size as well as noise and redundancy in data. In this paper, we investigate the effect of the average laboratory test value and the total number of laboratory tests in predicting patient deterioration in the Intensive Care Unit, where we treat laboratory tests as features. Choosing a subset of features means choosing the most important lab tests to perform. Our approach therefore uses state-of-the-art feature selection to identify the most discriminative attributes and gain a better understanding of the patient deterioration problem. If the number of tests can be reduced by identifying the most important ones, then the redundant tests can also be identified; by omitting redundant tests, observation time could be shortened and early treatment could be provided to avoid risk, while unnecessary monetary cost would be avoided. We apply our technique to the publicly available MIMIC-II database, show the effectiveness of the feature selection, and provide a detailed analysis of the best features identified by our approach.

  20. BLProt: Prediction of bioluminescent proteins based on support vector machine and ReliefF feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18,202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, information gain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. The high prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. © 2011 Kandaswamy et al; licensee BioMed Central Ltd.
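
    The ReliefF-style filter used as one of the three rankers can be illustrated with a simplified binary Relief (single nearest hit and miss, as in the original Kira-Rendell formulation) rather than the full multi-neighbor ReliefF; the data and iteration count are illustrative:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification

    def relief_weights(X, y, n_iter=200, seed=0):
        """Simplified binary Relief: reward features whose values differ more
        from the nearest miss than from the nearest hit."""
        rng = np.random.default_rng(seed)
        X = (X - X.min(0)) / (X.max(0) - X.min(0))    # scale features to [0, 1]
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            i = rng.integers(len(X))
            d = np.abs(X - X[i]).sum(axis=1)          # Manhattan distances
            d[i] = np.inf                             # exclude the sample itself
            hit = np.argmin(np.where(y == y[i], d, np.inf))
            miss = np.argmin(np.where(y != y[i], d, np.inf))
            w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return w / n_iter

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=0)
    print("top features:", np.argsort(relief_weights(X, y))[::-1][:5])
    ```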

  1. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    Science.gov (United States)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  2. Selection of clinical features for pattern recognition applied to gait analysis.

    Science.gov (United States)

    Altilio, Rosa; Paoloni, Marco; Panella, Massimo

    2017-04-01

    This paper deals with the opportunity of extracting useful information from medical data retrieved directly from a stereophotogrammetric system applied to gait analysis. A feature selection method that exhaustively evaluates all possible combinations of the gait parameters is presented, in order to find the best subset able to classify between diseased and healthy subjects. This procedure is used to estimate the performance of widely used classification algorithms, whose behaviour has been ascertained in many real-world problems against well-known classification benchmarks, both in terms of the number of selected features and classification accuracy. Specifically, support vector machine, Naive Bayes, and K nearest neighbor classifiers obtain the lowest classification error, with an accuracy greater than 97%. For the considered classification problem, the whole feature set proves to be redundant and can be significantly pruned: groups of only 3 or 5 features are able to preserve high accuracy when the aim is to detect gait anomalies. The step length and the swing speed are the most informative features for gait analysis, but cadence and stride may also add useful information for movement evaluation.
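
    Exhaustive evaluation of all feature combinations is feasible because the gait parameter set is small. A minimal sketch with a k-nearest-neighbor classifier; the stand-in dataset and the 8-feature cap are illustrative assumptions:

    ```python
    from itertools import combinations

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Stand-in dataset; imagine the columns are gait parameters
    # (step length, swing speed, cadence, ...).
    X, y = load_breast_cancer(return_X_y=True)
    X = X[:, :8]                        # keep the exhaustive search tractable

    best = (0.0, None)
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            acc = cross_val_score(KNeighborsClassifier(), X[:, list(cols)], y,
                                  cv=5).mean()
            best = max(best, (acc, cols))
    print("best subset %s with accuracy %.3f" % (best[1], best[0]))
    ```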

  3. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    Science.gov (United States)

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is an effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram, but not all of them are essential for detection; identifying the relevant features is therefore the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants walk through paths where the pheromone density is high, which makes the whole process slow; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used together with the ACO to classify normal mammograms from abnormal ones. Experiments are conducted on the miniMIAS database, and the performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.

  4. Feature Selection and Classifier Parameters Estimation for EEG Signals Peak Detection Using Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Asrul Adam

    2014-01-01

    Full Text Available Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains, depending on various peak features from several models. However, no study has established the importance of each peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameter estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time-domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework tries to find the best combination of all the available features that offers good peak detection and a high classification rate in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the proposed framework based on RA-PSO offers a better and more reliable classification rate than standard PSO, as it produces a low-variance model.
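
    A binary-PSO feature selection wrapper of this general kind can be sketched as follows: each particle is a bitmask over candidate peak features, velocities pass through a sigmoid transfer function, and cross-validated accuracy is the fitness. The swarm size, coefficients, and data are illustrative assumptions, and the asynchronous RA-PSO variant is not reproduced:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    X, y = make_classification(n_samples=300, n_features=25, n_informative=6,
                               random_state=1)

    def fitness(bits):
        mask = bits.astype(bool)
        if not mask.any():
            return 0.0
        return cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, mask], y, cv=3).mean()

    n_particles, dim = 15, X.shape[1]
    pos = (rng.random((n_particles, dim)) < 0.5).astype(float)
    vel = rng.normal(size=(n_particles, dim))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)]

    for _ in range(20):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))            # sigmoid transfer function
        pos = (rng.random((n_particles, dim)) < prob).astype(float)
        fits = np.array([fitness(p) for p in pos])
        better = fits > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fits[better]
        gbest = pbest[np.argmax(pbest_fit)]

    print("selected features:", np.flatnonzero(gbest))
    ```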

  5. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model as the dimensions of the input feature vector (outer factor) and of its parallel channels (inner factor) increase. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We conclude that 6~8 channels of the model, with a principal component feature vector covering at least 90% cumulative variance, are adequate for a classification task of 3~5 pattern classes, considering the trade-off between time consumption and classification rate.
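
    Retaining principal components up to a cumulative-variance target, as done here, is directly supported by scikit-learn. A minimal sketch with a stand-in dataset:

    ```python
    import numpy as np
    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA

    # Stand-in for sensor-array responses from an electronic nose.
    X, _ = load_wine(return_X_y=True)

    pca = PCA(n_components=0.90)   # keep components up to 90% cumulative variance
    Z = pca.fit_transform(X)
    print("components kept:", pca.n_components_)
    print("cumulative variance:", np.cumsum(pca.explained_variance_ratio_))
    ```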

  6. Joint feature-sample selection and robust diagnosis of Parkinson's disease from MRI data.

    Science.gov (United States)

    Adeli, Ehsan; Shi, Feng; An, Le; Wee, Chong-Yaw; Wu, Guorong; Wang, Tao; Shen, Dinggang

    2016-11-01

    Parkinson's disease (PD) is an overwhelming neurodegenerative disorder caused by the deterioration of a neurotransmitter known as dopamine. Lack of this chemical messenger impairs several brain regions and yields various motor and non-motor symptoms. The incidence of PD is predicted to double in the next two decades, which urges more research to focus on its early diagnosis and treatment. In this paper, we propose an approach to diagnose PD using magnetic resonance imaging (MRI) data. Specifically, we first introduce a joint feature-sample selection (JFSS) method for selecting an optimal subset of samples and features, to learn a reliable diagnosis model. The proposed JFSS model effectively discards poor samples and irrelevant features. As a result, the selected features play an important role in PD characterization, which will help identify the most relevant and critical imaging biomarkers for PD. Then, a robust classification framework is proposed to simultaneously de-noise the selected subset of features and samples and learn a classification model. Our model can also de-noise testing samples based on the cleaned training data. Unlike many previous works that perform de-noising in an unsupervised manner, we perform supervised de-noising for both training and testing data, thus boosting the diagnostic accuracy. Experimental results on both synthetic and publicly available PD datasets show promising results. To evaluate the proposed method, we use the popular Parkinson's progression markers initiative (PPMI) database. Our results indicate that the proposed method can differentiate between PD and normal control (NC), and outperforms the competing methods by a relatively large margin. Notably, our proposed framework can also be used for the diagnosis of other brain disorders. To show this, we have also conducted experiments on the widely-used ADNI database. The obtained results indicate that our proposed method can identify the imaging biomarkers and

  7. Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

    Directory of Open Access Journals (Sweden)

    S. Johnson

    2011-01-01

    Full Text Available Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and to make optimization decisions. The major challenge in profile-based optimization is addressing the problem of overhead. The aim of this work is to perform feature subset selection using Genetic Algorithms (GA) to improve and refine the machine-learnt static hot method predictive technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of the GA wrapped with the predictive model using the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and Frequently Called Hot Methods (FCHM) with respective accuracies of 71% and 80%, an increase of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves program performance by 3% and 5%, as against 4% and 6% with the Low Level Virtual Machine (LLVM) default heuristics. When intra-procedural optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4%, as against 0% and 3% by the LLVM default heuristics, on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that GA-wrapped SVM feature reduction improves hot method prediction accuracy, and that hot-method-prediction-based optimization is potentially useful in selective optimization.

  8. Feature Selection by Merging Sequential Bidirectional Search into Relevance Vector Machine in Condition Monitoring

    Institute of Scientific and Technical Information of China (English)

    ZHANG Kui; DONG Yu; BALL Andrew

    2015-01-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification; indeed, classification methods become simply intractable when applied to high-dimensional condition monitoring data. To solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by engineers, because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection) is employed to identify machinery condition based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper, and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method, formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between faults and the relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting fault-related frequency components with high efficiency.

  9. Feature selection by merging sequential bidirectional search into relevance vector machine in condition monitoring

    Science.gov (United States)

    Zhang, Kui; Dong, Yu; Ball, Andrew

    2015-11-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification; indeed, classification methods become simply intractable when applied to high-dimensional condition monitoring data. To solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by engineers, because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection) is employed to identify machinery condition based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper, and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method, formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between faults and the relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting fault-related frequency components with high efficiency.

  10. SU-E-T-214: Predicting Plan Quality from Patient Geometry: Feature Selection and Inference Modeling.

    Science.gov (United States)

    Ruan, D; Shao, W; DeMarco, J; Kupelian, P; Low, D

    2012-06-01

    To investigate and develop methods to infer treatment plan quality from the geometric features of PTV/OAR structures, and to discover and identify features of high prognostic value. This study explores the prognostic utility of geometric features of two categories: (1) absolute geometry, characterizing the volumes of single structures (PTV, OARs); and (2) relative geometry, based on the minimal 3D distance and/or overlapping volume between pairs of structures. Using prostate as a pilot site, we developed inference models to 'predict' SBRT plan quality at DVH end points. We developed and assessed (1) a full linear regression model based on both absolute and relative geometric features, (2) a sparsity-penalized linear regression model, (3) a linear regression model based on absolute geometry features only, and (4) a learning-based nonparametric model. Cross-validation was used both for selecting parameter values and for quantifying inference performance. The best inference method for each of the DVH end points was identified to reveal the structural and prognostic differences among them. For linear regression, sparsity regularization discovered geometric features that were mostly absolute, demonstrating their dominant linear prognostic utility; however, introducing relative geometric features improved the plan quality prediction by 15% for all DVH end points. In contrast, nonparametric models had a heavier dependence on relative geometry features. While linear regression based on both feature sets predicted OAR DVH points slightly better, the nonparametric method excelled in predicting PTV coverage and conformality. The inference result from this study provides an 'expectation' for plan quality before planning is performed, providing reference goals for the planner and a baseline for detecting abnormality. The use of relative geometry complements the absolute geometry with information on the spatial configuration of the PTV/OAR structures of
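
    The sparsity-penalized linear regression used to discover prognostic geometric features corresponds to a Lasso-type fit. A minimal sketch on synthetic geometry features (the feature layout and DVH end point are illustrative assumptions):

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n = 120
    # Hypothetical geometric features: structure volumes, minimal 3-D
    # distances, and pairwise overlap volumes.
    X = rng.normal(size=(n, 12))
    dvh_endpoint = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, n)

    lasso = LassoCV(cv=5).fit(X, dvh_endpoint)
    selected = np.flatnonzero(lasso.coef_)     # sparsity picks the prognostic ones
    print("prognostic features:", selected, "alpha:", lasso.alpha_)
    ```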

  11. Context-dependent feature selection for landmine detection with ground-penetrating radar

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2009-05-01

    We present a novel method for improving landmine detection with ground-penetrating radar (GPR) that utilizes a priori knowledge of environmental conditions to facilitate algorithm training. The goal of Context-Dependent Feature Selection (CDFS) is to mitigate performance degradation caused by environmental factors. CDFS operates on GPR data by first identifying the data's environmental context and then fusing the decisions of several classifiers trained on context-dependent subsets of features. CDFS was evaluated on GPR data collected at several distinct sites under a variety of weather conditions. Results show that using prior environmental knowledge in this fashion has the potential to improve landmine detection.

  12. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    Institute of Scientific and Technical Information of China (English)

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation (WSD) based on an SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses for specific words. Experimental results show that this algorithm achieves excellent performance on the data set released during the SENSEVAL-2 competition. We present the results obtained and discuss the transplantation of this algorithm to other languages such as Chinese. Experimental results on a Chinese corpus show that our algorithm achieves an accuracy of 70.0% even with small training data.

  13. AREA DETERMINATION OF DIABETIC FOOT ULCER IMAGES USING A CASCADED TWO-STAGE SVM BASED CLASSIFICATION.

    Science.gov (United States)

    Wang, Lei; Pedersen, Peder; Agu, Emmanuel; Strong, Diane; Tulu, Bengisu

    2016-11-23

    It is standard practice for clinicians and nurses to assess patients' wounds primarily via visual examination. This subjective method can be inaccurate in wound assessment and also represents a significant clinical workload. Hence, computer-based systems, especially those implemented on mobile devices, can provide automatic, quantitative wound assessment and can thus be valuable for accurately monitoring wound healing status. Of all wound assessment parameters, the measurement of the wound area is the most suitable for automated analysis. Most current wound boundary determination methods only process the image of the wound area along with a small amount of surrounding healthy skin. In this paper, we present a novel approach that uses a Support Vector Machine (SVM) to determine the wound boundary on a foot ulcer image captured with an image capture box, which provides controlled lighting, angle, and range conditions. The Simple Linear Iterative Clustering (SLIC) method is applied for effective super-pixel segmentation. A cascaded two-stage classifier is trained as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training image dataset, and a set of incorrectly classified instances is collected; in the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from super-pixels, which are used as input for each stage of classifier training. Specifically, we apply the color and Bag-of-Word (BoW) representation of local Dense SIFT features (DSIFT) as the descriptor for ruling out irrelevant regions (first stage), and color and wavelet-based features as descriptors for distinguishing healthy tissue from wound regions (second stage). Finally, the detected wound boundary is refined by applying a Conditional Random Field (CRF) image processing technique. We have implemented the wound classification on a Nexus
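
    The cascade training logic (stage one is trained on subsets and its mistakes are collected; stage two is trained on those hard instances) can be sketched as follows, with generic features standing in for the SLIC super-pixel descriptors. The disagreement-based hand-off at test time is one plausible design choice, not necessarily the authors':

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Stage 1: k SVMs, each trained on a different subset of the training data;
    # collect the training instances that any stage-1 model misclassifies.
    k, rng = 4, np.random.default_rng(0)
    stage1, hard_idx = [], set()
    for part in np.array_split(rng.permutation(len(X_tr)), k):
        clf = SVC().fit(X_tr[part], y_tr[part])
        stage1.append(clf)
        hard_idx.update(np.flatnonzero(clf.predict(X_tr) != y_tr))

    # Stage 2: a second SVM trained only on the hard instances.
    hard = np.array(sorted(hard_idx))
    stage2 = SVC().fit(X_tr[hard], y_tr[hard])

    # At test time, defer to stage 2 whenever the stage-1 models disagree.
    votes = np.mean([c.predict(X_te) for c in stage1], axis=0)
    pred = np.where((votes == 0) | (votes == 1), votes, stage2.predict(X_te))
    print("cascade accuracy:", (pred == y_te).mean())
    ```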

  14. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  15. A kernel-based multivariate feature selection method for microarray data classification.

    Directory of Open Access Journals (Sweden)

    Shiquan Sun

    Full Text Available High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, a feature selection technique should be conducted prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods give less accurate performance because they ignore the dependencies between features. Although a few publications have devoted attention to revealing the relationships between features by multivariate-based methods, these methods describe the relationships only linearly, and a simple linear combination relationship restricts the improvement in performance. In this paper, we use a kernel method to discover the inherent nonlinear correlations among features, as well as between features and target. Moreover, the number of orthogonal components is determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To reveal the effectiveness of our method, we performed several experiments and compared the results of our method with those of other competitive multivariate-based feature selectors. In the comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma, and Pomeroy's Medulloblastoma.

  16. A flexible mechanism of rule selection enables rapid feature-based reinforcement learning

    Directory of Open Access Journals (Sweden)

    Matthew eBalcarras

    2016-03-01

    Full Text Available Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or colour) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or colour. Two-thirds of subjects (n = 22/32) exhibited behaviour that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behaviour of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioural rules by leveraging simple model-free reinforcement

  17. Bidirectional Automated Branch and Bound Algorithm for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    杨胜; 施鹏飞

    2005-01-01

    Feature selection is a process whereby a minimal feature subset is selected from an original feature set according to a certain measure. In this paper, feature relevancy is defined by an inconsistency rate, and a bidirectional automated branch and bound algorithm is presented. It is a new complete search algorithm for feature selection which performs feature deletion and feature addition in parallel, making it well suited to feature selection.
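
    The inconsistency rate, and a simplified one-directional branch-and-bound search that exploits its monotonicity under feature deletion, can be sketched as follows; the data and threshold are illustrative, and the paper's bidirectional add/delete mechanics are not reproduced:

    ```python
    from collections import Counter, defaultdict

    import numpy as np

    def inconsistency_rate(X, y, cols):
        """Fraction of samples whose projected feature values match but labels differ."""
        groups = defaultdict(Counter)
        for row, label in zip(X[:, cols], y):
            groups[tuple(row)][label] += 1
        inconsistent = sum(sum(c.values()) - max(c.values())
                           for c in groups.values())
        return inconsistent / len(y)

    def branch_and_bound(X, y, threshold=0.05):
        """Smallest subset whose inconsistency stays below the threshold."""
        best = [list(range(X.shape[1]))]
        def search(cols, start):
            # Pruning: the rate is monotone non-decreasing under deletion,
            # so once a subset exceeds the threshold, all its subsets do too.
            if inconsistency_rate(X, y, cols) > threshold:
                return
            if len(cols) < len(best[0]):
                best[0] = list(cols)
            for i in range(start, len(cols)):
                search(cols[:i] + cols[i + 1:], i)
        search(list(range(X.shape[1])), 0)
        return best[0]

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 6))       # discrete features
    y = (X[:, 0] + X[:, 2] > 2).astype(int)     # depends on features 0 and 2
    print("minimal subset:", branch_and_bound(X, y))
    ```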

  18. Preemptive scheduling in a two-stage supply chain to minimize the makespan

    NARCIS (Netherlands)

    Pei, Jun; Fan, Wenjuan; Pardalos, Panos M.; Liu, Xinbao; Goldengorin, Boris; Yang, Shanlin

    2015-01-01

    This paper deals with the problem of preemptive scheduling in a two-stage supply chain framework. The supply chain environment contains two stages: production and transportation. In the production stage jobs are processed on a manufacturer's bounded serial batching machine, preemptions are allowed,

  19. Inference for feature selection using the Lasso with high-dimensional data

    DEFF Research Database (Denmark)

    Brink-Jensen, Kasper; Ekstrøm, Claus Thorn

    2014-01-01

    Penalized regression models such as the Lasso have proved useful for variable selection in many fields, especially in situations with high-dimensional data where the number of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do not generally provide any inference about the selected variables; thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute p-values of the expected magnitude with simulated data using a multitude of scenarios...
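
    The flavor of randomization-based inference for a Lasso selection can be sketched with a simple permutation test on a selected coefficient; this illustrates the idea only and is not the authors' exact procedure:

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p = 80, 200                     # far more predictors than observations
    X = rng.normal(size=(n, p))
    y = 1.5 * X[:, 0] + rng.normal(size=n)

    lasso = LassoCV(cv=5).fit(X, y)
    j = int(np.argmax(np.abs(lasso.coef_)))    # strongest selected feature
    observed = abs(lasso.coef_[j])

    # Null distribution: refit with feature j's values randomly permuted.
    null = []
    for _ in range(100):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        null.append(abs(LassoCV(cv=5).fit(Xp, y).coef_[j]))
    p_value = (1 + sum(v >= observed for v in null)) / (1 + len(null))
    print("feature %d p-value: %.3f" % (j, p_value))
    ```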

  20. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA wrapped around a support vector machine (SVM classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  1. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    Science.gov (United States)

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In existing research on peak classification for electroencephalogram (EEG) signals, the existing models, such as the Dumpala, Acir, Liu, and Dingle peak models, employ different sets of features. However, these models may not offer good performance across applications, as performance is found to be problem dependent. The objective of this study is therefore to combine all the associated features from the existing models and then select the best combination of features. A new optimization algorithm, the angle-modulated simulated Kalman filter (AMSKF), is employed as the feature selector, and the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 peak candidate samples are employed for validation. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects: (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results show that the proposed AMSKF feature selector is able to find the best combination of features and performs on par with existing related studies of epileptic EEG event classification.

  2. Classification of features selected through Optimum Index Factor (OIF) for improving classification accuracy

    Institute of Scientific and Technical Information of China (English)

    Nilanchal Patel; Brijesh Kaushal

    2011-01-01

    The present investigation was performed to determine whether the features selected through the Optimum Index Factor (OIF) could provide improved classification accuracy for the various categories on the satellite images of the individual years, as well as on stacked images of two different years, as compared to all the features considered together. Further, to determine whether the classification accuracy of the different categories increases with the OIF values of the features extracted from both the individual years' and stacked images, we performed linear regression between the producer's accuracy (PA) of the various categories and the OIF values of the different combinations of features. The investigation demonstrated a significant improvement in the PA of two impervious categories, viz. moderate built-up and low-density built-up, determined from the classification of the bands and principal components associated with the highest OIF value, as compared to all the bands and principal components, for both the individual years' and stacked images respectively. Regression analyses exhibited positive trends between the regression coefficients and the OIF values for the various categories determined for the individual years' and stacked images respectively, signifying a direct relationship between information content and OIF value. The research proved that features extracted through OIF from both the individual years' and stacked images are capable of providing significantly improved PA as compared to all the features pooled together.
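
    The Optimum Index Factor of a band triplet is the sum of the three band standard deviations divided by the sum of the absolute pairwise correlations. A minimal sketch that ranks all triplets of a multi-band image (the synthetic bands are illustrative):

    ```python
    from itertools import combinations

    import numpy as np

    rng = np.random.default_rng(0)
    bands = rng.normal(size=(6, 100, 100))     # stand-in for 6 image bands
    flat = bands.reshape(6, -1)

    def oif(triplet):
        """OIF = sum of band std devs / sum of |pairwise correlations|."""
        stds = flat[list(triplet)].std(axis=1).sum()
        corr = sum(abs(np.corrcoef(flat[a], flat[b])[0, 1])
                   for a, b in combinations(triplet, 2))
        return stds / corr

    ranked = sorted(combinations(range(6), 3), key=oif, reverse=True)
    print("best band triplet by OIF:", ranked[0])
    ```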

  3. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring.

    Science.gov (United States)

    Thall, Peter F; Wooten, Leiko H; Logothetis, Christopher J; Millikan, Randall E; Tannir, Nizar M

    2007-11-20

    For many diseases, therapy involves multiple stages, with the treatment in each stage chosen adaptively based on the patient's current disease status and history of previous treatments and clinical outcomes. Physicians routinely use such multi-stage treatment strategies, also called dynamic treatment regimes or treatment policies. We present a Bayesian framework for a clinical trial comparing two-stage strategies based on the time to overall failure, defined as either second disease worsening or discontinuation of therapy. Each patient is randomized among a set of treatments at enrollment, and if disease worsening occurs the patient is then re-randomized among a set of treatments excluding the treatment received initially. The goal is to select the two-stage strategy having the largest average overall failure time. A parametric model is formulated to account for non-constant failure time hazards, regression of the second failure time on the patient's first worsening time, and the complications that the failure time in either stage may be interval censored and there may be a delay between first worsening and the start of the second stage of therapy. Four different criteria, two Bayesian and two frequentist, for selecting a best strategy are considered. The methods are applied to a trial comparing two-stage strategies for treating metastatic renal cancer, and a simulation study in the context of this trial is presented. Advantages and disadvantages of this design compared to standard methods are discussed.

  4. Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

    CSIR Research Space (South Africa)

    Duma, M

    2013-09-01

    Full Text Available We propose a hybrid missing data imputation technique using positive selection and correlation-based feature selection for insurance data. The hybrid is used to help supervised learning methods improve their classification accuracy and resilience...

  5. Left thoracoscopic two-stage repair of tracheoesophageal fistula with a right aortic arch and a vascular ring

    Science.gov (United States)

    Oshima, Kazuo; Uchida, Hiroo; Tainaka, Takahisa; Tanano, Akihide; Shirota, Chiyoe; Yokota, Kazuki; Murase, Naruhiko; Shirotsuki, Ryo; Chiba, Kosuke; Hinoki, Akinari

    2017-01-01

    A right aortic arch (RAA) is found in 5% of neonates with tracheoesophageal fistulae (TEF) and may be associated with vascular rings. Oesophageal repair for TEF with an RAA via the right chest often poses surgical difficulties. We report the first successful two-stage repair by left-sided thoracoscopy of a TEF with an RAA and a vascular ring. We switched from right to left thoracoscopy after finding an RAA. The proximal oesophageal pouch was hemmed into the vascular ring; therefore, we selected a two-stage repair. The TEF was resected and simple internal traction was placed on the oesophagus at the first stage. Detailed examination showed a patent ductus arteriosus (PDA) completing the vascular ring. The subsequent primary oesophago-oesophagostomy and dissection of the PDA were performed by left-sided thoracoscopy. Left thoracoscopic repair is thus safe and feasible for treating TEF with an RAA and a vascular ring. PMID:27143697

  6. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    Science.gov (United States)

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from the large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification because of the huge number of genes (high dimension) compared to the small number of samples, as well as noisy and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  7. CO removal by two-stage methanation for polymer electrolyte fuel cell

    Institute of Scientific and Technical Information of China (English)

    Zhiyuan Li; Wanliang Mi; Juan Gong; Zhenlong Lu; Lihao Xu; Qingquan Su

    2008-01-01

    In order to reduce the CO content to below 10 ppm in the CO removal step of a reformer for polymer electrolyte fuel cell (PEFC) co-generation systems, preferential CO methanation under various conditions was studied in this paper. Results showed that, with a single kind of catalyst, it was difficult to achieve both the required CO removal depth and a CO2 conversion ratio below 5%. Thus, a two-stage methanation process applying two kinds of catalysts is proposed in this study: one catalyst with relatively low activity and high selectivity for the first stage at higher temperature, and another with relatively high activity and high selectivity for the second stage at lower temperature. Experimental results showed that at the first stage the CO content was decreased from 1% to below 0.1% at 250-300 ℃, and at the second stage to below 10 ppm at 150-185 ℃, while CO2 conversion was kept below 5%. The influence of inlet CO content and GHSV on CO removal depth is also discussed.

  8. A two-stage cascade model of BOLD responses in human visual cortex.

    Directory of Open Access Journals (Sweden)

    Kendrick N Kay

    Full Text Available Visual neuroscientists have discovered fundamental properties of neural representation through careful analysis of responses to controlled stimuli. Typically, different properties are studied and modeled separately. To integrate our knowledge, it is necessary to build general models that begin with an input image and predict responses to a wide range of stimuli. In this study, we develop a model that accepts an arbitrary band-pass grayscale image as input and predicts blood oxygenation level dependent (BOLD) responses in early visual cortex as output. The model has a cascade architecture, consisting of two stages of linear and nonlinear operations. The first stage involves well-established computations (local oriented filters and divisive normalization), whereas the second stage involves novel computations: compressive spatial summation (a form of normalization) and a variance-like nonlinearity that generates selectivity for second-order contrast. The parameters of the model, which are estimated from BOLD data, vary systematically across visual field maps: compared to primary visual cortex, extrastriate maps generally have larger receptive field size, stronger levels of normalization, and increased selectivity for second-order contrast. Our results provide insight into how stimuli are encoded and transformed in successive stages of visual processing.

  9. GalNAc-transferase specificity prediction based on feature selection method.

    Science.gov (United States)

    Lu, Lin; Niu, Bing; Zhao, Jun; Liu, Liang; Lu, Wen-Cong; Liu, Xiao-Jun; Li, Yi-Xue; Cai, Yu-Dong

    2009-02-01

    GalNAc-transferase catalyzes the biosynthesis of O-linked oligosaccharides. The specificity of GalNAc-transferase is composed of nine amino acid residues denoted by R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0 (Ser or Thr), a new method based on feature selection is proposed in this work. 277 nonapeptides from reference [Chou KC. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci 1995;4:1365-83] are chosen as the training set. Each nonapeptide is represented by hundreds of amino acid properties collected in the Amino Acid Index database (http://www.genome.jp/aaindex) and transformed into a numeric vector with 4554 features. The Maximum Relevance Minimum Redundancy (mRMR) method, combined with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS), is then applied for feature selection. The Nearest Neighbor Algorithm (NNA) is used to build prediction models. The optimal model contains 54 features and its correct rate, tested by jackknife cross-validation, reaches 91.34%. Final feature analysis indicates that amino acid residues at position R3' play the most important role in the recognition of GalNAc-transferase specificity, which is confirmed by experiments [Elhammer AP, Poorman RA, Brown E, Maggiora LL, Hoogerheide JG, Kezdy FJ. The specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase as inferred from a database of in vivo substrates and from the in vitro glycosylation of proteins and peptides. J Biol Chem 1993;268:10029-38; O'Connell BC, Hagen FK, Tabak LA. The influence of flanking sequence on the O-glycosylation of threonine in vitro. J Biol Chem 1992;267:25010-8; Yoshida A, Suzuki M, Ikenaga H, Takeuchi M. Discovery of the shortest sequence motif for high level mucin-type O-glycosylation. J Biol Chem 1997;272:16884-8]. Our method can be used as a tool for predicting O
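
    A greedy mRMR ranking of the kind applied here scores each candidate feature by its relevance to the target minus its mean redundancy with the already-selected features. A minimal sketch on discrete stand-in data (the encoding and dataset are illustrative assumptions):

    ```python
    import numpy as np
    from sklearn.metrics import mutual_info_score

    rng = np.random.default_rng(0)
    X = rng.integers(0, 20, size=(277, 50))     # stand-in for encoded residues
    y = (X[:, 3] + X[:, 7] > 20).astype(int)    # depends on features 3 and 7

    def mrmr(X, y, n_select=5):
        remaining = list(range(X.shape[1]))
        relevance = [mutual_info_score(X[:, f], y) for f in remaining]
        selected = [int(np.argmax(relevance))]  # start with the most relevant
        remaining.remove(selected[0])
        while len(selected) < n_select:
            def score(f):
                redundancy = np.mean([mutual_info_score(X[:, f], X[:, s])
                                      for s in selected])
                return mutual_info_score(X[:, f], y) - redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

    print("mRMR-selected features:", mrmr(X, y))
    ```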

  10. 2-DE combined with two-layer feature selection accurately establishes the origin of oolong tea.

    Science.gov (United States)

    Chien, Han-Ju; Chu, Yen-Wei; Chen, Chi-Wei; Juang, Yu-Min; Chien, Min-Wei; Liu, Chih-Wei; Wu, Chia-Chang; Tzen, Jason T C; Lai, Chien-Chen

    2016-11-15

    Taiwan is known for its high quality oolong tea. Because of high consumer demand, some tea manufacturers mix lower quality leaves with genuine Taiwan oolong tea in order to increase profits. Robust scientific methods are, therefore, needed to verify the origin and quality of tea leaves. In this study, we investigated whether two-dimensional gel electrophoresis (2-DE) and nanoscale liquid chromatography/tandem mass spectrometry (nano-LC/MS/MS) coupled with a two-layer feature selection mechanism comprising information gain attribute evaluation (IGAE) and support vector machine feature selection (SVM-FS) are useful in identifying characteristic proteins that can be used as markers of the original source of oolong tea. Samples in this study included oolong tea leaves from 23 different sources. We found that our method had an accuracy of 95.5% in correctly identifying the origin of the leaves. Overall, our method is a novel approach for determining the origin of oolong tea leaves.
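
    A plausible scikit-learn rendering of a two-layer selection mechanism like the one described; approximating IGAE with a mutual-information filter and SVM-FS with recursive feature elimination is my assumption, as are all parameter values.

    ```python
    from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    two_layer = Pipeline([
        # Layer 1: filter features by an information-gain-like criterion.
        ("info_gain", SelectKBest(mutual_info_classif, k=200)),
        # Layer 2: SVM-driven selection via recursive feature elimination.
        ("svm_fs", RFE(LinearSVC(C=1.0, dual=False), n_features_to_select=20)),
        ("clf", LinearSVC(C=1.0, dual=False)),
    ])
    # two_layer.fit(X_spots, y_origin)  # X: 2-DE spot intensities, y: tea source
    ```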

  11. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    Energy Technology Data Exchange (ETDEWEB)

    Hinders, Mark K.; Miller, Corey A. [College of William and Mary, Department of Applied Science, Williamsburg, Virginia 23187-8795 (United States)

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required, but it is never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.
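
    A small sketch of wavelet-based feature extraction of the kind described, using PyWavelets; the choice of mother wavelet, decomposition level, and per-band statistics are illustrative assumptions rather than the paper's exact features.

    ```python
    import numpy as np
    import pywt

    def wavelet_features(waveform, wavelet="db4", level=5):
        """Summarize each wavelet sub-band of a Lamb wave signal."""
        coeffs = pywt.wavedec(waveform, wavelet, level=level)
        feats = []
        for band in coeffs:
            feats += [np.mean(np.abs(band)), np.std(band), np.max(np.abs(band))]
        return np.array(feats)   # (level + 1) * 3 features per waveform
    ```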

  12. Effective feature selection of clinical and genetic to predict warfarin dose using artificial neural network

    Directory of Open Access Journals (Sweden)

    Mohammad Karim Sohrabi

    2016-03-01

    Background: Warfarin is one of the most common oral anticoagulants, used to prevent blood clots. Dosing is critical, because both increases and decreases in warfarin dose carry serious risk for patients, which makes dose determination difficult for physicians. Identifying the clinical and genetic features involved in determining the dose could make prediction with data mining techniques feasible. The aim of this paper is to provide a convenient way to select the clinical and genetic features that determine warfarin dose using artificial neural networks (ANN), and to evaluate the approach for predicting patients' doses. Methods: This experimental study was conducted from April to May 2014 on 552 patients at Tehran Heart Center Hospital (THC) who were candidates for warfarin anticoagulant therapy within the international normalized ratio (INR) therapeutic target. Clinical and genetic factors affecting the dose were extracted, and different feature selection methods based on genetic algorithms and particle swarm optimization (PSO), with a neural network evaluation function, were implemented in MATLAB (MathWorks, MA, USA). Results: Among the algorithms used, the particle swarm optimization algorithm was the most accurate: the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) were 0.0262, 0.1621 and 0.1164, respectively. Conclusion: In this article, the most important characteristics were identified using feature selection methods, and the stable dose was predicted with artificial neural networks. The output is acceptable, and with fewer features it is possible to predict the warfarin dose accurately. Since the prescribed dose is important for patients, the resulting model can be used as a decision support system.
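
    A compact sketch of PSO-driven feature selection with an ANN evaluation function, in Python rather than the authors' MATLAB; the inertia and acceleration constants, the network size, and the CV scoring are all assumptions.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    def pso_feature_selection(X, y, n_particles=20, n_iter=30, seed=0):
        """Binary PSO: each particle is a feature mask scored by the
        cross-validated error of a small neural network."""
        rng = np.random.default_rng(seed)
        n = X.shape[1]
        pos = rng.random((n_particles, n)) < 0.5          # boolean masks
        vel = rng.normal(0.0, 1.0, (n_particles, n))

        def fitness(mask):
            if not mask.any():
                return -np.inf
            net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                               random_state=0)
            return cross_val_score(net, X[:, mask], y, cv=3,
                                   scoring="neg_mean_squared_error").mean()

        pbest = pos.copy()
        pbest_val = np.array([fitness(p) for p in pos])
        gbest = pbest[np.argmax(pbest_val)].copy()
        for _ in range(n_iter):
            r1 = rng.random((n_particles, n))
            r2 = rng.random((n_particles, n))
            vel = (0.7 * vel
                   + 1.5 * r1 * (pbest.astype(float) - pos.astype(float))
                   + 1.5 * r2 * (gbest.astype(float) - pos.astype(float)))
            # Sigmoid transfer turns velocities into bit-flip probabilities.
            pos = rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-vel))
            vals = np.array([fitness(p) for p in pos])
            better = vals > pbest_val
            pbest[better] = pos[better]
            pbest_val[better] = vals[better]
            gbest = pbest[np.argmax(pbest_val)].copy()
        return gbest   # boolean mask of selected features
    ```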

  13. Explore Interregional EEG Correlations Changed by Sport Training Using Feature Selection

    OpenAIRE

    2016-01-01

    This paper investigated the interregional correlation changed by sport training through electroencephalography (EEG) signals using the techniques of classification and feature selection. The EEG data are obtained from students with long-term professional sport training and from normal students without sport training as a baseline. Every channel of the 19-channel EEG signals is considered as a node in the brain network, and Pearson correlation coefficients are calculated between every two nodes as the...
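
    The pairwise-correlation feature construction described here is straightforward to sketch; the (19, n_samples) array layout is an assumption about how the recordings are stored.

    ```python
    import numpy as np

    def eeg_correlation_features(eeg):
        """eeg: (19, n_samples) array, one row per EEG channel.
        Returns the pairwise Pearson correlations (upper triangle)."""
        corr = np.corrcoef(eeg)               # (19, 19) correlation matrix
        rows, cols = np.triu_indices(corr.shape[0], k=1)
        return corr[rows, cols]               # 19 * 18 / 2 = 171 features
    ```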

  14. Applications of feature selection. [development of classification algorithms for LANDSAT data

    Science.gov (United States)

    Guseman, L. F., Jr.

    1976-01-01

    The use of satellite-acquired (LANDSAT) multispectral scanner (MSS) data to conduct an inventory of some crop of economic interest such as wheat over a large geographical area is considered in relation to the development of accurate and efficient algorithms for data classification. The dimension of the measurement space and the computational load for a classification algorithm are increased by the use of multitemporal measurements. Feature selection/combination techniques used to reduce the dimensionality of the problem are described.

  15. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotony of the distance function defined by Mahalanobis...
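
    For reference, the Mahalanobis distance named in the title; a minimal implementation using the standard definition, with toy data of my own.

    ```python
    import numpy as np

    def mahalanobis(x, mean, cov):
        """Distance of sample x from a distribution with given mean/covariance."""
        diff = x - mean
        return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

    # Toy usage: distance of a point from a 2-D Gaussian class.
    samples = np.random.default_rng(0).normal(size=(100, 2))
    d = mahalanobis(np.array([1.0, -0.5]), samples.mean(axis=0), np.cov(samples.T))
    ```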

  16. A combinational feature selection and ensemble neural network method for classification of gene expression data

    Directory of Open Access Journals (Sweden)

    Jiang Tianzi

    2004-09-01

    Background: Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received the most attention in the context of cancer research, especially tumor classification. Various feature selection methods and classifier design strategies have been used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and only recently have several researchers compared these techniques on several public datasets. It has been verified that differently selected features reflect different aspects of the dataset, and that some selected features can obtain better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification. Results: We validate our new method on several recent publicly available datasets, both with predictive accuracy on testing samples and through cross-validation. Compared with the best performance of other current methods, remarkably improved results can be obtained using our new strategy on a wide range of different datasets. Conclusions: We conclude that our method can obtain more information from microarray data, yielding more accurate classification, and can also help to extract the latent marker genes of diseases for better diagnosis and treatment.

  17. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic in order to save human lives and reduce material losses. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system for volcano seismicity that considers feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a highly reliable automatic detection system working in real time. We compiled and extracted a comprehensive set of temporal, moving average, spectral, and scale-domain features for separating long period seismic events from background noise. We benchmarked two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each of them using suitable classification algorithms such as k-Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity recorded at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained using a 15 s segmentation window, a feature matrix in the frequency domain, and the DT classifier, yielding 99% detection accuracy and sensitivity. Selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.
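
    A sketch of the frequency-domain feature step followed by a decision tree; the sampling rate, Welch parameters, and the three summary statistics are illustrative assumptions, not the paper's exact feature matrix.

    ```python
    import numpy as np
    from scipy.signal import welch
    from sklearn.tree import DecisionTreeClassifier

    def spectral_features(window, fs=100.0):
        """Summarize the power spectrum of one 15 s seismic segment."""
        f, pxx = welch(window, fs=fs, nperseg=256)
        centroid = np.sum(f * pxx) / np.sum(pxx)   # spectral centroid (Hz)
        return np.array([pxx.max(), centroid, pxx.sum()])

    # X = np.vstack([spectral_features(w) for w in windows])   # 15 s segments
    # clf = DecisionTreeClassifier(max_depth=5).fit(X, labels) # LP event vs noise
    ```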

  18. Microcanonical Annealing and Threshold Accepting for Parameter Determination and Feature Selection of Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Seyyid Ahmed Medjahed

    2016-12-01

    Support vector machine (SVM) is a popular classification technique with many diverse applications. Parameter determination and feature selection significantly influence the classification accuracy rate and the quality of the SVM model. This paper proposes two novel approaches, based on Microcanonical Annealing (MA-SVM) and Threshold Accepting (TA-SVM), to determine the optimal parameter value and the relevant feature subset without reducing SVM classification accuracy. In order to evaluate the performance of MA-SVM and TA-SVM, several public datasets are employed to compute the classification accuracy rate. The proposed approaches were tested in the context of medical diagnosis. We also tested the approaches on DNA microarray datasets used for cancer diagnosis. The results obtained by the MA-SVM and TA-SVM algorithms are superior and give good performance on the DNA microarray datasets, which are characterized by a large number of features. Therefore, the MA-SVM and TA-SVM approaches are well suited for parameter determination and feature selection in SVM.
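
    A minimal sketch of the threshold accepting idea applied to one SVM hyperparameter (C): unlike simulated annealing, any candidate within a fixed tolerance of the current score is accepted. The search range, step size, and tolerance are assumptions, and the paper's full method also searches over feature subsets.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def threshold_accepting_svm(X, y, n_iter=100, threshold=0.02, seed=0):
        """Tune log10(C) by threshold accepting on CV accuracy."""
        rng = np.random.default_rng(seed)
        log_c = 0.0
        cur = best = cross_val_score(SVC(C=10 ** log_c), X, y, cv=5).mean()
        best_c = log_c
        for _ in range(n_iter):
            cand = log_c + rng.normal(0.0, 0.5)        # random neighbour
            score = cross_val_score(SVC(C=10 ** cand), X, y, cv=5).mean()
            if score > cur - threshold:                 # TA acceptance rule
                log_c, cur = cand, score
                if score > best:
                    best, best_c = score, cand
        return 10 ** best_c, best
    ```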

  19. An Enhanced Grey Wolf Optimization Based Feature Selection Wrapped Kernel Extreme Learning Machine for Medical Diagnosis

    Science.gov (United States)

    Li, Qiang; Zhao, Xuehua; Cai, ZhenNao; Tong, Changfei; Liu, Wenbin; Tian, Xin

    2017-01-01

    In this study, a new predictive framework is proposed by integrating an improved grey wolf optimization (IGWO) and kernel extreme learning machine (KELM), termed IGWO-KELM, for medical diagnosis. The proposed IGWO feature selection approach is used for the purpose of finding the optimal feature subset for medical data. In the proposed approach, a genetic algorithm (GA) was first adopted to generate the diversified initial positions, and then grey wolf optimization (GWO) was used to update the current positions of the population in the discrete search space, thus obtaining the optimal feature subset for better classification based on KELM. The proposed approach is compared against the original GA and GWO on two common disease diagnosis problems in terms of a set of performance metrics, including classification accuracy, sensitivity, specificity, precision, G-mean, F-measure, and the size of selected features. The simulation results have proven the superiority of the proposed method over its two competitive counterparts. PMID:28246543

  20. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    Science.gov (United States)

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood that a company will go bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts, or predicting bankruptcy, becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy datasets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of the self-adaptation of the classifier.

  1. Oxygen Saturation and RR Intervals Feature Selection for Sleep Apnea Detection

    Directory of Open Access Journals (Sweden)

    Antonio G. Ravelo-García

    2015-05-01

    A diagnostic system for sleep apnea based on oxygen saturation and RR intervals obtained from the EKG (electrocardiogram) is proposed, with the goal of detecting and quantifying minute-long segments of sleep with breathing pauses. We measured the discriminative capacity of combinations of features obtained from RR series and oximetry to evaluate improvements of the performance compared to oximetry-based features alone. Time and frequency domain variables derived from oxygen saturation (SpO2) as well as linear and non-linear variables describing the RR series have been explored in recordings from 70 patients with suspected sleep apnea. We applied forward feature selection in order to select a minimal set of variables that are able to locate patterns indicating respiratory pauses. Linear discriminant analysis (LDA) was used to classify the presence of apnea during specific segments. The system finally provides a global score indicating the presence of clinically significant apnea by integrating the segment-based apnea detection. LDA results in an accuracy of 87%, sensitivity of 76% and specificity of 91% (AUC = 0.90), with a global classification rate of 97%, when only oxygen saturation is used. When features from the RR series are additionally included, the system performance improves to an accuracy of 87%, sensitivity of 73% and specificity of 92% (AUC = 0.92), with a global classification rate of 100%.
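
    Forward feature selection wrapped around an LDA classifier, as described, can be sketched with scikit-learn; the feature matrix, labels, and target subset size are placeholders.

    ```python
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SequentialFeatureSelector

    # X: one row per minute-long segment (SpO2 + RR-series features),
    # y: 1 if the segment contains breathing pauses, else 0.
    sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                    direction="forward",
                                    n_features_to_select=5, cv=5)
    # mask = sfs.fit(X, y).get_support()   # minimal set of selected variables
    ```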

  2. Feature selection and classification of multiparametric medical images using bagging and SVM

    Science.gov (United States)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
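
    A condensed scikit-learn analogue of the ensemble described: bagged SVM base classifiers evaluated by ROC AUC (scikit-learn >= 1.2 signature). The out-of-bag scoring stands in for the paper's left-out bootstrap samples, and all parameter values are assumptions.

    ```python
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    bag = BaggingClassifier(estimator=SVC(kernel="linear", probability=True),
                            n_estimators=25, oob_score=True, random_state=0)
    # auc = cross_val_score(bag, X, y, cv=5, scoring="roc_auc").mean()
    # bag.fit(X, y); print(bag.oob_score_)  # accuracy on left-out bootstrap samples
    ```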

  3. On a Variational Model for Selective Image Segmentation of Features with Infinite Perimeter

    Institute of Scientific and Technical Information of China (English)

    Lavdie RADA; Ke CHEN

    2013-01-01

    Variational models provide a reliable formulation for segmentation of features and their boundaries in an image, following the seminal work of Mumford-Shah (1989, Commun. Pure Appl. Math.) on dividing a general surface into piecewise smooth sub-surfaces. A central idea of models based on this work is to minimize the length of the features' boundaries (i.e., the H^1 Hausdorff measure). However, there exist problems with irregular and oscillatory object boundaries, where minimizing such a length is not appropriate, as noted by Barchiesi et al. (2010, SIAM J. Multiscale Model. Simul.), who proposed to minimize the L^2 Lebesgue measure of the γ-neighborhood of the boundaries. This paper presents a dual level set selective segmentation model based on Barchiesi et al. (2010) to automatically select a local feature instead of all global features. Our model uses two level set functions: a global level set which segments all boundaries, and a local level set which evolves to find the boundary of the object closest to the geometric constraints. Using real-life images with oscillatory boundaries, we show qualitative results demonstrating the effectiveness of the proposed method.

  4. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    Directory of Open Access Journals (Sweden)

    A. Gaspar-Cunha

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood that a company will go bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts, or predicting bankruptcy, becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy datasets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of the self-adaptation of the classifier.

  5. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    Science.gov (United States)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2016-06-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed by using many features. Selecting an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem: for n features, the total number of possible feature subsets is 2^n, so selection of an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets of features using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.

  6. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

    Science.gov (United States)

    Laimighofer, Michael; Krumsiek, Jan; Theis, Fabian J.

    2016-01-01

    With the widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example for biomarkers, is becoming an increasingly important part of clinical medicine, because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327
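
    The nested-CV structure (an inner loop for tuning and feature selection, an outer loop for unbiased performance estimation) can be sketched as follows; an L1-penalized logistic model stands in for the Cox survival model here, and all parameter values are assumptions.

    ```python
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    # Inner loop: tune the sparsity of an L1-penalized model (feature selection).
    inner = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear"),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)

    # Outer loop: each fold scores a model tuned without its own data,
    # so the resulting estimates are unbiased for unseen cases.
    # scores = cross_val_score(inner, X, y, cv=5)
    ```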

  7. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples; this high dimensionality is also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing the computational burden and increasing prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed, which can be categorized into three groups based on the searching strategy, namely complete search, heuristic search and random search. This review introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques in foods.

  8. A Two-Stage Bayesian Network Method for 3D Human Pose Estimation from Monocular Image Sequences

    Directory of Open Access Journals (Sweden)

    Wang Yuan-Kai

    2010-01-01

    This paper proposes a novel human motion capture method that locates human body joint positions and reconstructs the human pose in 3D space from monocular images. We propose a two-stage framework including 2D and 3D probabilistic graphical models which can solve the occlusion problem in the estimation of human joint positions. The 2D and 3D models adopt a directed acyclic structure to avoid error propagation during inference. Image observations corresponding to shape and appearance features of humans are considered as evidence for the inference of 2D joint positions in the 2D model. Both the 2D and 3D models utilize the Expectation Maximization algorithm to learn the prior distributions of the models. An annealed Gibbs sampling method is proposed for the two-stage method to infer the maximum a posteriori distributions of joint positions. The annealing process can efficiently explore the modes of distributions and find solutions in high-dimensional space. Experiments are conducted on the HumanEva dataset with image sequences of walking motion, which poses the challenges of occlusion and loss of image observations. Experimental results show that the proposed two-stage approach can efficiently estimate more accurate human poses.

  9. Contextual Classification of Point Clouds Using a Two-Stage Crf

    Science.gov (United States)

    Niemeyer, J.; Rottensteiner, F.; Soergel, U.; Heipke, C.

    2015-03-01

    In this investigation, we address the task of airborne LiDAR point cloud labelling for urban areas by presenting a contextual classification methodology based on a Conditional Random Field (CRF). A two-stage CRF is set up: in a first step, a point-based CRF is applied. The resulting labellings are then used to generate a segmentation of the classified points using a Conditional Euclidean Clustering algorithm. This algorithm combines neighbouring points with the same object label into one segment. The second step comprises the classification of these segments, again with a CRF. As the number of the segments is much smaller than the number of points, it is computationally feasible to integrate long range interactions into this framework. Additionally, two different types of interactions are introduced: one for the local neighbourhood and another one operating on a coarser scale. This paper presents the entire processing chain. We show preliminary results achieved using the Vaihingen LiDAR dataset from the ISPRS Benchmark on Urban Classification and 3D Reconstruction, which consists of three test areas characterised by different and challenging conditions. The utilised classification features are described, and the advantages and remaining problems of our approach are discussed. We also compare our results to those generated by a point-based classification and show that a slight improvement is obtained with this first implementation.

  10. A Two-Stage Queue Model to Optimize Layout of Urban Drainage System considering Extreme Rainstorms

    Directory of Open Access Journals (Sweden)

    Xinhua He

    2017-01-01

    Extreme rainstorms are a main cause of urban floods when the urban drainage system cannot discharge stormwater successfully. This paper investigates the distribution features of rainstorms and the draining process of urban drainage systems, and uses a two-stage single-counter queue method M/M/1→M/D/1 to model an urban drainage system. The model emphasizes the randomness of extreme rainstorms, the fuzziness of the draining process, and the construction and operation cost of the drainage system. Its two objectives are the total cost of construction and operation and the overall sojourn time of stormwater. An improved genetic algorithm is designed to solve this complex nondeterministic problem, incorporating the stochastic and fuzzy characteristics of the whole drainage process. A numerical example in Shanghai illustrates how to implement the model, and comparisons with alternative algorithms show its performance in computational flexibility and efficiency. Discussions on the sensitivity of four main parameters, that is, the number of pump stations, drainage pipe diameter, rainstorm precipitation intensity, and confidence levels, are also presented to provide guidance for designing urban drainage systems.
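
    For reference, the mean sojourn times of the two queue stages named above follow from standard results: directly for M/M/1, and via the Pollaczek-Khinchine formula for M/D/1. Summing the two stages as a tandem approximation and the example rates are illustrative assumptions, not values from the paper.

    ```python
    def mm1_sojourn(lam, mu):
        """Mean time in an M/M/1 queue (arrival rate lam < service rate mu)."""
        return 1.0 / (mu - lam)

    def md1_sojourn(lam, mu):
        """Mean time in an M/D/1 queue: service time 1/mu plus the
        Pollaczek-Khinchine waiting time rho / (2 mu (1 - rho))."""
        rho = lam / mu
        return 1.0 / mu + rho / (2.0 * mu * (1.0 - rho))

    # Stormwater passes a collection stage, then a deterministic pumping stage.
    total_sojourn = mm1_sojourn(0.8, 1.0) + md1_sojourn(0.8, 1.0)
    ```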

  11. Evidence that viral RNAs have evolved for efficient, two-stage packaging.

    Science.gov (United States)

    Borodavka, Alexander; Tuma, Roman; Stockley, Peter G

    2012-09-25

    Genome packaging is an essential step in virus replication and a potential drug target. Single-stranded RNA viruses have been thought to encapsidate their genomes by gradual co-assembly with capsid subunits. In contrast, using a single molecule fluorescence assay to monitor RNA conformation and virus assembly in real time, with two viruses from differing structural families, we have discovered that packaging is a two-stage process. Initially, the genomic RNAs undergo rapid and dramatic (approximately 20-30%) collapse of their solution conformations upon addition of cognate coat proteins. The collapse occurs with a substoichiometric ratio of coat protein subunits and is followed by a gradual increase in particle size, consistent with the recruitment of additional subunits to complete a growing capsid. Equivalently sized nonviral RNAs, including high copy potential in vivo competitor mRNAs, do not collapse. They do support particle assembly, however, but yield many aberrant structures in contrast to viral RNAs that make only capsids of the correct size. The collapse is specific to viral RNA fragments, implying that it depends on a series of specific RNA-protein interactions. For bacteriophage MS2, we have shown that collapse is driven by subsequent protein-protein interactions, consistent with the RNA-protein contacts occurring in defined spatial locations. Conformational collapse appears to be a distinct feature of viral RNA that has evolved to facilitate assembly. Aspects of this process mimic those seen in ribosome assembly.

  12. Multifunctional Solar Systems Based On Two-Stage Regeneration Absorbent Solution

    Directory of Open Access Journals (Sweden)

    Doroshenko A.V.

    2015-04-01

    The concepts of multifunctional solar systems for dehumidification, heat supply, cooling, and air conditioning, based on the open absorption cycle with direct absorbent regeneration, are developed. The solar systems are based on preliminary dehumidification of the air stream and subsequent evaporative cooling, and use evaporative coolers of both types (direct and indirect). The principle of two-stage regeneration of the absorbent is used in the solar systems and serves as the basis of liquid and gas-liquid solar collectors. The main design solutions are developed for a new generation of gas-liquid solar collectors. An analysis of the heat losses in the gas-liquid solar collectors due to convection and radiation is made. Optimal flow rates of gas and liquid, as well as the basic dimensions and configuration of the working channel of the solar collector, are identified. The heat and mass transfer devices belonging to the evaporative cooling system are based on the interaction between a liquid film and the gas stream flowing over it, with a multichannel structure of polymeric materials used to create the packing. Evaporative coolers of water and air of both types (direct and indirect) are used for cooling in the solar systems. A preliminary analysis of the possibilities of multifunctional solar absorption systems for cooling media and air conditioning is made on the basis of the authors' experimental data. The designed solar systems feature low power consumption and environmental friendliness.

  13. A two-stage visual tracking algorithm using dual-template

    Directory of Open Access Journals (Sweden)

    Yu Xia

    2016-10-01

    Template matching and updating are crucial steps in visual object tracking. In this article, we propose a two-stage object tracking algorithm using a dual template. In the first stage, the initial state of the target is estimated using a prior fixed template within a particle-filter-based tracking framework. The use of the prior template maintains the stability of the object tracking algorithm, because it consists of invariant and important features. In the second stage, mean shift is used to obtain the optimal location of the object with the stage-update template. The stage template improves the ability of target recognition using a classified update method. The complementarity of the dual template improves the quality of template matching and the performance of object tracking. Experimental results demonstrate that the proposed algorithm improves tracking performance in terms of accuracy and robustness, and it exhibits good results in the presence of deformation, noise and occlusion.

  14. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Science.gov (United States)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson's disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affecting the quality of life dramatically. In this paper, we use data acquired from multi-modal neuroimaging to diagnose PD by investigating the brain regions known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.

  15. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Science.gov (United States)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which the performance is heavily dependent on the feature selection techniques, like