WorldWideScience

Sample records for identify samples classified

  1. Consensus of sample-balanced classifiers for identifying ligand-binding residue by co-evolutionary physicochemical characteristics of amino acids

    KAUST Repository

    Chen, Peng

    2013-01-01

    Protein-ligand binding is an important mechanism for some proteins to perform their functions, and those binding sites are the residues of proteins that physically bind to ligands. So far, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. Due to the highly imbalanced samples between the ligand-binding sites and non ligand-binding sites, we constructed several balanced data sets, for each of which a random forest (RF)-based classifier was trained. The ensemble of these RF classifiers formed a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets demonstrated that our method compared favorably with the state-of-the-art. © Springer-Verlag Berlin Heidelberg 2013.
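
    A minimal sketch of the core idea (not the authors' code), assuming numeric arrays X, y with label 1 marking ligand-binding residues: each balanced subset pairs all minority samples with an equally sized random draw of majority samples, one random forest is trained per subset, and the ensemble averages their predicted probabilities.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def balanced_rf_ensemble(X, y, n_subsets=10, seed=0):
            """Train one RF per balanced subset of an imbalanced binary dataset."""
            rng = np.random.default_rng(seed)
            minority = np.where(y == 1)[0]   # ligand-binding residues (rare class)
            majority = np.where(y == 0)[0]   # non-binding residues
            models = []
            for _ in range(n_subsets):
                drawn = rng.choice(majority, size=len(minority), replace=False)
                idx = np.concatenate([minority, drawn])
                clf = RandomForestClassifier(n_estimators=200, random_state=seed)
                clf.fit(X[idx], y[idx])
                models.append(clf)
            return models

        def ensemble_predict_proba(models, X_new):
            """Average the positive-class probability over all subset models."""
            return np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)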

  2. Classifier-Guided Sampling for Complex Energy System Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Backlund, Peter B. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Eddy, John P. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.

  3. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    Science.gov (United States)

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under a P-value threshold and used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several
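
    A hedged sketch of the classification step only, with hypothetical variable names: top-ranked genes are selected from a training expression matrix (here by a simple F-test stand-in rather than the MetaDE ranking used in the study), an SVM is trained on them, and accuracy plus AUROC are computed on an independent cohort.

        from sklearn.svm import SVC
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.metrics import roc_auc_score, accuracy_score

        def train_gene_svm(X_train, y_train, k=30):
            """Select the top-k discriminative genes and fit an SVM on them."""
            selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)
            clf = SVC(kernel="linear", probability=True)
            clf.fit(selector.transform(X_train), y_train)
            return selector, clf

        def evaluate(selector, clf, X_test, y_test):
            """Report accuracy and AUROC on an independent validation cohort."""
            X_sel = selector.transform(X_test)
            scores = clf.predict_proba(X_sel)[:, 1]
            preds = clf.predict(X_sel)
            return accuracy_score(y_test, preds), roc_auc_score(y_test, scores)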

  4. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    Science.gov (United States)

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
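
    A simplified, non-parametric stand-in for the inverse-probability bagging idea (the paper's stochastic and parametric variants, implemented in the sambia R package, additionally model the covariance structure): each bootstrap draws subjects with probability proportional to the inverse of their known phase-two sampling probabilities, and a random forest is fit per bootstrap.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def inverse_probability_bagging(X, y, sampling_prob, n_bags=50, seed=0):
            """Bagging sketch: resamples are weighted by 1/sampling_prob so they
            mimic the unstratified source population rather than the biased sample."""
            rng = np.random.default_rng(seed)
            w = 1.0 / np.asarray(sampling_prob)
            w = w / w.sum()                      # normalized resampling weights
            n = len(y)
            models = []
            for _ in range(n_bags):
                idx = rng.choice(n, size=n, replace=True, p=w)
                clf = RandomForestClassifier(n_estimators=100, random_state=seed)
                clf.fit(X[idx], y[idx])
                models.append(clf)
            return models

        def bagged_predict_proba(models, X_new):
            """Average the positive-class probability across all bootstrap models."""
            return np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)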

  5. Classifier-guided sampling for discrete variable, discontinuous design space exploration: Convergence and computational performance

    Energy Technology Data Exchange (ETDEWEB)

    Backlund, Peter B. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shahan, David W. [HRL Labs., LLC, Malibu, CA (United States); Seepersad, Carolyn Conner [Univ. of Texas, Austin, TX (United States)

    2014-04-22

    A classifier-guided sampling (CGS) method is introduced for solving engineering design optimization problems with discrete and/or continuous variables and continuous and/or discontinuous responses. The method merges concepts from metamodel-guided sampling and population-based optimization algorithms. The CGS method uses a Bayesian network classifier for predicting the performance of new designs based on a set of known observations or training points. Unlike most metamodeling techniques, however, the classifier assigns a categorical class label to a new design, rather than predicting the resulting response in continuous space, and thereby accommodates nondifferentiable and discontinuous functions of discrete or categorical variables. The CGS method uses these classifiers to guide a population-based sampling process towards combinations of discrete and/or continuous variable values with a high probability of yielding preferred performance. Accordingly, the CGS method is appropriate for discrete/discontinuous design problems that are ill-suited for conventional metamodeling techniques and too computationally expensive to be solved by population-based algorithms alone. In addition, the rates of convergence and computational properties of the CGS method are investigated when applied to a set of discrete variable optimization problems. Results show that the CGS method significantly improves the rate of convergence towards known global optima, on average, when compared to genetic algorithms.
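
    An illustrative loop body under stated assumptions, not the report's implementation: discrete design variables are coded as non-negative integers, a naive Bayes classifier stands in for the Bayesian network classifier, and only candidates with the highest posterior probability of being "preferred" are sent to the expensive objective function.

        import numpy as np
        from sklearn.naive_bayes import CategoricalNB

        def cgs_step(evaluated_X, evaluated_labels, candidate_pool, objective,
                     keep_fraction=0.2):
            """One generation of a classifier-guided sampling loop: fit the
            classifier on designs already labeled preferred (1) / not preferred (0),
            score the unevaluated candidates, and evaluate only the most promising."""
            n_cat = int(max(evaluated_X.max(), candidate_pool.max())) + 1
            clf = CategoricalNB(min_categories=n_cat)   # tolerate unseen categories
            clf.fit(evaluated_X, evaluated_labels)
            posterior = clf.predict_proba(candidate_pool)[:, 1]
            n_keep = max(1, int(keep_fraction * len(candidate_pool)))
            promising = candidate_pool[np.argsort(posterior)[-n_keep:]]
            new_values = np.array([objective(x) for x in promising])  # expensive calls
            return promising, new_values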

  6. Using lot quality-assurance sampling and area sampling to identify priority areas for trachoma control: Viet Nam.

    Science.gov (United States)

    Myatt, Mark; Mai, Nguyen Phuong; Quynh, Nguyen Quang; Nga, Nguyen Huy; Tai, Ha Huy; Long, Nguyen Hung; Minh, Tran Hung; Limburg, Hans

    2005-10-01

    To report on the use of lot quality-assurance sampling (LQAS) surveys undertaken within an area-sampling framework to identify priority areas for intervention with trachoma control activities in Viet Nam. The LQAS survey method for the rapid assessment of the prevalence of active trachoma was adapted for use in Viet Nam with the aim of classifying individual communes by the prevalence of active trachoma among children in primary school. School-based sampling was used; school sites to be sampled were selected using an area-sampling approach. A total of 719 communes in 41 districts in 18 provinces were surveyed. Survey staff found the LQAS survey method both simple and rapid to use after initial problems with area-sampling methods were identified and remedied. The method yielded a finer spatial resolution of prevalence than had been previously achieved in Viet Nam using semiquantitative rapid assessment surveys and multistage cluster-sampled surveys. When used with area-sampling techniques, the LQAS survey method has the potential to form the basis of survey instruments that can be used to efficiently target resources for interventions against active trachoma. With additional work, such methods could provide a generally applicable tool for effective programme planning and for the certification of the elimination of trachoma as a blinding disease.
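
    A small sketch of the LQAS decision logic with hypothetical numbers (the programme's actual sample sizes and decision values are not reproduced here): a commune is flagged as a priority area when the number of active-trachoma cases in the school sample exceeds a decision value, and the operating-characteristic curve shows how likely a commune of a given true prevalence is to be flagged.

        from scipy.stats import binom

        def classify_commune(cases_observed, decision_value):
            """LQAS rule: flag the commune when observed cases exceed the decision value."""
            return "priority" if cases_observed > decision_value else "not priority"

        def prob_flagged(n, decision_value, true_prevalence):
            """Operating-characteristic curve: probability that a commune with the
            given true prevalence is flagged, under binomial sampling of n children."""
            return 1.0 - binom.cdf(decision_value, n, true_prevalence)

        # Hypothetical example: sample 50 children per commune, flag if > 5 cases.
        print(prob_flagged(50, 5, 0.20))   # chance a 20%-prevalence commune is flagged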

  7. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans.

    Directory of Open Access Journals (Sweden)

    Sun Hee Ahn

    Full Text Available Staphylococcus aureus causes a spectrum of human infection. Diagnostic delays and uncertainty lead to treatment delays and inappropriate antibiotic use. A growing literature suggests the host's inflammatory response to the pathogen represents a potential tool to improve upon current diagnostics. The hypothesis of this study is that the host responds differently to S. aureus than to E. coli infection in a quantifiable way, providing a new diagnostic avenue. This study uses Bayesian sparse factor modeling and penalized binary regression to define peripheral blood gene-expression classifiers of murine and human S. aureus infection. The murine-derived classifier distinguished S. aureus infection from healthy controls and Escherichia coli-infected mice across a range of conditions (mouse and bacterial strain, time post infection) and was validated in outbred mice (AUC > 0.97). A S. aureus classifier derived from a cohort of 94 human subjects distinguished S. aureus blood stream infection (BSI) from healthy subjects (AUC 0.99) and E. coli BSI (AUC 0.84). Murine and human responses to S. aureus infection share common biological pathways, allowing the murine model to classify S. aureus BSI in humans (AUC 0.84). Both murine and human S. aureus classifiers were validated in an independent human cohort (AUC 0.95 and 0.92, respectively). The approach described here lends insight into the conserved and disparate pathways utilized by mice and humans in response to these infections. Furthermore, this study advances our understanding of S. aureus infection and the host response to it, and identifies new diagnostic and therapeutic avenues.

  8. Inter- and Intraexaminer Reliability in Identifying and Classifying Myofascial Trigger Points in Shoulder Muscles.

    Science.gov (United States)

    Nascimento, José Diego Sales do; Alburquerque-Sendín, Francisco; Vigolvino, Lorena Passos; Oliveira, Wandemberg Fortunato de; Sousa, Catarina de Oliveira

    2018-01-01

    To determine inter- and intraexaminer reliability of examiners without clinical experience in identifying and classifying myofascial trigger points (MTPs) in the shoulder muscles of subjects asymptomatic and symptomatic for unilateral subacromial impingement syndrome (SIS). Within-day inter- and intraexaminer reliability study. Physical therapy department of a university. Fifty-two subjects participated in the study, 26 symptomatic and 26 asymptomatic for unilateral SIS. Two examiners, without experience in assessing MTPs, independent of each other and blind to the clinical conditions of the subjects, assessed bilaterally the presence of MTPs (present or absent) in 6 shoulder muscles and classified them (latent or active) on the affected side of the symptomatic group. Each examiner performed the same assessment twice in the same day. Reliability was calculated through percentage agreement, prevalence- and bias-adjusted kappa (PABAK) statistics, and weighted kappa. Intraexaminer reliability in identifying MTPs for the symptomatic and asymptomatic groups was moderate to perfect (PABAK, .46-1 and .60-1, respectively). Interexaminer reliability was between moderate and almost perfect in the 2 groups (PABAK, .46-.92), except for the muscles of the symptomatic group, which were below these values. With respect to MTP classification, intraexaminer reliability was moderate to high for most muscles, but interexaminer reliability was moderate for only 1 muscle (weighted κ=.45), and between weak and reasonable for the rest (weighted κ=.06-.31). Intraexaminer reliability is acceptable in clinical practice to identify and classify MTPs. However, interexaminer agreement proved acceptable only for identifying MTPs, with the symptomatic side exhibiting lower values of reliability. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
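
    For reference, a tiny sketch of the prevalence- and bias-adjusted kappa used above, which for two binary ratings reduces to PABAK = 2 x observed agreement - 1; the example ratings are hypothetical.

        import numpy as np

        def pabak(rater_a, rater_b):
            """Prevalence- and bias-adjusted kappa for two binary rating vectors."""
            a = np.asarray(rater_a)
            b = np.asarray(rater_b)
            observed_agreement = np.mean(a == b)
            return 2.0 * observed_agreement - 1.0

        # Example: MTP present (1) / absent (0) judged twice for 10 muscles.
        exam1 = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
        exam2 = [1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
        print(pabak(exam1, exam2))   # approximately 0.8 (9/10 agreement)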

  9. Identifying and Classifying Mobile Business Models Based on Meta-Synthesis Approach

    Directory of Open Access Journals (Sweden)

    Porrandokht Niroomand

    2012-03-01

    Full Text Available The appearance of mobile technology has provided unique opportunities for the development and creation of businesses and has created new job opportunities. The current research tries to familiarize entrepreneurs who are running businesses, especially in the area of mobile services, with business models, so that they can implement new ideas and designs as they enter the business market. A search of the literature shows that there are no appropriate papers that identify, categorize and analyze mobile business models; consequently, this paper is novel in that respect. The first part of this paper reviews the concepts and theories about the different mobile generations, mobile commerce and business models. Afterwards, 92 models drawn from 33 papers and books are compared, interpreted, translated and combined on the basis of two different criteria: the expert criterion and the kind-of-product criterion. In the expert-based classification, the models are grouped by criteria such as business fields, business partners, the rate of dynamism, the kind of activity, the focus areas, the mobile generations, transparency, the type of operator activities, marketing and advertisements. The models classified based on the kind of product are analyzed across four different areas of mobile commerce: content production, technology (software and hardware), network and synthetic.

  10. Toward a More Responsive Consumable Materiel Supply Chain: Leveraging New Metrics to Identify and Classify Items of Concern

    Science.gov (United States)

    2016-06-01

    Naval Postgraduate School thesis by Andrew R. Haley, June 2016 (Thesis Advisor: Robert...). Keywords: Naval Supply Systems Command (NAVSUP), logistics, inventory, consumable, NSNs at Risk, Bad Actors, Bad Actors with Trend, items of concern, customer time

  11. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    Science.gov (United States)

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
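
    One of the ensemble strategies mentioned above (tool intersection combined with an abundance filter) can be sketched as follows; the tool outputs and thresholds are made up for illustration.

        def consensus_taxa(tool_reports, min_tools=2, min_abundance=1e-4):
            """Keep a taxon only if it passes the abundance filter in at least
            `min_tools` classifiers, a simple way to suppress false positives."""
            counts = {}
            for report in tool_reports:              # report: {taxon: rel. abundance}
                for taxon, abundance in report.items():
                    if abundance >= min_abundance:
                        counts[taxon] = counts.get(taxon, 0) + 1
            return {t for t, c in counts.items() if c >= min_tools}

        # Made-up outputs from three hypothetical tools with different strategies:
        kmer_based = {"E. coli": 0.30, "B. subtilis": 0.00005}
        marker_based = {"E. coli": 0.28, "S. aureus": 0.01}
        alignment_based = {"E. coli": 0.31, "S. aureus": 0.008}
        print(consensus_taxa([kmer_based, marker_based, alignment_based]))
        # {'E. coli', 'S. aureus'} (set order may vary)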

  12. CAGE peaks identified as true TSS by TSS classifier - FANTOM5 | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available CAGE peaks identified as true TSS by TSS classifier - FANTOM5 | LSDB Archive. Data file: http://ftp.biosciencedbc.jp/archive/fantom5/datafiles/phase1.3/extra/TSS_classifier/ (file size: 32 MB). Data analysis method: TSS Classifier (http://sourceforge.net/p/tom...).

  13. Passive sampling as a tool for identifying micro-organic compounds in groundwater.

    Science.gov (United States)

    Mali, N; Cerar, S; Koroša, A; Auersperger, P

    2017-09-01

    The paper presents the use of a simple and cost-efficient passive sampling device with integrated active carbon with which to test the possibility of determining the presence of micro-organic compounds (MOs) in groundwater and identifying the potential source of pollution as well as the seasonal variability of contamination. An advantage of the passive sampler is that it covers a long sampling period by integrating the pollutant concentration over time, so the analytical costs over the monitoring period can be reduced substantially. Passive samplers were installed in 15 boreholes in the Maribor City area in Slovenia, with two sampling campaigns covering a period of about one year. At all sampling sites, a total of 103 compounds were detected in the first series and 144 in the second series. Of all detected compounds, the 53 most frequently detected were selected for further analysis. These were classified into eight groups based on the type of their source: Pesticides, Halogenated solvents, Non-halogenated solvents, Domestic and personal, Plasticizers and additives, Other industrial, Sterols and Natural compounds. The most frequently detected MO compounds in groundwater were tetrachloroethene and trichloroethene from the Halogenated solvents group. Among the compound groups, pesticides were the most frequently detected. Analysis of frequency also showed significant differences between the two sampling series, with less frequent detections in the summer series. To determine the origin of contamination, three groups of compounds were defined according to type of use: agriculture, urban and industry. Frequency of detection indicates mixed land use in the recharge areas of sampling sites, which makes it difficult to specify the dominant origin of the compound. Passive sampling has proved to be a useful tool with which to identify MOs in groundwater and to assess groundwater quality. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Identifying and classifying quality-of-life tools for assessing pressure ulcers after spinal cord injury

    Science.gov (United States)

    Hitzig, Sander L.; Balioussis, Christina; Nussbaum, Ethne; McGillivray, Colleen F.; Catharine Craven, B.; Noreau, Luc

    2013-01-01

    Context Although pressure ulcers may negatively influence quality of life (QoL) post-spinal cord injury (SCI), our understanding of how to assess their impact is confounded by conceptual and measurement issues. To ensure that descriptions of pressure ulcer impact are appropriately characterized, measures should be selected according to the domains that they evaluate and the population and pathologies for which they are designed. Objective To conduct a systematic literature review to identify and classify outcome measures used to assess the impact of pressure ulcers on QoL after SCI. Methods Electronic databases (Medline/PubMed, CINAHL, and PsycInfo) were searched for studies published between 1975 and 2011. Identified outcome measures were classified as being either subjective or objective using a QoL model. Results Fourteen studies were identified. The majority of tools identified in these studies did not have psychometric evidence supporting their use in the SCI population with the exception of two objective measures, the Short-Form 36 and the Craig Handicap Assessment and Reporting Technique, and two subjective measures, the Life Situation Questionnaire-Revised and the Ferrans and Powers Quality of Life Index SCI-Version. Conclusion Many QoL outcome tools showed promise in being sensitive to the presence of pressure ulcers, but few of them have been validated for use with SCI. Prospective studies should employ more rigorous methods for collecting data on pressure ulcer severity and location to improve the quality of findings with regard to their impact on QoL. The Cardiff Wound Impact Schedule is a potential tool for assessing the impact of pressure ulcers post-SCI. PMID:24090238

  15. Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

    DEFF Research Database (Denmark)

    Gajhede, Nicolai; Beck, Oliver; Purwins, Hendrik

    2016-01-01

    After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples...

  16. Classifying a smoker scale in adult daily and nondaily smokers.

    Science.gov (United States)

    Pulvers, Kim; Scheuermann, Taneisha S; Romero, Devan R; Basora, Brittany; Luo, Xianghua; Ahluwalia, Jasjit S

    2014-05-01

    Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p < .001. Higher scores were associated with a stronger smoker identity, stronger dependence on cigarettes, greater health risk perceptions, and more smoking friends, and smokers with higher scores were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts.

  17. Fungi identify the geographic origin of dust samples.

    Directory of Open Access Journals (Sweden)

    Neal S Grantham

    Full Text Available There is a long history of archaeologists and forensic scientists using pollen found in a dust sample to identify its geographic origin or history. Such palynological approaches have important limitations as they require time-consuming identification of pollen grains, a priori knowledge of plant species distributions, and a sufficient diversity of pollen types to permit spatial or temporal identification. We demonstrate an alternative approach based on DNA sequencing analyses of the fungal diversity found in dust samples. Using nearly 1,000 dust samples collected from across the continental U.S., our analyses identify up to 40,000 fungal taxa from these samples, many of which exhibit a high degree of geographic endemism. We develop a statistical learning algorithm via discriminant analysis that exploits this geographic endemicity in the fungal diversity to correctly identify samples to within a few hundred kilometers of their geographic origin with high probability. In addition, our statistical approach provides a measure of certainty for each prediction, in contrast with current palynology methods that are almost always based on expert opinion and devoid of statistical inference. Fungal taxa found in dust samples can therefore be used to identify the origin of that dust and, more importantly, we can quantify our degree of certainty that a sample originated in a particular place. This work opens up a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.
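
    A hedged sketch of the general approach, assuming a taxon-abundance table already reduced to a manageable number of columns (the study's own discriminant-analysis algorithm and certainty measure are more elaborate): a discriminant model is fit on samples labeled by region, and each new prediction is returned together with its posterior probability.

        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        def fit_origin_classifier(taxa_table, region_labels):
            """Rows are dust samples, columns are relative abundances of fungal taxa,
            labels are coarse geographic regions."""
            model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
            model.fit(taxa_table, region_labels)
            return model

        def predict_origin(model, new_sample):
            """Return the most likely region and the posterior probability attached
            to it, mirroring the paper's emphasis on per-prediction certainty."""
            x = new_sample.reshape(1, -1)
            probs = model.predict_proba(x)[0]
            return model.predict(x)[0], probs.max()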

  18. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS) Severity.

    Science.gov (United States)

    Bosch-Bayard, Jorge; Galán-García, Lídice; Fernandez, Thalia; Lirio, Rolando B; Bringas-Vega, Maria L; Roca-Stappung, Milene; Ricardo-Garcell, Josefina; Harmony, Thalía; Valdes-Sosa, Pedro A

    2017-01-01

    In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions, combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to the EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be placed into a specific category such as reading (Dyslexia), Mathematics (Dyscalculia), or Writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful to develop subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented.

  19. Obscenity detection using haar-like features and Gentle Adaboost classifier.

    Science.gov (United States)

    Mustafa, Rashed; Min, Yang; Zhu, Dingju

    2014-01-01

    Large exposure of skin area in an image is considered obscene. This criterion alone may flag many false images that merely contain skin-like objects and may miss images in which only a small skin area is exposed but erotogenic human body parts are visible. This paper presents a novel method for detecting nipples in pornographic image contents. The nipple is considered as an erotogenic organ for identifying pornographic contents in images. In this research, a Gentle AdaBoost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiments showed that, considering accuracy, the haar-cascade classifier performs well, but in order to satisfy detection time, the train-cascade classifier is suitable. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade and 0.127 seconds for the train-cascade classifier.
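
    A minimal detection sketch using OpenCV; the cascade file name is hypothetical and would in practice come from training a Gentle AdaBoost haar cascade (e.g. with opencv_traincascade) on the positive and negative samples described above.

        import cv2

        # Hypothetical cascade produced by training on the nipple samples above.
        CASCADE_PATH = "nipple_cascade.xml"

        def detect_regions(image_path, cascade_path=CASCADE_PATH):
            """Run a trained haar cascade over a grayscale image (ideally already
            skin-filtered) and return candidate bounding boxes."""
            cascade = cv2.CascadeClassifier(cascade_path)
            gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
            boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            return boxes   # empty when nothing is detected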

  20. Obscenity Detection Using Haar-Like Features and Gentle Adaboost Classifier

    Directory of Open Access Journals (Sweden)

    Rashed Mustafa

    2014-01-01

    Full Text Available Large exposure of skin area in an image is considered obscene. This criterion alone may flag many false images that merely contain skin-like objects and may miss images in which only a small skin area is exposed but erotogenic human body parts are visible. This paper presents a novel method for detecting nipples in pornographic image contents. The nipple is considered as an erotogenic organ for identifying pornographic contents in images. In this research, a Gentle AdaBoost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiments showed that, considering accuracy, the haar-cascade classifier performs well, but in order to satisfy detection time, the train-cascade classifier is suitable. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade and 0.127 seconds for the train-cascade classifier.

  1. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS Severity

    Directory of Open Access Journals (Sweden)

    Jorge Bosch-Bayard

    2018-01-01

    Full Text Available In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions, combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to the EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be placed into a specific category such as reading (Dyslexia), Mathematics (Dyscalculia), or Writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful to develop subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented.

  2. Identifying aggressive prostate cancer foci using a DNA methylation classifier.

    Science.gov (United States)

    Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning

    2017-01-12

    Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.

  3. LCC: Light Curves Classifier

    Science.gov (United States)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished by attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering for visualizing the natural separation of the sample. The package can also be used for automatic tuning of the parameters of the methods used (for example, the number of hidden neurons or the binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. The Light Curve Classifier can also be used for simple downloading of light curves and all available information about queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and the command line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases and filtering outputs by trained filters. Preimplemented descriptors, classifiers and connectors can be picked by simple clicks and their parameters can be tuned by giving ranges of these values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.

  4. High dimensional classifiers in the imbalanced case

    DEFF Research Database (Denmark)

    Bak, Britta Anker; Jensen, Jens Ledet

    We consider the binary classification problem in the imbalanced case where the number of samples from the two groups differ. The classification problem is considered in the high dimensional case where the number of variables is much larger than the number of samples, and where the imbalance leads to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...

  5. Identifying Sources of Scientific Knowledge: classifying non-source items in the WoS

    Energy Technology Data Exchange (ETDEWEB)

    Calero-Medina, C.M.

    2016-07-01

    The sources of scientific knowledge can be tracked using the references in scientific publications. For instance, the publications from the scientific journals covered by the Web of Science database (WoS) contain references to publications for which an indexed source record exists in the WoS (source items) or references for which an indexed source record does not exist in the WoS (non-source items). The classification of the non-source items is the main objective of the work in progress presented here. Other scholars have classified and identified non-source items for different purposes (e.g. Butler & Visser (2006); Liseé, Larivière & Archambault (2008); Nerderhof, van Leeuwen & van Raan (2010); Hicks & Wang (2013); Boyack & Klavans (2014)). But those studies focused on specific source types, fields or sets of papers. The work presented here is much broader in terms of the number of publications, source types and fields. (Author)

  6. A systematic procedure for identifying and classifying children with dyscalculia among primary school children in India.

    Science.gov (United States)

    Ramaa, S; Gowramma, I P

    2002-01-01

    This paper describes the procedures adopted by two independent studies in India for identifying and classifying children with dyscalculia in primary schools. For determining the presence of dyscalculia both inclusionary and exclusionary criteria were used. When other possible causes of arithmetic failure had been excluded, figures for dyscalculia came out as 5.98% (15 cases out of 251) in one study and 5.54% (78 out of 1408) in the second. It was found in the latter study that 40 out of the 78 (51.27%) also had reading and writing problems. The findings are discussed in the light of previous studies.

  7. REPTREE CLASSIFIER FOR IDENTIFYING LINK SPAM IN WEB SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2013-01-01

    Full Text Available Search engines are used for retrieving information from the web. Most of the time, importance is placed on the top 10 results, and sometimes this shrinks to the top 5, because of time constraints and users' reliance on the search engines. Users believe that the top 10 or 5 of the total results are more relevant. Here the problem of spamdexing arises: it is a method of deceiving search engines about result quality. Falsified metrics, such as inserting an enormous number of keywords or links in a website, may push that website into the top 10 or 5 positions. This paper proposes a classifier based on REPTree (a regression tree representative). As an initial step, link-based features such as neighbors, pagerank, truncated pagerank, trustrank and assortativity-related attributes are inferred. Based on these features, a tree is constructed. The tree uses the feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as a base. It is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD and FEATE. Only link-based features are used in the experiments; this paper focuses on link spam alone. Finally, a representative tree is created which more precisely classifies the web spam entries. Results are given. Regression tree classification seems to perform well, as shown through the experiments.

  8. Stacking machine learning classifiers to identify Higgs bosons at the LHC

    International Nuclear Information System (INIS)

    Alves, A.

    2017-01-01

    Machine learning (ML) algorithms have been employed in the problem of classifying signal and background events with high accuracy in particle physics. In this paper, we compare the performance of a widespread ML technique, namely, stacked generalization, against the results of two state-of-the-art algorithms: (1) a deep neural network (DNN) in the task of discovering a new neutral Higgs boson and (2) a scalable machine learning system for tree boosting, in the Standard Model Higgs to tau leptons channel, both at the 8 TeV LHC. In a cut-and-count analysis, stacking three algorithms performed around 16% worse than the DNN while demanding far less computational effort; however, the same stacking outperforms boosted decision trees. Using the stacked classifiers in a multivariate statistical analysis (MVA), on the other hand, significantly enhances the statistical significance compared to cut-and-count in both Higgs processes, suggesting that combining an ensemble of simpler and faster ML algorithms with MVA tools is a better approach than building a complex state-of-the-art algorithm for cut-and-count.
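
    A short sketch of stacked generalization with scikit-learn; the base learners and meta-learner here are generic stand-ins, not the ensemble used in the paper.

        from sklearn.ensemble import StackingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.naive_bayes import GaussianNB

        def build_stack():
            """Three simple base learners whose out-of-fold class probabilities
            feed a logistic-regression meta-learner."""
            base = [
                ("rf", RandomForestClassifier(n_estimators=200)),
                ("svm", SVC(probability=True)),
                ("nb", GaussianNB()),
            ]
            return StackingClassifier(estimators=base,
                                      final_estimator=LogisticRegression(),
                                      stack_method="predict_proba", cv=5)

        # stack = build_stack(); stack.fit(X_train, y_train)
        # signal_prob = stack.predict_proba(X_test)[:, 1]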

  9. Three data partitioning strategies for building local classifiers (Chapter 14)

    NARCIS (Netherlands)

    Zliobaite, I.; Okun, O.; Valentini, G.; Re, M.

    2011-01-01

    The divide-and-conquer approach has been recognized in multiple classifier systems as a way to utilize the local expertise of individual classifiers. In this study we experimentally investigate three strategies for building local classifiers that are based on different routines of sampling data for training.

  10. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    Directory of Open Access Journals (Sweden)

    Shehzad Khalid

    2014-01-01

    Full Text Available We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes.

  11. Consistency Analysis of Nearest Subspace Classifier

    OpenAIRE

    Wang, Yi

    2015-01-01

    The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also ...

  12. Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

    Directory of Open Access Journals (Sweden)

    Daniel Frampton

    Full Text Available Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons with limited success. Although the metric was able to successfully classify roughly two-thirds of both CFS and healthy samples correctly, the level of misclassification was high. We conclude many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.

  13. Sampling And Identifying Of Mould In The Library Building

    Directory of Open Access Journals (Sweden)

    Abdul Wahab Suriani Ngah

    2016-01-01

    Full Text Available Despite the growing concern over mould and fungi infestations in library buildings, little has been reported in the literature on the development of an objective tool and criteria for measuring and characterising the mould and fungi. In this paper, an objective approach to mould and fungi growth assessment using various sampling techniques and identification by microscopic observation is proposed. This study involved three libraries of higher education institutions in Malaysia for data collection and the study of mould growth. Mould samples from the three libraries were collected using a Coriolis air sampler, settling-plate air sampling on Malt Extract Agar (MEA), the IAQ MOLD Alexeter IAQ-Pro Asp/Pen® Test and swab sampling techniques. The IAQ MOLD Alexeter IAQ-Pro Asp/Pen® Test and the traditional method identified various mould species immediately on site, and microscopic observation identified common types of mould such as Aspergillus, Penicillium and Stachybotrys. The sample size and particular characteristics of each library shaped the observed mould growth patterns and findings.

  14. How large a training set is needed to develop a classifier for microarray data?

    Science.gov (United States)

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.

  15. Identification and optimization of classifier genes from multi-class earthworm microarray dataset.

    Directory of Open Access Journals (Sweden)

    Ying Li

    Full Text Available Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing a smaller subset of 39 and 30 classifier genes, separately, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.

  16. Bayes classifiers for imbalanced traffic accidents datasets.

    Science.gov (United States)

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced, where the number of instances classified under the killed or severe injuries class (minority) is much lower than those classified under the slight injuries class (majority). This, however, poses a challenging problem for classification algorithms and may result in a model that covers the slight-injuries instances well while frequently misclassifying the killed or severe-injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
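
    A hedged sketch of the balancing-plus-Bayes idea with a plain random oversampler and a naive Bayes stand-in for the AODE/WAODE/Bayesian-network classifiers used in the study; accident attributes are assumed to be integer-coded categories in NumPy arrays.

        import numpy as np
        from sklearn.naive_bayes import CategoricalNB

        def random_oversample(X, y, seed=0):
            """Duplicate minority-class records at random until every class has as
            many instances as the largest class (the simplest balancing technique)."""
            rng = np.random.default_rng(seed)
            classes, counts = np.unique(y, return_counts=True)
            n_max = counts.max()
            idx = []
            for c in classes:
                members = np.where(y == c)[0]
                idx.append(rng.choice(members, size=n_max, replace=True))
            idx = np.concatenate(idx)
            return X[idx], y[idx]

        # X_bal, y_bal = random_oversample(X, y)
        # model = CategoricalNB().fit(X_bal, y_bal)   # naive Bayes stand-in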

  17. Frog sound identification using extended k-nearest neighbor classifier

    Science.gov (United States)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization is important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with an Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aim of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbor. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifiers: k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN), on the recorded sounds of 15 frog species obtained in Malaysian forests. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficients (MFCC). The experimental results show that the EKNN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, KGNN and MKNN, for all cases.
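
    An illustrative mutual-neighbor vote, one way to read the "nearest neighbors plus mutual sharing of neighborhood" idea; it is not the exact EKNN formulation, and the inputs are assumed to be NumPy arrays of MFCC features and labels.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def mutual_knn_predict(X_train, y_train, x_test, k=5):
            """A training point votes only if it is among the k nearest neighbors of
            the test sample AND the test sample is among its own k nearest neighbors
            within the augmented set; fall back to a plain kNN vote otherwise."""
            nn = NearestNeighbors(n_neighbors=k).fit(X_train)
            neigh_idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]

            X_aug = np.vstack([X_train, x_test])        # test sample appended last
            nn_aug = NearestNeighbors(n_neighbors=k + 1).fit(X_aug)
            voters = []
            for i in neigh_idx:
                back = nn_aug.kneighbors(X_train[i].reshape(1, -1),
                                         return_distance=False)[0]
                if len(X_train) in back:                # test sample index in X_aug
                    voters.append(y_train[i])
            if not voters:
                voters = list(y_train[neigh_idx])
            values, counts = np.unique(voters, return_counts=True)
            return values[np.argmax(counts)]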

  18. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    Science.gov (United States)

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined, contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2 the optimal number of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  19. A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue.

    Directory of Open Access Journals (Sweden)

    Jeffrey J Nirschl

    Full Text Available Over 26 million people worldwide suffer from heart failure annually. When the cause of heart failure cannot be identified, endomyocardial biopsy (EMB) represents the gold standard for the evaluation of disease. However, manual EMB interpretation has high inter-rater variability. Deep convolutional neural networks (CNNs) have been successfully applied to detect cancer, diabetic retinopathy, and dermatologic lesions from images. In this study, we develop a CNN classifier to detect clinical heart failure from H&E stained whole-slide images from a total of 209 patients; 104 patients were used for training and the remaining 105 patients for independent testing. The CNN was able to identify patients with heart failure or severe pathology with a 99% sensitivity and 94% specificity on the test set, outperforming conventional feature-engineering approaches. Importantly, the CNN outperformed two expert pathologists by nearly 20%. Our results suggest that deep learning analytics of EMB can be used to predict cardiac outcome.

  20. Detecting and classifying method based on similarity matching of Android malware behavior with profile.

    Science.gov (United States)

    Jang, Jae-Wook; Yun, Jaesung; Mohaisen, Aziz; Woo, Jiyoung; Kim, Huy Kang

    2016-01-01

    Mass-market mobile security threats have increased recently due to the growth of mobile technologies and the popularity of mobile devices. Accordingly, techniques have been introduced for identifying, classifying, and defending against mobile threats utilizing static, dynamic, on-device, and off-device techniques. Static techniques are easy to evade, while dynamic techniques are expensive. On-device techniques are constrained by device resources, while off-device techniques need to be always online. To address some of those shortcomings, we introduce Andro-profiler, a hybrid behavior-based analysis and classification system for mobile malware. Andro-profiler's main goals are efficiency, scalability, and accuracy. To that end, Andro-profiler classifies malware by exploiting behavior profiles extracted from integrated system logs, including system calls. Andro-profiler executes a malicious application on an emulator in order to generate the integrated system logs, and creates human-readable behavior profiles by analyzing these logs. By comparing the behavior profile of a malicious application with the representative behavior profile of each malware family using a weighted similarity matching technique, Andro-profiler detects and classifies it into malware families. The experimental results demonstrate that Andro-profiler is scalable, performs well in detecting and classifying malware with accuracy greater than 98%, outperforms the existing state-of-the-art work, and is capable of identifying 0-day mobile malware samples.
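
    A toy sketch of the weighted similarity matching step (illustrative only; the profile representation, weights and threshold are assumptions, not taken from the paper): a behavior profile is encoded as a feature-count vector and compared against representative family profiles, with an "unknown" outcome flagging possible 0-day samples.

```python
import numpy as np

def weighted_similarity(profile, reference, weights):
    """Weighted cosine similarity between two behavior profiles given as
    fixed-length feature-count vectors (e.g., counts of system-call groups)."""
    p, r = weights * profile, weights * reference
    denom = np.linalg.norm(p) * np.linalg.norm(r)
    return float(p @ r / denom) if denom else 0.0

def classify_profile(profile, family_profiles, weights, threshold=0.8):
    """Return the best-matching malware family, or 'unknown' when no
    representative profile is similar enough (possible 0-day sample)."""
    scores = {fam: weighted_similarity(profile, ref, weights)
              for fam, ref in family_profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"
```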

  1. Biased sampling, over-identified parameter problems and beyond

    CERN Document Server

    Qin, Jing

    2017-01-01

    This book is devoted to biased sampling problems (also called choice-based sampling in Econometrics parlance) and over-identified parameter estimation problems. Biased sampling problems appear in many areas of research, including Medicine, Epidemiology and Public Health, the Social Sciences and Economics. The book addresses a range of important topics, including case and control studies, causal inference, missing data problems, meta-analysis, renewal process and length biased sampling problems, capture and recapture problems, case cohort studies, exponential tilting genetic mixture models, etc. The goal of this book is to make it easier for Ph.D. students and new researchers to get started in this research area. It will be of interest to all those who work in the health, biological, social and physical sciences, as well as those who are interested in survey methodology and other areas of statistical science, among others.

  2. RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Ashish Saini

    2014-01-01

    Full Text Available Background. Breast cancer is the most common type of cancer among females with a high mortality rate. It is essential to classify the estrogen receptor based breast cancer subtypes into correct subclasses, so that the right treatments can be applied to lower the mortality rate. Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, the interactions in the gene interaction network usually contain many false-positive interactions that do not have any biological meaning. Therefore, it is a challenge to incorporate the reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches.

  3. Pixel Classification of SAR ice images using ANFIS-PSO Classifier

    Directory of Open Access Journals (Sweden)

    G. Vasumathi

    2016-12-01

    Full Text Available Synthetic Aperture Radar (SAR) plays a vital role in producing extremely high resolution radar images and is widely used to monitor ice-covered ocean regions. Sea monitoring is important for various purposes, including global climate studies and ship navigation. Classification of the ice-infested area yields important features that are further useful for various monitoring processes around the ice regions. The main objective of this paper is to classify SAR ice images in order to identify the regions around the ice-infested areas. Three stages are considered in the classification of SAR ice images. The first is preprocessing, in which the speckled SAR ice images are denoised using various speckle removal filters; a comparison is made among these filters to find the best filter for speckle removal. The second stage is segmentation, in which different regions are segmented using K-means and watershed segmentation algorithms; a comparison is made between these two algorithms to find the better one for segmenting SAR ice images. The last stage is pixel-based classification, which identifies and classifies the segmented regions using various supervised learning classifiers. The algorithms include Back Propagation Neural networks (BPN), a Fuzzy Classifier, an Adaptive Neuro Fuzzy Inference System (ANFIS) classifier and the proposed ANFIS with Particle Swarm Optimization (PSO) classifier; a comparison is made among all these classifiers to determine which classifier is best suited for classifying SAR ice images. Various evaluation metrics are computed separately at each of these three stages.

  4. The decision tree classifier - Design and potential. [for Landsat-1 data

    Science.gov (United States)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
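
    The multistage decision logic can be pictured with a small sketch (hypothetical thresholds and band names, not the Landsat-1 analysis itself): each internal node applies one decision function and routes the sample onward until a leaf class is reached.

```python
# Each node is either a class label (leaf, a string) or a tuple
# (decision_function, subtree_if_true, subtree_if_false).
def classify(tree, x):
    """Route a sample x down the tree by applying one decision function
    per stage until a leaf label is reached."""
    while not isinstance(tree, str):
        test, if_true, if_false = tree
        tree = if_true if test(x) else if_false
    return tree

# Hypothetical two-stage tree for illustration:
tree = (lambda x: x["nir"] > 0.4,
        "vegetation",
        (lambda x: x["blue"] > 0.3, "water", "bare soil"))

print(classify(tree, {"nir": 0.2, "blue": 0.5}))   # -> water
```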

  5. Classifying Radio Galaxies with the Convolutional Neural Network

    Energy Technology Data Exchange (ETDEWEB)

    Aniyan, A. K.; Thorat, K. [Department of Physics and Electronics, Rhodes University, Grahamstown (South Africa)

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
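
    A schematic sketch of the "fusion classifier" idea described above (illustrative only; the 0.5 confidence cut-off and the per-class binary CNNs are assumptions): the outputs of separately trained binary classifiers are combined into a single label, with low-confidence cases flagged as unusual morphologies.

```python
def fuse_binary_scores(binary_probs, min_confidence=0.5):
    """Combine one-vs-rest binary classifier outputs into a single label.

    binary_probs: dict mapping class name -> probability that the image
    belongs to that class, as produced by separately trained binary CNNs.
    Returns the winning class, or 'unusual' when no classifier is confident,
    which flags sources with atypical morphology for manual inspection."""
    best = max(binary_probs, key=binary_probs.get)
    return best if binary_probs[best] >= min_confidence else "unusual"

# Hypothetical scores from three binary CNNs (FRI / FRII / bent-tailed):
print(fuse_binary_scores({"FRI": 0.12, "FRII": 0.08, "bent-tailed": 0.91}))
```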

  6. Classifying Radio Galaxies with the Convolutional Neural Network

    Science.gov (United States)

    Aniyan, A. K.; Thorat, K.

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ˜200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  7. Classifying Radio Galaxies with the Convolutional Neural Network

    International Nuclear Information System (INIS)

    Aniyan, A. K.; Thorat, K.

    2017-01-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  8. Classifying Microorganisms

    DEFF Research Database (Denmark)

    Sommerlund, Julie

    2006-01-01

    This paper describes the coexistence of two systems for classifying organisms and species: a dominant genetic system and an older naturalist system. The former classifies species and traces their evolution on the basis of genetic characteristics, while the latter employs physiological characteristics. The coexistence of the classification systems does not lead to a conflict between them. Rather, the systems seem to co-exist in different configurations, through which they are complementary, contradictory and inclusive in different situations-sometimes simultaneously. The systems come...

  9. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
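
    A rough sketch of the SCT-SVM idea under simplifying assumptions (linear SVMs, scikit-learn's SGDClassifier with hinge loss as a stand-in, hypothetical function names; not the authors' implementation): the parameters of previous classifiers are extrapolated along their temporal trend and then fine-tuned with a few samples from the incoming image.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def predict_next_classifier(coef_history, intercept_history):
    """Linearly extrapolate the temporal trend of previous linear classifiers'
    parameters to obtain a rough classifier for the incoming image."""
    t = np.arange(len(coef_history))
    coef_fit = np.polyfit(t, np.stack(coef_history).reshape(len(t), -1), 1)
    inter_fit = np.polyfit(t, np.stack(intercept_history).reshape(len(t), -1), 1)
    t_next = len(coef_history)
    coef_next = (coef_fit[0] * t_next + coef_fit[1]).reshape(coef_history[0].shape)
    inter_next = (inter_fit[0] * t_next + inter_fit[1]).reshape(intercept_history[0].shape)
    return coef_next, inter_next

def fine_tune(coef0, intercept0, X_few, y_few):
    """Fine-tune the predicted classifier with a small number of training
    samples from the current image (linear SVM via hinge-loss SGD)."""
    clf = SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000)
    clf.fit(X_few, y_few, coef_init=coef0, intercept_init=intercept0)
    return clf
```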

  10. Stack filter classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory; Hush, Don [Los Alamos National Laboratory

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step towards Stack Filter Classifiers and it shows that the approach is interesting from both a theoretical and a practical perspective.

  11. Nonparametric Coupled Bayesian Dictionary and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  12. Implications of physical symmetries in adaptive image classifiers

    DEFF Research Database (Denmark)

    Sams, Thomas; Hansen, Jonas Lundbek

    2000-01-01

    It is demonstrated that rotational invariance and reflection symmetry of image classifiers lead to a reduction in the number of free parameters in the classifier. When used in adaptive detectors, e.g. neural networks, this may be used to decrease the number of training samples necessary to learn a given classification task, or to improve generalization of the neural network. Notably, the symmetrization of the detector does not compromise the ability to distinguish objects that break the symmetry. (C) 2000 Elsevier Science Ltd. All rights reserved.

  13. Identifying PTSD personality subtypes in a workplace trauma sample.

    Science.gov (United States)

    Sellbom, Martin; Bagby, R Michael

    2009-10-01

    The authors sought to identify personality clusters derived from the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) Personality Psychopathology Five Scales in a sample of workplace claimants with posttraumatic stress disorder (PTSD). Three clusters--low pathology, internalizing, and externalizing--were recovered, similar to those obtained by M. W. Miller and colleagues (2003, 2004, 2007) in samples of combat veterans and sexual assault victims. Internalizers and externalizers scored comparably on measures of PTSD symptom severity, general distress, and negative affect. Internalizers were uniquely characterized by anhedonia and depressed mood; externalizers by antisocial behavior, substance abuse, and anger/aggression.

  14. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    Science.gov (United States)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on gonads extracted from Mackerel females and males (fresh and thawed) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  15. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    International Nuclear Information System (INIS)

    Blanco, A; Rodriguez, R; Martinez-Maranon, I

    2014-01-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on gonads extracted from Mackerel females and males (fresh and thawed) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  16. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data

    Directory of Open Access Journals (Sweden)

    Datta Susmita

    2010-08-01

    Full Text Available Abstract Background Generally speaking, different classifiers tend to work well for certain types of data and conversely, it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best performing classification algorithm amongst a number of competing algorithms is a difficult task for various reasons. For example, the order of performance may depend on the performance measure employed for such a comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data that is being classified. The attractive feature of the proposed classifier is its multi-objective nature, where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance, as judged on test samples, than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers. Results We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier comprising the ensemble or better. Conclusions For complex high-dimensional datasets resulting from present-day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.
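
    A toy sketch of the rank-aggregation step only (not the full bagging-based algorithm; classifier names and scores are made up): each candidate classifier is ranked under several performance measures and the one with the best mean rank is selected.

```python
import numpy as np

def aggregate_ranks(performance):
    """performance: dict classifier -> dict measure -> value (higher is better).
    Rank classifiers under each measure, then pick the best mean rank."""
    names = list(performance)
    measures = list(next(iter(performance.values())))
    ranks = np.zeros((len(names), len(measures)))
    for j, m in enumerate(measures):
        scores = np.array([performance[n][m] for n in names])
        order = np.argsort(-scores)                  # best score first
        ranks[order, j] = np.arange(1, len(names) + 1)
    return names[int(np.argmin(ranks.mean(axis=1)))]

best = aggregate_ranks({
    "svm": {"accuracy": 0.91, "sensitivity": 0.88, "specificity": 0.93},
    "rf":  {"accuracy": 0.93, "sensitivity": 0.85, "specificity": 0.95},
    "lda": {"accuracy": 0.89, "sensitivity": 0.90, "specificity": 0.87},
})
print(best)
```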

  17. An Improvement To The k-Nearest Neighbor Classifier For ECG Database

    Science.gov (United States)

    Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

    2018-03-01

    The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN concerns the weighting of neighbors when assigning the class label before classification. Thus, to address these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of the sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distribution of the training samples and their distance to the query point. Finally, a fuzzy membership function is employed to assign the query point to the class label that is most frequently represented by its nearest centroid neighbors. Electrocardiogram (ECG) signals are used for the experimental study. The classification performance is evaluated in two experimental settings, i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds that of kNN, kNCN and FkNN, with a best classification rate of 96.5%.
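
    A minimal sketch of the Mahalanobis-distance step (illustrative only; the fuzzy membership and centroid-neighbor rules are omitted): neighbor selection uses the inverse covariance of the training data instead of the Euclidean metric.

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, x_query, k=5):
    """k-NN voting with Mahalanobis distances computed from the
    (regularized) inverse covariance of the training data."""
    cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(X_train.shape[1])
    vi = np.linalg.inv(cov)
    diff = X_train - x_query
    d2 = np.einsum("ij,jk,ik->i", diff, vi, diff)   # squared Mahalanobis distances
    neighbors = np.argsort(d2)[:k]
    classes, counts = np.unique(y_train[neighbors], return_counts=True)
    return classes[np.argmax(counts)]
```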

  18. An Investigation to Improve Classifier Accuracy for Myo Collected Data

    Science.gov (United States)

    2017-02-01

    [Report front-matter excerpt; the abstract itself is not available. Recoverable contents: the effect of bad samples on classification accuracy is investigated for Myo-collected gesture data, comparing Naïve Bayes (NB), Logistic Model Tree (LMT) and k-Nearest Neighbor classifiers; appendix figures (come gesture, pitch feature, users 06 and 14) show samples exhibiting reversed movement.]

  19. Development and analysis of a low-cost screening tool to identify and classify hearing loss in children: a proposal for developing countries

    Directory of Open Access Journals (Sweden)

    Alessandra Giannella Samelli

    2011-01-01

    Full Text Available OBJECTIVE: A lack of attention has been given to hearing health in primary care in developing countries. A strategy involving low-cost screening tools may fill the current gap in hearing health care provided to children. Therefore, it is necessary to establish and adopt lower-cost procedures that are accessible to underserved areas that lack the physical or human resources that would otherwise enable the identification of groups at risk for hearing loss. The aim of this study was to develop and analyze the efficacy of a low-cost screening tool to identify and classify hearing loss in children. METHODS: A total of 214 children aged 2 to 10 years participated in this study. The study was conducted by providing a questionnaire to the parents and comparing the answers with the results of a complete audiological assessment. Receiver operating characteristic (ROC) curves were constructed, and discriminant analysis techniques were used to classify each child based on the total score. RESULTS: We found conductive hearing loss in 39.3% of children, sensorineural hearing loss in 7.4% and normal hearing in 53.3%. The discriminant analysis technique provided the following classification rule for the total score on the questionnaire: 0 to 4 points - normal hearing; 5 to 7 points - conductive hearing loss; over 7 points - sensorineural hearing loss. CONCLUSION: Our results suggest that the questionnaire could be used as a screening tool to classify children as having normal hearing or hearing loss, and according to the type of hearing loss, based on the total questionnaire score.

  20. Verification of classified fissile material using unclassified attributes

    International Nuclear Information System (INIS)

    Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

    1998-01-01

    This paper reports on the most recent efforts of US technical experts to explore verification by IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. They believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated

  1. A systems biology-based classifier for hepatocellular carcinoma diagnosis.

    Directory of Open Access Journals (Sweden)

    Yanqiong Zhang

    Full Text Available AIM: The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments, which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88∼92.71%) and area under the ROC curve (approximating 1.0), and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers. CONCLUSION: Our analysis suggests that the systems biology-based classifier that combines the differential gene expression and topological features of the human protein interaction network may enhance the diagnostic performance of HCC classifiers.

  2. Just-in-time adaptive classifiers-part II: designing the classifier.

    Science.gov (United States)

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

    Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with a process evolution, adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationary hypothesis for the process generating the data, with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected for their computation-free training phase, the possibility of easily estimating the model complexity k, and of keeping the computational complexity of the classifier under control through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (an aspect tackled in a companion paper) followed by an adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process, both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking), and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change targets the process generating the data in a new stationary state, as is the case in many real applications.
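
    A rough sketch of the knowledge-base management idea (not the paper's algorithm; scikit-learn's KNeighborsClassifier and the class name are assumptions): in stationary conditions newly labelled samples are appended to improve accuracy, while a detected change empties the obsolete knowledge base before the k-NN is rebuilt.

```python
from collections import deque
from sklearn.neighbors import KNeighborsClassifier

class JITKNN:
    """Toy just-in-time k-NN: keeps a bounded knowledge base and rebuilds
    the classifier when new samples arrive or a change is detected."""
    def __init__(self, k=5, max_size=2000):
        self.k, self.kb = k, deque(maxlen=max_size)
        self.clf = None

    def update(self, X_new, y_new, change_detected=False):
        if change_detected:
            self.kb.clear()                        # drop obsolete knowledge
        self.kb.extend(zip(X_new, y_new))          # accuracy improvement in stationarity
        X, y = zip(*self.kb)
        self.clf = KNeighborsClassifier(n_neighbors=min(self.k, len(y))).fit(X, y)

    def predict(self, X):
        return self.clf.predict(X)
```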

  3. The IGSN Experience: Successes and Challenges of Implementing Persistent Identifiers for Samples

    Science.gov (United States)

    Lehnert, Kerstin; Arko, Robert

    2016-04-01

    Physical samples collected and studied in the Earth sciences represent both a research resource and a research product in the Earth Sciences. As such they need to be properly managed, curated, documented, and cited to ensure re-usability and utility for future science, reproducibility of the data generated by their study, and credit for funding agencies and researchers who invested substantial resources and intellectual effort into their collection and curation. Use of persistent and unique identifiers and deposition of metadata in a persistent registry are therefore as important for physical samples as they are for digital data. The International Geo Sample Number (IGSN) is a persistent, globally unique identifier. Its adoption by individual investigators, repository curators, publishers, and data managers is rapidly growing world-wide. This presentation will provide an analysis of the development and implementation path of the IGSN and relevant insights and experiences gained along its way. Development of the IGSN started in 2004 as part of a US NSF-funded project to establish a registry for sample metadata, the System for Earth Sample Registration (SESAR). The initial system provided a centralized solution for users to submit information about their samples and obtain IGSNs and bar codes. Challenges encountered during this initial phase related to defining the scope of the registry, granularity of registered objects, responsibilities of relevant actors, and workflows, and designing the registry's metadata schema, its user interfaces, and the identifier itself, including its syntax. The most challenging task though was to make the IGSN an integral part of personal and institutional sample management, digital management of sample-based data, and data publication on a global scale. Besides convincing individual researchers, curators, editors and publishers, as well as data managers in US and non-US academia, state and federal agencies, the PIs of the SESAR project

  4. The use of hyperspectral data for tree species discrimination: Combining binary classifiers

    CSIR Research Space (South Africa)

    Dastile, X

    2010-11-01

    Full Text Available [Presentation excerpt; the abstract was garbled during extraction. Recoverable content: the work concerns tree species discrimination from hyperspectral data by combining binary classifiers into a multiclass classification system (illustrated with assigning a new sample to a class via 5-nearest-neighbour classification over a learning task of labelled feature vectors), with reference to a review of combining binary classifiers in multiclass problems (Springer Science and Business Media B.V.) and to Dietterich, T.G. and Bakiri, G. (1995), Solving Multiclass Learning Problems via Error-Correcting Output Codes, AI Access Foundation.]

  5. A radial basis classifier for the automatic detection of aspiration in children with dysphagia

    Directory of Open Access Journals (Sweden)

    Blain Stefanie

    2006-07-01

    Full Text Available Abstract Background Silent aspiration or the inhalation of foodstuffs without overt physiological signs presents a serious health issue for children with dysphagia. To date, there are no reliable means of detecting aspiration in the home or community. An assistive technology that performs in these environments could inform caregivers of adverse events and potentially reduce the morbidity and anxiety of the feeding experience for the child and caregiver, respectively. This paper proposes a classifier for automatic classification of aspiration and swallow vibration signals non-invasively recorded on the neck of children with dysphagia. Methods Vibration signals associated with safe swallows and aspirations, both identified via videofluoroscopy, were collected from over 100 children with neurologically-based dysphagia using a single-axis accelerometer. Five potentially discriminatory mathematical features were extracted from the accelerometry signals. All possible combinations of the five features were investigated in the design of radial basis function classifiers. Performance of different classifiers was compared and the best feature sets were identified. Results Optimal feature combinations for two, three and four features resulted in statistically comparable adjusted accuracies with a radial basis classifier. In particular, the feature pairing of dispersion ratio and normality achieved an adjusted accuracy of 79.8 ± 7.3%, a sensitivity of 79.4 ± 11.7% and specificity of 80.3 ± 12.8% for aspiration detection. Addition of a third feature, namely energy, increased adjusted accuracy to 81.3 ± 8.5% but the change was not statistically significant. A closer look at normality and dispersion ratio features suggest leptokurticity and the frequency and magnitude of atypical values as distinguishing characteristics between swallows and aspirations. The achieved accuracies are 30% higher than those reported for bedside cervical auscultation. Conclusion

  6. A novel statistical method for classifying habitat generalists and specialists

    DEFF Research Database (Denmark)

    Chazdon, Robin L; Chao, Anne; Colwell, Robert K

    2011-01-01

    Each species is classified into one of four groups: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...

  7. Small Body GN and C Research Report: G-SAMPLE - An In-Flight Dynamical Method for Identifying Sample Mass [External Release Version

    Science.gov (United States)

    Carson, John M., III; Bayard, David S.

    2006-01-01

    G-SAMPLE is an in-flight dynamical method for use by sample collection missions to identify the presence and quantity of collected sample material. The G-SAMPLE method implements a maximum-likelihood estimator to identify the collected sample mass, based on onboard force sensor measurements, thruster firings, and a dynamics model of the spacecraft. With G-SAMPLE, sample mass identification becomes a computation rather than an extra hardware requirement; the added cost of cameras or other sensors for sample mass detection is avoided. Realistic simulation examples are provided for a spacecraft configuration with a sample collection device mounted on the end of an extended boom. In one representative example, a 1000 gram sample mass is estimated to within 110 grams (95% confidence) under realistic assumptions of thruster profile error, spacecraft parameter uncertainty, and sensor noise. For convenience to future mission design, an overall sample-mass estimation error budget is developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
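
    A highly simplified sketch of the maximum-likelihood idea (assuming scalar boom-axis dynamics, a known thruster-induced acceleration profile, and white Gaussian force-sensor noise; not the mission estimator): under these assumptions the ML estimate of the collected mass reduces to a least-squares fit of measured forces against the modeled acceleration.

```python
import numpy as np

def estimate_sample_mass(force_meas, accel_model, noise_std):
    """ML (least-squares) estimate of sample mass m from force samples
    f_t ~ m * a_t + noise, with an approximate 95% confidence half-width."""
    a = np.asarray(accel_model, dtype=float)
    f = np.asarray(force_meas, dtype=float)
    m_hat = float(a @ f / (a @ a))            # least-squares slope
    sigma_m = noise_std / np.sqrt(a @ a)      # standard deviation of the estimate
    return m_hat, 1.96 * sigma_m
```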

  8. Hybrid classifiers methods of data, knowledge, and classifier combination

    CERN Document Server

    Wozniak, Michal

    2014-01-01

    This book delivers concise and focused knowledge on how hybridization can help improve the quality of computer classification systems. To give readers a clear picture of hybridization, it primarily focuses on introducing the different levels of hybridization and on illuminating the problems that arise when dealing with such projects. The data and knowledge incorporated in hybridization are considered first, followed by a still-growing area of classifier systems known as combined classifiers. The book covers these state-of-the-art topics together with the latest research results of the author and his team from the Department of Systems and Computer Networks, Wroclaw University of Technology, including classifiers based on feature space splitting, one-class classification, imbalanced data, and data stream classification.

  9. A Consistent System for Coding Laboratory Samples

    Science.gov (United States)

    Sih, John C.

    1996-07-01

    A formal laboratory coding system is presented to keep track of laboratory samples. Preliminary useful information regarding the sample (origin and history) is gained without consulting a research notebook. Since this system uses and retains the same research notebook page number for each new experiment (reaction), finding and distinguishing products (samples) of the same or different reactions becomes an easy task. Using this system multiple products generated from a single reaction can be identified and classified in a uniform fashion. Samples can be stored and filed according to stage and degree of purification, e.g. crude reaction mixtures, recrystallized samples, chromatographed or distilled products.

  10. Current Directional Protection of Series Compensated Line Using Intelligent Classifier

    Directory of Open Access Journals (Sweden)

    M. Mollanezhad Heydarabadi

    2016-12-01

    Full Text Available A current inversion condition leads to incorrect operation of current-based directional relays in power systems with series compensation devices. The application of an intelligent system for fault direction classification is suggested in this paper. A new current directional protection scheme based on an intelligent classifier is proposed for the series compensated line. The proposed classifier uses only half a cycle of pre-fault and post-fault current samples at the relay location as its input. A large number of forward and backward fault simulations under different system conditions on a transmission line with a fixed series capacitor were carried out using PSCAD/EMTDC software. The applicability of the decision tree (DT), probabilistic neural network (PNN) and support vector machine (SVM) is investigated using data simulated under different system conditions. The performance comparison of the classifiers indicates that the SVM is the most suitable classifier for fault direction discrimination. Backward faults can be accurately distinguished from forward faults even under current inversion, without requiring detection of the current inversion condition.

  11. Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

    Science.gov (United States)

    Ashkani, Jahanshah; Naidoo, Kevin J.

    2016-05-01

    Aberrant glycosylation in tumours stem from altered glycosyltransferase (GT) gene expression but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data are able to classify six cancers; breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlates to clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.

  12. A naïve Bayes classifier for planning transfusion requirements in heart surgery.

    Science.gov (United States)

    Cevenini, Gabriele; Barbini, Emanuela; Massai, Maria R; Barbini, Paolo

    2013-02-01

    Transfusion of allogeneic blood products is a key issue in cardiac surgery. Although blood conservation and standard transfusion guidelines have been published by different medical groups, actual transfusion practices after cardiac surgery vary widely among institutions. Models can be a useful support for decision making and may reduce the total cost of care. The objective of this study was to propose and evaluate a procedure to develop a simple locally customized decision-support system. We analysed 3182 consecutive patients undergoing cardiac surgery at the University Hospital of Siena, Italy. Univariate statistical tests were performed to identify a set of preoperative and intraoperative variables as likely independent features for planning transfusion quantities. These features were utilized to design a naïve Bayes classifier. Model performance was evaluated using the leave-one-out cross-validation approach. All computations were done using SPSS and MATLAB code. The overall correct classification percentage was not particularly high if several classes of patients were to be identified. Model performance improved appreciably when the patient sample was divided into two classes (transfused and non-transfused patients). In this case the naïve Bayes model correctly classified about three quarters of patients with 71.2% sensitivity and 78.4% specificity, thus providing useful information for recognizing patients with transfusion requirements in the specific scenario considered. Although the classifier is customized to a particular setting and cannot be generalized to other scenarios, the simplicity of its development and the results obtained make it a promising approach for designing a simple model for different heart surgery centres needing a customized decision-support system for planning transfusion requirements in the intensive care unit. © 2011 Blackwell Publishing Ltd.
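
    An illustrative sketch of evaluating a naïve Bayes classifier with leave-one-out cross-validation (written in Python with scikit-learn rather than the authors' SPSS/MATLAB code; the feature matrix and labels below are synthetic stand-ins):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# X: preoperative/intraoperative features, y: 1 = transfused, 0 = not transfused
# (hypothetical data generated only to make the example runnable)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=300) > 0).astype(int)

pred = cross_val_predict(GaussianNB(), X, y, cv=LeaveOneOut())
sens = np.mean(pred[y == 1] == 1)   # sensitivity on held-out predictions
spec = np.mean(pred[y == 0] == 0)   # specificity on held-out predictions
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```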

  13. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    Science.gov (United States)

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

    Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k -Nearest Neighbors ( k -NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction
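
    A compact sketch of the kind of head-to-head comparison described (scikit-learn assumed; the synthetic data below merely stands in for a samples-by-metabolites matrix):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for an untargeted metabolomics matrix (samples x features)
X, y = make_classification(n_samples=120, n_features=300, n_informative=20,
                           n_classes=3, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {"RF": RandomForestClassifier(random_state=0),
          "SVM": SVC(kernel="rbf"),
          "kNN": KNeighborsClassifier(5),
          "NB": GaussianNB()}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)      # same folds for every classifier
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```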

  14. Deep sequencing identifies ethnicity-specific bacterial signatures in the oral microbiome.

    Directory of Open Access Journals (Sweden)

    Matthew R Mason

    Full Text Available Oral infections have a strong ethnic predilection; suggesting that ethnicity is a critical determinant of oral microbial colonization. Dental plaque and saliva samples from 192 subjects belonging to four major ethnicities in the United States were analyzed using terminal restriction fragment length polymorphism (t-RFLP and 16S pyrosequencing. Ethnicity-specific clustering of microbial communities was apparent in saliva and subgingival biofilms, and a machine-learning classifier was capable of identifying an individual's ethnicity from subgingival microbial signatures. The classifier identified African Americans with a 100% sensitivity and 74% specificity and Caucasians with a 50% sensitivity and 91% specificity. The data demonstrates a significant association between ethnic affiliation and the composition of the oral microbiome; to the extent that these microbial signatures appear to be capable of discriminating between ethnicities.

  15. Classifying seismic waveforms from scratch: a case study in the alpine environment

    Science.gov (United States)

    Hammer, C.; Ohrnberger, M.; Fäh, D.

    2013-01-01

    Nowadays, an increasing amount of seismic data is collected by daily observatory routines. The basic step for successfully analyzing those data is the correct detection of various event types. However, the visual scanning process is a time-consuming task. Applying standard detection techniques such as the STA/LTA trigger still requires manual control for classification. Here, we present a useful alternative. The incoming data stream is scanned automatically for events of interest. A stochastic classifier, called a hidden Markov model, is learned for each class of interest, enabling the recognition of highly variable waveforms. In contrast to other automatic techniques such as neural networks or support vector machines, the algorithm allows the classification to start from scratch as soon as interesting events are identified. Neither the tedious process of collecting training samples nor a time-consuming configuration of the classifier is required. An approach originally introduced for the volcanic task force action allows classifier properties to be learned from a single waveform example and some hours of background recording. Besides reducing the required workload, this also enables the detection of very rare events. Especially the latter feature provides a milestone for the use of seismic devices in alpine warning systems. Furthermore, the system offers the opportunity to flag new signal classes that have not been defined before. We demonstrate the application of the classification system using a data set from the Swiss Seismological Survey, achieving very high recognition rates. In detail, we document all refinements of the classifier, providing a step-by-step guide for the fast set-up of a well-working classification system.
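
    A minimal sketch of likelihood-based HMM classification (using the hmmlearn package, assumed available; feature extraction from waveforms is omitted): one Gaussian HMM is trained per event class and an unknown event is assigned to the class whose model gives the highest log-likelihood.

```python
import numpy as np
from hmmlearn import hmm

def train_class_models(training_sequences, n_states=4):
    """training_sequences: dict class_name -> list of feature matrices
    (each of shape [n_frames, n_features]). One Gaussian HMM per class."""
    models = {}
    for label, seqs in training_sequences.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_waveform(models, features):
    """Assign the feature sequence of an unknown event to the class whose
    HMM yields the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(features))
```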

  16. Cascaded lexicalised classifiers for second-person reference resolution

    NARCIS (Netherlands)

    Purver, M.; Fernández, R.; Frampton, M.; Peters, S.; Healey, P.; Pieraccini, R.; Byron, D.; Young, S.; Purver, M.

    2009-01-01

    This paper examines the resolution of the second person English pronoun you in multi-party dialogue. Following previous work, we attempt to classify instances as generic or referential, and in the latter case identify the singular or plural addressee. We show that accuracy and robustness can be

  17. Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing.

    Science.gov (United States)

    Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer; Jun, Albert S; Asnaghi, Laura; Salzberg, Steven L; Eberhart, Charles G

    2018-01-01

    We test the ability of next-generation sequencing, combined with computational analysis, to identify a range of organisms causing infectious keratitis. This retrospective study evaluated 16 cases of infectious keratitis and four control corneas in formalin-fixed tissues from the pathology laboratory. Infectious cases also were analyzed in the microbiology laboratory using culture, polymerase chain reaction, and direct staining. Classified sequence reads were analyzed with two different metagenomics classification engines, Kraken and Centrifuge, and visualized using the Pavian software tool. Sequencing generated 20 to 46 million reads per sample. On average, 96% of the reads were classified as human, 0.3% corresponded to known vectors or contaminant sequences, 1.7% represented microbial sequences, and 2.4% could not be classified. The two computational strategies successfully identified the fungal, bacterial, and amoebal pathogens in most patients, including all four bacterial and mycobacterial cases, five of six fungal cases, three of three Acanthamoeba cases, and one of three herpetic keratitis cases. In several cases, additional potential pathogens also were identified. In one case with cytomegalovirus identified by Kraken and Centrifuge, the virus was confirmed by direct testing, while two where Staphylococcus aureus or cytomegalovirus were identified by Centrifuge but not Kraken could not be confirmed. Confirmation was not attempted for an additional three potential pathogens identified by Kraken and 11 identified by Centrifuge. Next generation sequencing combined with computational analysis can identify a wide range of pathogens in formalin-fixed corneal specimens, with potential applications in clinical diagnostics and research.

  18. Reducing variability in the output of pattern classifiers using histogram shaping

    International Nuclear Information System (INIS)

    Gupta, Shalini; Kan, Chih-Wen; Markey, Mia K.

    2010-01-01

    Purpose: The authors present a novel technique based on histogram shaping to reduce the variability in the output and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs. Methods: The authors identify different sources of variability in the output of linear pattern classifiers with identical ROC curves, which also result in classifiers with differently distributed outputs. They theoretically develop a novel technique based on the matching of the histograms of these differently distributed pattern classifier outputs to reduce the variability in their (sensitivity, specificity) pairs at fixed decision thresholds, and to reduce the variability in their actual output values. They empirically demonstrate the efficacy of the proposed technique by means of analyses on the simulated data and real world mammography data. Results: For the simulated data, with three different known sources of variability, and for the real world mammography data with unknown sources of variability, the proposed classifier output calibration technique significantly reduced the variability in the classifiers' (sensitivity, specificity) pairs at fixed decision thresholds. Furthermore, for classifiers with monotonically or approximately monotonically related output variables, the histogram shaping technique also significantly reduced the variability in their actual output values. Conclusions: Classifier output calibration based on histogram shaping can be successfully employed to reduce the variability in the output values and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs.
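    The core operation described here, matching one classifier's output histogram to that of a reference, can be illustrated by simple quantile mapping. The numpy sketch below is a minimal stand-in for the authors' histogram-shaping procedure; the toy score distributions are assumptions.

    ```python
    import numpy as np

    def histogram_match(scores, reference_scores):
        """Map `scores` onto the empirical distribution of `reference_scores`
        by quantile mapping, so both classifiers' outputs share one histogram."""
        ranks = np.argsort(np.argsort(scores))            # ranks 0 .. n-1
        quantiles = (ranks + 0.5) / len(scores)           # empirical CDF values
        return np.quantile(reference_scores, quantiles)   # inverse CDF of reference

    # Toy usage: two classifiers with the same ranking behaviour but shifted outputs.
    rng = np.random.default_rng(0)
    ref = rng.normal(0.0, 1.0, 1000)        # reference classifier outputs
    other = rng.normal(2.0, 0.5, 1000)      # differently distributed outputs
    matched = histogram_match(other, ref)   # now distributed like `ref`
    ```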

  19. Genomic Aberrations in Crizotinib Resistant Lung Adenocarcinoma Samples Identified by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Ali Saber

    ALK-break positive non-small cell lung cancer (NSCLC) patients initially respond to crizotinib, but resistance occurs inevitably. In this study we aimed to identify fusion genes in crizotinib resistant tumor samples. Re-biopsies of three patients were subjected to paired-end RNA sequencing to identify fusion genes using deFuse and EricScript. The IGV browser was used to determine presence of known resistance-associated mutations. Sanger sequencing was used to validate fusion genes and digital droplet PCR to validate mutations. ALK fusion genes were detected in all three patients with EML4 being the fusion partner. One patient had no additional fusion genes. Another patient had one additional fusion gene, but without a predicted open reading frame (ORF). The third patient had three additional fusion genes, of which two were derived from the same chromosomal region as the EML4-ALK. A predicted ORF was identified only in the CLIP4-VSNL1 fusion product. The fusion genes validated in the post-treatment sample were also present in the biopsy before crizotinib. ALK mutations (p.C1156Y and p.G1269A) detected in the re-biopsies of two patients were not detected in pre-treatment biopsies. In conclusion, fusion genes identified in our study are unlikely to be involved in crizotinib resistance based on presence in pre-treatment biopsies. The detection of ALK mutations in post-treatment tumor samples of two patients underlines their role in crizotinib resistance.

  20. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    Science.gov (United States)

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced and large datasets are often handled well by decision tree classifiers. Because the datasets used here are imbalanced, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. A further finding is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, whereas classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
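    A minimal scikit-learn sketch of the kind of imbalanced-data comparison described above is given below; it contrasts a single decision tree with a class-weighted random forest on synthetic data (not the PseAAC features or the full set of classifiers from the paper; RUSBoost itself is available in the separate imbalanced-learn package).

    ```python
    # Sketch: compare a single decision tree with a class-weighted random forest
    # on an imbalanced toy problem (placeholder data, not PseAAC features).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=30,
                               weights=[0.95, 0.05], random_state=0)  # 5% minority class

    tree = DecisionTreeClassifier(random_state=0)
    forest = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                    random_state=0)

    for name, clf in [("decision tree", tree), ("balanced random forest", forest)]:
        f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()   # F1 on minority class
        print(f"{name}: mean F1 = {f1:.3f}")
    ```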

  1. Classifying patients' complaints for regulatory purposes : A Pilot Study

    NARCIS (Netherlands)

    Bouwman, R.J.R.; Bomhoff, Manja; Robben, Paul; Friele, R.D.

    2018-01-01

    Objectives: It is assumed that classifying and aggregated reporting of patients' complaints by regulators helps to identify problem areas, to respond better to patients and increase public accountability. This pilot study addresses what a classification of complaints in a regulatory setting

  2. An Active Learning Classifier for Further Reducing Diabetic Retinopathy Screening System Cost

    Directory of Open Access Journals (Sweden)

    Yinan Zhang

    2016-01-01

    Diabetic retinopathy (DR) screening systems raise a financial problem. To further reduce DR screening cost, an active learning classifier is proposed in this paper. Our approach identifies retinal images based on features extracted by anatomical part recognition and lesion detection algorithms. Kernel extreme learning machine (KELM) is a rapid classifier for solving classification problems in high dimensional space. Both active learning and an ensemble technique elevate the performance of KELM when using a small training dataset. The committee proposes only the necessary manual work to the doctor, saving cost. On the publicly available Messidor database, our classifier is trained with 20%–35% of labeled retinal images and comparative classifiers are trained with 80% of labeled retinal images. Results show that our classifier can achieve better classification accuracy than Classification and Regression Tree, radial basis function SVM, Multilayer Perceptron SVM, Linear SVM, and K Nearest Neighbor. Empirical experiments suggest that our active learning classifier is efficient for further reducing DR screening cost.
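    The general active-learning loop implied above, query the most uncertain unlabeled image and ask the expert for its label, can be sketched as follows; a logistic regression stands in for the KELM committee, and the pool data are placeholders.

    ```python
    # Sketch: pool-based active learning by least-confident uncertainty sampling.
    # A logistic regression stands in for the KELM committee described in the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def active_learning(X_pool, y_pool, n_initial=20, n_queries=50):
        rng = np.random.default_rng(0)
        # Assumes the initial random draw covers both classes.
        labeled = list(rng.choice(len(X_pool), n_initial, replace=False))
        clf = LogisticRegression(max_iter=1000)
        for _ in range(n_queries):
            clf.fit(X_pool[labeled], y_pool[labeled])
            proba = clf.predict_proba(X_pool)
            uncertainty = 1.0 - proba.max(axis=1)    # least-confident sampling
            uncertainty[labeled] = -np.inf           # never re-query labeled items
            query = int(np.argmax(uncertainty))
            labeled.append(query)                    # "ask the doctor" for this label
        return clf, labeled
    ```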

  3. STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS

    International Nuclear Information System (INIS)

    Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.

    2009-01-01

    The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ² tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ² test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ² and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
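    A minimal sketch of applying the Anderson-Darling test to a group's member velocities with scipy is shown below; the toy velocities and the use of the tabulated 5% critical value are illustrative assumptions, not the authors' Monte Carlo calibration.

    ```python
    # Sketch: flag a galaxy group as non-Gaussian if the Anderson-Darling statistic
    # for its member velocities exceeds the tabulated 5% critical value (toy data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    velocities = rng.normal(loc=65000.0, scale=400.0, size=25)   # km/s, toy group

    result = stats.anderson(velocities, dist="norm")
    crit_5pct = result.critical_values[list(result.significance_level).index(5.0)]
    is_non_gaussian = result.statistic > crit_5pct
    print(f"A-D statistic = {result.statistic:.3f}, "
          f"5% critical value = {crit_5pct:.3f}, non-Gaussian: {is_non_gaussian}")
    ```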

  4. Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

    Science.gov (United States)

    Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun

    2016-04-01

    This work provides a performance comparison of four different machine learning classifiers: multinomial logistic regression with ridge estimators (MLR) classifier, k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB) as applied to terahertz (THz) transient time domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss in the THz part of the spectrum differs significantly because of differences in their frequency-dependent THz extinction coefficients as well as in their refractive indices and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, the measurements for samples with a thickness of 2 mm were classified, then samples with a thickness of 4 mm, and after that 3 mm, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms, which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies

  5. Classifying the Progression of Ductal Carcinoma from Single-Cell Sampled Data via Integer Linear Programming: A Case Study.

    Science.gov (United States)

    Catanzaro, Daniele; Shackney, Stanley E; Schaffer, Alejandro A; Schwartz, Russell

    2016-01-01

    Ductal Carcinoma In Situ (DCIS) is a precursor lesion of Invasive Ductal Carcinoma (IDC) of the breast. Investigating its temporal progression could provide fundamental new insights for the development of better diagnostic tools to predict which cases of DCIS will progress to IDC. We investigate the problem of reconstructing a plausible progression from single-cell sampled data of an individual with synchronous DCIS and IDC. Specifically, by using a number of assumptions derived from the observation of cellular atypia occurring in IDC, we design a possible predictive model using integer linear programming (ILP). Computational experiments carried out on a preexisting data set of 13 patients with simultaneous DCIS and IDC show that the corresponding predicted progression models are classifiable into categories having specific evolutionary characteristics. The approach provides new insights into mechanisms of clonal progression in breast cancers and helps illustrate the power of the ILP approach for similar problems in reconstructing tumor evolution scenarios under complex sets of constraints.

  6. MAMMOGRAMS ANALYSIS USING SVM CLASSIFIER IN COMBINED TRANSFORMS DOMAIN

    Directory of Open Access Journals (Sweden)

    B.N. Prathibha

    2011-02-01

    Breast cancer is a primary cause of mortality and morbidity in women. Reports reveal that the earlier abnormalities are detected, the better the improvement in survival. Digital mammograms are one of the most effective means of detecting possible breast anomalies at early stages. Digital mammograms supported by Computer Aided Diagnostic (CAD) systems help radiologists take reliable decisions. The proposed CAD system extracts wavelet features and spectral features for better classification of mammograms. A Support Vector Machines classifier is used to analyze 206 mammogram images from the MIAS database according to the severity of abnormality, i.e., benign and malignant. The proposed system gives 93.14% accuracy for discrimination between normal and malignant, 87.25% accuracy for normal and benign samples, and 89.22% accuracy for benign and malignant samples. The study reveals that features extracted in a hybrid transform domain with an SVM classifier prove to be a promising tool for the analysis of mammograms.
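    A hedged sketch of the wavelet-feature plus SVM idea is given below using pywt and scikit-learn; the choice of wavelet, the feature statistics, and the loading of labelled MIAS regions of interest are placeholder assumptions, and the spectral features from the paper are omitted.

    ```python
    # Sketch: wavelet-domain statistics from a mammogram patch fed to an SVM.
    # Loading of labelled regions of interest is left as a placeholder.
    import numpy as np
    import pywt
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def wavelet_features(image, wavelet="db4", level=2):
        coeffs = pywt.wavedec2(image, wavelet, level=level)
        feats = []
        for band in coeffs[1:]:             # detail sub-bands at each level
            for arr in band:                # (horizontal, vertical, diagonal)
                feats += [np.mean(np.abs(arr)), np.std(arr)]
        return np.array(feats)

    # X_img: list of 2-D image patches, y: benign/malignant labels (placeholders)
    # X = np.array([wavelet_features(img) for img in X_img])
    # clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    # clf.fit(X, y)
    ```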

  7. A neural-fuzzy approach to classify the ecological status in surface waters

    International Nuclear Information System (INIS)

    Ocampo-Duque, William; Schuhmacher, Marta; Domingo, Jose L.

    2007-01-01

    A methodology based on a hybrid approach that combines fuzzy inference systems and artificial neural networks has been used to classify ecological status in surface waters. This methodology has been proposed to deal efficiently with the non-linearity and highly subjective nature of the variables involved in this serious problem. Ecological status has been assessed with biological, hydro-morphological, and physicochemical indicators. A data set collected from 378 sampling sites in the Ebro river basin has been used to train and validate the hybrid model. Up to 97.6% of sampling sites have been correctly classified with neural-fuzzy models. This performance is very competitive compared with other classification algorithms. With non-parametric classification-regression trees and probabilistic neural networks, the predictive capacities were 90.7% and 97.0%, respectively. The proposed methodology can support decision-makers in the evaluation and classification of ecological status, as required by the EU Water Framework Directive. - Fuzzy inference systems can be used as environmental classifiers

  8. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  9. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.
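    The two-step pattern described in both records, building composite dimension indexes and then clustering them, can be sketched roughly as follows; the variables, weights and the use of k-means with three clusters are illustrative assumptions rather than the study's exact procedure.

    ```python
    # Sketch: composite dimension indexes followed by clustering into three
    # patient subgroups. Weights and variables are illustrative placeholders.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    def composite_index(values, weights):
        """Weighted sum of standardised variables belonging to one dimension."""
        z = StandardScaler().fit_transform(values)
        return z @ (weights / np.sum(weights))

    # symptom_vars, comorbidity_vars: (n_patients, n_vars) arrays (placeholders)
    # w_sym, w_com: per-variable weights for each dimension (placeholders)
    # scores = np.column_stack([
    #     composite_index(symptom_vars, w_sym),
    #     composite_index(comorbidity_vars, w_com),
    # ])
    # subgroups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
    ```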

  10. SpectraClassifier 1.0: a user friendly, automated MRS-based classifier-development system

    Directory of Open Access Journals (Sweden)

    Julià-Sapé Margarida

    2010-02-01

    Background: SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward) and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping as well as Receiver Operating Characteristic (ROC) curves. Results: SC is composed of the following modules: Classifier design, Data exploration, Data visualisation, Classifier evaluation, Reports, and Classifier history. It is able to read low-resolution in-vivo MRS (single-voxel and multi-voxel) and high-resolution tissue MRS (HRMAS), processed with existing tools (jMRUI, INTERPRET, 3DiCSI or TopSpin). In addition, to facilitate exchanging data between applications, a standard format capable of storing all the information needed for a dataset was developed. Each functionality of SC has been specifically validated with real data with the purpose of bug-testing and methods validation. Data from the INTERPRET project was used. Conclusions: SC is a user-friendly software designed to fulfil the needs of potential users in the MRS community. It accepts all kinds of pre-processed MRS data types and classifies them semi-automatically, allowing spectroscopists to concentrate on interpretation of results with the use of its visualisation tools.

  11. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for the classification of short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with the best values of the performance measures.

  12. Classifying MCI Subtypes in Community-Dwelling Elderly Using Cross-Sectional and Longitudinal MRI-Based Biomarkers

    Directory of Open Access Journals (Sweden)

    Hao Guan

    2017-09-01

    Amnestic MCI (aMCI) and non-amnestic MCI (naMCI) are considered to differ in etiology and outcome. Accurately classifying MCI into meaningful subtypes would enable early intervention with targeted treatment. In this study, we employed structural magnetic resonance imaging (MRI) for MCI subtype classification. This was carried out in a sample of 184 community-dwelling individuals (aged 73–85 years). Cortical surface based measurements were computed from longitudinal and cross-sectional scans. By introducing a feature selection algorithm, we identified a set of discriminative features, and further investigated the temporal patterns of these features. A voting classifier was trained and evaluated via 10 iterations of cross-validation. The best classification accuracies achieved were: 77% (naMCI vs. aMCI), 81% (aMCI vs. cognitively normal (CN)) and 70% (naMCI vs. CN). The best results for differentiating aMCI from naMCI were achieved with baseline features. Hippocampus, amygdala and frontal pole were found to be most discriminative for classifying MCI subtypes. Additionally, we observed the dynamics of classification of several MRI biomarkers. Learning the dynamics of atrophy may aid in the development of better biomarkers, as it may track the progression of cognitive impairment.
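    A soft-voting committee of the kind mentioned above can be assembled directly in scikit-learn; the sketch below uses synthetic data and placeholder base learners rather than the cortical features and classifiers of the study.

    ```python
    # Sketch: soft-voting committee over MRI-derived features (placeholder data).
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=184, n_features=40, random_state=0)

    voter = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
        ],
        voting="soft",            # average predicted probabilities
    )
    print(cross_val_score(voter, X, y, cv=10).mean())   # 10-fold CV, as in the study
    ```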

  13. Label-Free LC-MS/MS Proteomic Analysis of Cerebrospinal Fluid Identifies Protein/Pathway Alterations and Candidate Biomarkers for Amyotrophic Lateral Sclerosis.

    Science.gov (United States)

    Collins, Mahlon A; An, Jiyan; Hood, Brian L; Conrads, Thomas P; Bowser, Robert P

    2015-11-06

    Analysis of the cerebrospinal fluid (CSF) proteome has proven valuable to the study of neurodegenerative disorders. To identify new protein/pathway alterations and candidate biomarkers for amyotrophic lateral sclerosis (ALS), we performed comparative proteomic profiling of CSF from sporadic ALS (sALS), healthy control (HC), and other neurological disease (OND) subjects using label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS). A total of 1712 CSF proteins were detected and relatively quantified by spectral counting. Levels of several proteins with diverse biological functions were significantly altered in sALS samples. Enrichment analysis was used to link these alterations to biological pathways, which were predominantly related to inflammation, neuronal activity, and extracellular matrix regulation. We then used our CSF proteomic profiles to create a support vector machines classifier capable of discriminating training set ALS from non-ALS (HC and OND) samples. Four classifier proteins, WD repeat-containing protein 63, amyloid-like protein 1, SPARC-like protein 1, and cell adhesion molecule 3, were identified by feature selection and externally validated. The resultant classifier distinguished ALS from non-ALS samples with 83% sensitivity and 100% specificity in an independent test set. Collectively, our results illustrate the utility of CSF proteomic profiling for identifying ALS protein/pathway alterations and candidate disease biomarkers.

  14. Novelty Detection Classifiers in Weed Mapping: Silybum marianum Detection on UAV Multispectral Images.

    Science.gov (United States)

    Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios

    2017-09-01

    In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.
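    A minimal one-class SVM novelty detector in the spirit of the OC-SVM above can be written with scikit-learn as follows; the three-band-plus-texture feature vectors are simulated placeholders, not UAV imagery.

    ```python
    # Sketch: one-class SVM trained only on target-class pixels (placeholder features:
    # three spectral bands plus one texture value per pixel).
    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    target_pixels = rng.normal(loc=[0.2, 0.4, 0.6, 0.5], scale=0.05, size=(500, 4))
    scene_pixels = rng.uniform(0, 1, size=(2000, 4))      # mixed vegetation

    detector = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.05))
    detector.fit(target_pixels)                           # train on the target class only
    is_target = detector.predict(scene_pixels) == 1       # +1 = target weed, -1 = other
    print(f"{is_target.sum()} of {len(scene_pixels)} pixels flagged as target")
    ```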

  15. Application of a semiautomatic classifier for modic and disk hernia changes in magnetic resonance

    Directory of Open Access Journals (Sweden)

    Eduardo López Arce Vivas

    2015-03-01

    OBJECTIVE: Early detection of degenerative changes in the lumbar intervertebral disc by magnetic resonance imaging using a semiautomatic classifier, for the prevention of degenerative disease. METHOD: MRIs with a diagnosis of degenerative disc disease or back pain were selected from January to May 2014, giving a sample of 23 patients and a total of 170 disks evaluated on sagittal T2 MRI images; these were first evaluated by a specialist physician in training and then introduced into the software, and the results were compared. RESULTS: One hundred and fifteen discs were evaluated by the programmed semiautomatic classifier to identify MODIC changes and hernia, which produced the results "normal or MODIC" and "normal or abnormal", respectively. Of a total of 230 readings, 141 were correct, 84 were reading errors and 10 readings were undiagnosed. The semiautomatic classifier is a useful tool for early diagnosis or established disease and is easy to apply because of its speed and ease of use; however, at this early stage of development, the software is inferior to clinical observation, with around 65% to 60% certainty for the MODIC rating and 61% to 58% for disc herniation, compared with clinical evaluations. CONCLUSION: The comparison between the two doctors gave 94 consistent results and only 21 errors, which represents 81% certainty.

  16. Classifier utility modeling and analysis of hypersonic inlet start/unstart considering training data costs

    Science.gov (United States)

    Chang, Juntao; Hu, Qinghua; Yu, Daren; Bao, Wen

    2011-11-01

    Start/unstart detection is one of the most important issues of hypersonic inlets and is also the foundation of protection control of scramjet. The inlet start/unstart detection can be attributed to a standard pattern classification problem, and the training sample costs have to be considered for the classifier modeling as the CFD numerical simulations and wind tunnel experiments of hypersonic inlets both cost time and money. To solve this problem, the CFD simulation of inlet is studied at first step, and the simulation results could provide the training data for pattern classification of hypersonic inlet start/unstart. Then the classifier modeling technology and maximum classifier utility theories are introduced to analyze the effect of training data cost on classifier utility. In conclusion, it is useful to introduce support vector machine algorithms to acquire the classifier model of hypersonic inlet start/unstart, and the minimum total cost of hypersonic inlet start/unstart classifier can be obtained by the maximum classifier utility theories.

  17. Quantum ensembles of quantum classifiers.

    Science.gov (United States)

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.

  18. Comparison of several chemometric methods of libraries and classifiers for the analysis of expired drugs based on Raman spectra.

    Science.gov (United States)

    Gao, Qun; Liu, Yan; Li, Hao; Chen, Hui; Chai, Yifeng; Lu, Feng

    2014-06-01

    Some expired drugs are difficult to detect by conventional means. If they are repackaged and sold back into market, they will constitute a new public health challenge. For the detection of repackaged expired drugs within specification, paracetamol tablet from a manufacturer was used as a model drug in this study for comparison of Raman spectra-based library verification and classification methods. Raman spectra of different batches of paracetamol tablets were collected and a library including standard spectra of unexpired batches of tablets was established. The Raman spectrum of each sample was identified by cosine and correlation with the standard spectrum. The average HQI of the suspicious samples and the standard spectrum were calculated. The optimum threshold values were 0.997 and 0.998 respectively as a result of ROC and four evaluations, for which the accuracy was up to 97%. Three supervised classifiers, PLS-DA, SVM and k-NN, were chosen to establish two-class classification models and compared subsequently. They were used to establish a classification of expired batches and an unexpired batch, and predict the suspect samples. The average accuracy was 90.12%, 96.80% and 89.37% respectively. Different pre-processing techniques were tried to find that first derivative was optimal for methods of libraries and max-min normalization was optimal for that of classifiers. The results obtained from these studies indicated both libraries and classifier methods could detect the expired drugs effectively, and they should be used complementarily in the fast-screening. Copyright © 2014 Elsevier B.V. All rights reserved.
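    The library-verification step, computing a hit quality index (HQI) between a query spectrum and a library standard and accepting it above a threshold, can be sketched as below; the squared-correlation definition of HQI and the first-derivative pre-processing are common choices assumed here, with the 0.997 threshold taken from the abstract.

    ```python
    # Sketch: library verification of a Raman spectrum via a correlation-based
    # hit quality index (HQI) on first-derivative spectra. Spectra are assumed
    # to share the same wavenumber axis.
    import numpy as np

    def hqi(query, standard):
        """Squared correlation between two pre-processed spectra (0..1)."""
        q = np.diff(query)                      # first-derivative pre-processing
        s = np.diff(standard)
        r = np.corrcoef(q, s)[0, 1]
        return r ** 2

    def within_specification(query, library_standards, threshold=0.997):
        scores = [hqi(query, std) for std in library_standards]
        return max(scores) >= threshold, max(scores)
    ```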

  19. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers.

    Directory of Open Access Journals (Sweden)

    Muhammad Ahmad

    Hyperspectral image classification with a limited number of training samples without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome the said problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g., SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with the small amount of training data as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with state-of-the-art methods.

  20. Methods and Techniques of Sampling, Culturing and Identifying of Subsurface Bacteria

    International Nuclear Information System (INIS)

    Lee, Seung Yeop; Baik, Min Hoon

    2010-11-01

    This report describes the sampling, culturing and identification of KURT underground bacteria, which occur as iron-, manganese-, and sulfate-reducing bacteria. The culturing methods and media preparation differed by bacterial species, which affects bacterial growth rates. The cultured bacteria can be used for various applied experiments and research in the future.

  1. Application of chemometric techniques to classify the quality of surface water in the watershed of the river Bermudez in Heredia, Costa Rica

    International Nuclear Information System (INIS)

    Herrera Murillo, Jorge; Rodriguez Roman, Susana; Solis Torres, Ligia Dina; Castro Delgado, Francisco

    2009-01-01

    The application of selected chemometric techniques (cluster analysis, principal component analysis and factor analysis) has been investigated to classify river water quality and evaluate pollution data. Fourteen physicochemical parameters were monitored at 10 stations located in the watershed of the river Bermudez, from August 2005 to February 2007. The results identified two natural clusters of monitoring sites with similar contamination characteristics and identified COD (DQO), BOD (DBO), NO₃⁻, SO₄²⁻ and total suspended solids (SST) as the main variables discriminating between sampling sites. (author)

  2. Identifying Domain-General and Domain-Specific Predictors of Low Mathematics Performance: A Classification and Regression Tree Analysis

    Directory of Open Access Journals (Sweden)

    David J. Purpura

    2017-12-01

    Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills) or domain-general skills (e.g., executive functioning and language) as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year-old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall) factors associated with low performance in mathematics knowledge at Time 2 (spring). We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.

  3. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers

    Directory of Open Access Journals (Sweden)

    Zhibin Xiao

    2017-02-01

    Recognition of transportation modes can be used in different applications including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS) information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Positioning System (GPS) data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost) instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines) to classify the different transportation modes. The experimental results show the efficacy of the proposed approach. Among them, the XGBoost model produced the best performance, with a classification accuracy of 90.77% on the GEOLIFE dataset, and a tree-based ensemble method was used for feature selection to reduce model complexity.
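    A rough sketch of the feature-plus-ensemble pipeline is given below: a few global statistics are computed from a GPS sub-trajectory and fed to a gradient-boosting classifier (xgboost.XGBClassifier could be substituted). The feature set, the crude degree-to-metre conversion and the data structures are assumptions, not the paper's segmentation or full feature list.

    ```python
    # Sketch: simple global features from a GPS sub-trajectory and a tree-based
    # ensemble classifier. Feature set and data loading are illustrative only.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def trajectory_features(latlon, timestamps):
        """latlon: (n, 2) degrees; timestamps: (n,) seconds. Crude planar distances."""
        d = np.linalg.norm(np.diff(latlon, axis=0), axis=1) * 111_000  # deg -> m (approx.)
        dt = np.clip(np.diff(timestamps), 1e-3, None)
        speed = d / dt
        accel = np.diff(speed) / dt[1:]
        return np.array([speed.mean(), speed.max(), np.percentile(speed, 85),
                         np.abs(accel).mean(), d.sum()])

    # X = np.array([trajectory_features(t.latlon, t.time) for t in segments])
    # y = np.array([t.mode for t in segments])          # walk / bike / bus / car ...
    # clf = GradientBoostingClassifier().fit(X, y)
    ```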

  4. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    Science.gov (United States)

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  5. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    Directory of Open Access Journals (Sweden)

    Yi Zhang

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
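    One simple way to realise the combination described in these two records is to average the posterior probabilities of a Platt-scaled SVM and a Gaussian maximum-likelihood model; the sketch below uses QDA as the ML classifier and the scikit-learn breast cancer dataset, and is only an illustration of the idea, not the authors' exact combination rule.

    ```python
    # Sketch: average the posterior probabilities of a probabilistic SVM and a
    # Gaussian maximum-likelihood classifier (QDA used as the ML model here).
    from sklearn.svm import SVC
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)   # Platt-scaled output
    mlc = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)         # Gaussian class models

    p_combined = 0.5 * (svm.predict_proba(X_te) + mlc.predict_proba(X_te))
    y_pred = p_combined.argmax(axis=1)
    print(f"combined accuracy: {(y_pred == y_te).mean():.3f}")
    ```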

  6. Spot sputum samples are at least as good as early morning samples for identifying Mycobacterium tuberculosis.

    Science.gov (United States)

    Murphy, Michael E; Phillips, Patrick P J; Mendel, Carl M; Bongard, Emily; Bateson, Anna L C; Hunt, Robert; Murthy, Saraswathi; Singh, Kasha P; Brown, Michael; Crook, Angela M; Nunn, Andrew J; Meredith, Sarah K; Lipman, Marc; McHugh, Timothy D; Gillespie, Stephen H

    2017-10-27

    The use of early morning sputum samples (EMS) to diagnose tuberculosis (TB) can result in treatment delay given the need for the patient to return to the clinic with the EMS, increasing the chance of patients being lost during their diagnostic workup. However, there is little evidence to support the superiority of EMS over spot sputum samples. In this new analysis of the REMoxTB study, we compare the diagnostic accuracy of EMS with spot samples for identifying Mycobacterium tuberculosis pre- and post-treatment. Patients who were smear positive at screening were enrolled into the study. Paired sputum samples (one EMS and one spot) were collected at each trial visit pre- and post-treatment. Microscopy and culture on solid LJ and liquid MGIT media were performed on all samples; those missing corresponding paired results were excluded from the analyses. Data from 1115 pre- and 2995 post-treatment paired samples from 1931 patients enrolled in the REMoxTB study were analysed. Patients were recruited from South Africa (47%), East Africa (21%), India (20%), Asia (11%), and North America (1%); 70% were male, median age 31 years (IQR 24-41), 139 (7%) co-infected with HIV with a median CD4 cell count of 399 cells/μL (IQR 318-535). Pre-treatment spot samples had a higher yield of positive Ziehl-Neelsen smears (98% vs. 97%, P = 0.02) and LJ cultures (87% vs. 82%, P = 0.006) than EMS, but there was no difference for positivity by MGIT (93% vs. 95%, P = 0.18). Contaminated and false-positive MGIT were found more often with EMS rather than spot samples. Surprisingly, pre-treatment EMS had a higher smear grading and shorter time-to-positivity, by 1 day, than spot samples in MGIT culture (4.5 vs. 5.5 days, P spot samples in those with unfavourable outcomes, there were no differences in smear or culture results, and positive results were not detected earlier in Kaplan-Meier analyses in either EMS or spot samples. Our data do not support the hypothesis that EMS

  7. Derivation of LDA log likelihood ratio one-to-one classifier

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan

    2014-01-01

    The common expression for the Likelihood Ratio classifier using LDA assumes that the reference class mean is available. In biometrics, this is often not the case and only a single sample of the reference class is available. In this paper expressions are derived for biometric comparison between

  8. Identifying 'unhealthy' food advertising on television: a case study applying the UK Nutrient Profile model.

    Science.gov (United States)

    Jenkin, Gabrielle; Wilson, Nick; Hermanson, Nicole

    2009-05-01

    To evaluate the feasibility of the UK Nutrient Profile (NP) model for identifying 'unhealthy' food advertisements using a case study of New Zealand television advertisements. Four weeks of weekday television from 15.30 hours to 18.30 hours was videotaped from a state-owned (free-to-air) television channel popular with children. Food advertisements were identified and their nutritional information collected in accordance with the requirements of the NP model. Nutrient information was obtained from a variety of sources including food labels, company websites and a national nutritional database. From the 60 h sample of weekday afternoon television, there were 1893 advertisements, of which 483 were for food products or retailers. After applying the NP model, 66 % of these were classified as advertising high-fat, high-salt and high-sugar (HFSS) foods; 28 % were classified as advertising non-HFSS foods; and the remaining 2 % were unclassifiable. More than half (53 %) of the HFSS food advertisements were for 'mixed meal' items promoted by major fast-food franchises. The advertising of non-HFSS food was sparse, covering a narrow range of food groups, with no advertisements for fresh fruit or vegetables. Despite the NP model having some design limitations in classifying real-world televised food advertisements, it was easily applied to this sample and could clearly identify HFSS products. Policy makers who do not wish to completely restrict food advertising to children outright should consider using this NP model for regulating food advertising.

  9. Classifying magnetic resonance image modalities with convolutional neural networks

    Science.gov (United States)

    Remedios, Samuel; Pham, Dzung L.; Butman, John A.; Roy, Snehashis

    2018-02-01

    Magnetic Resonance (MR) imaging allows the acquisition of images with different contrast properties depending on the acquisition protocol and the magnetic properties of tissues. Many MR brain image processing techniques, such as tissue segmentation, require multiple MR contrasts as inputs, and each contrast is treated differently. Thus it is advantageous to automate the identification of image contrasts for various purposes, such as facilitating image processing pipelines, and managing and maintaining large databases via content-based image retrieval (CBIR). Most automated CBIR techniques focus on a two-step process: extracting features from data and classifying the image based on these features. We present a novel 3D deep convolutional neural network (CNN)- based method for MR image contrast classification. The proposed CNN automatically identifies the MR contrast of an input brain image volume. Specifically, we explored three classification problems: (1) identify T1-weighted (T1-w), T2-weighted (T2-w), and fluid-attenuated inversion recovery (FLAIR) contrasts, (2) identify pre vs postcontrast T1, (3) identify pre vs post-contrast FLAIR. A total of 3418 image volumes acquired from multiple sites and multiple scanners were used. To evaluate each task, the proposed model was trained on 2137 images and tested on the remaining 1281 images. Results showed that image volumes were correctly classified with 97.57% accuracy.
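    A toy 3-D CNN with the same overall shape, stacked 3-D convolutions, global pooling and a small classification head, can be written in PyTorch as below; the layer sizes and input resolution are illustrative assumptions, not the authors' architecture.

    ```python
    # Sketch: a small 3-D CNN for classifying an MR volume into one of three
    # contrasts (T1-w / T2-w / FLAIR). Layer sizes are illustrative only.
    import torch
    import torch.nn as nn

    class ContrastCNN(nn.Module):
        def __init__(self, n_classes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),            # global average pooling
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):                        # x: (batch, 1, D, H, W)
            return self.classifier(self.features(x).flatten(1))

    model = ContrastCNN()
    volume = torch.randn(2, 1, 64, 64, 64)           # two toy volumes
    logits = model(volume)                           # shape (2, 3)
    ```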

  10. General and Local: Averaged k-Dependence Bayesian Classifiers

    Directory of Open Access Journals (Sweden)

    Limin Wang

    2015-06-01

    The inference of a general Bayesian network has been shown to be an NP-hard problem, even for approximate solutions. Although the k-dependence Bayesian (KDB) classifier can construct classifiers at arbitrary points (values of k) along the attribute dependence spectrum, it cannot identify the changes of interdependencies when attributes take different values. Local KDB, which learns in the framework of KDB, is proposed in this study to describe the local dependencies implicated in each test instance. Based on the analysis of functional dependencies, substitution-elimination resolution, a new type of semi-naive Bayesian operation, is proposed to substitute or eliminate generalization to achieve accurate estimation of conditional probability distribution while reducing computational complexity. The final classifier, the averaged k-dependence Bayesian (AKDB) classifier, averages the output of KDB and local KDB. Experimental results on the repository of machine learning databases from the University of California Irvine (UCI) showed that AKDB has significant advantages in zero-one loss and bias relative to naive Bayes (NB), tree augmented naive Bayes (TAN), averaged one-dependence estimators (AODE), and KDB. Moreover, KDB and local KDB show mutually complementary characteristics with respect to variance.

  11. A Single Platinum Microelectrode for Identifying Soft Drink Samples

    Directory of Open Access Journals (Sweden)

    Lígia Bueno

    2012-01-01

    Cyclic voltammograms recorded with a single platinum microelectrode were used along with a non-supervised pattern recognition method, namely Principal Component Analysis, to conduct a qualitative analysis of sixteen different brands of carbonated soft drinks (Kuat, Soda Antarctica, H2OH!, Sprite 2.0, Guarana Antarctica, Guarana Antarctica Zero, Coca-Cola, Coca-Cola Zero, Coca-Cola Plus, Pepsi, Pepsi Light, Pepsi Twist, Pepsi Twist Light, Pepsi Twist 3, Schin Cola, and Classic Dillar’s). In this analysis, soft drink samples were not subjected to pre-treatment. Good differentiation among all the analysed soft drinks was achieved using the voltammetric data. An analysis of the loading plots shows that the potentials of −0.65 V, −0.4 V, 0.4 V, and 0.750 V facilitated the discrimination process. The electrochemical processes related to these potentials are the reduction of hydrogen ions and the inhibition of platinum oxidation by caffeine adsorption on the electrode surface. Additionally, the single platinum microelectrode was useful for the quality control of the soft drink samples, as it helped to identify the time at which the beverage was opened.

  12. Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps

    Science.gov (United States)

    Rahmani, S.; Teimoorinia, H.; Barmby, P.

    2018-05-01

    The spectrum of a galaxy contains information about its physical properties. Classifying spectra using templates helps elucidate the nature of a galaxy's energy sources. In this paper, we investigate the use of self-organizing maps in classifying galaxy spectra against templates. We trained semi-supervised self-organizing map networks using a set of templates covering the wavelength range from far ultraviolet to near infrared. The trained networks were used to classify the spectra of a sample of 142 galaxies with 0.5 < z < 1, and the resulting classifications were compared with those obtained from K-means clustering, a supervised neural network, and chi-squared minimization. Spectra corresponding to quiescent galaxies were more likely to be classified similarly by all methods while starburst spectra showed more variability. Compared to classification using chi-squared minimization or the supervised neural network, the galaxies classed together by the self-organizing map had more similar spectra. The class ordering provided by the one-dimensional self-organizing maps corresponds to an ordering in physical properties, a potentially important feature for the exploration of large datasets.

  13. IAEA safeguards and classified materials

    International Nuclear Information System (INIS)

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-01-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raises the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials

  14. Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

    Science.gov (United States)

    Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

    2015-01-01

    Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
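    The underlying idea, estimating error at several small training sizes by repeated random subsampling and extrapolating with a learning-curve fit, can be sketched as follows; the inverse power-law form, the k-NN stand-in classifier and the chosen sizes are assumptions rather than the authors' exact RRS formulation.

    ```python
    # Sketch: estimate error rates at several small training-set sizes by repeated
    # random subsampling, then fit an inverse power law e(n) = a * n**(-b) + c to
    # extrapolate towards larger datasets.
    import numpy as np
    from scipy.optimize import curve_fit
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    def error_at_size(X, y, n, repeats=20, seed=0):
        rng = np.random.default_rng(seed)
        errs = []
        for _ in range(repeats):
            idx = rng.choice(len(X), n, replace=False)
            X_tr, X_val, y_tr, y_val = train_test_split(
                X[idx], y[idx], test_size=0.3, random_state=int(rng.integers(1_000_000)))
            clf = KNeighborsClassifier().fit(X_tr, y_tr)
            errs.append(1.0 - clf.score(X_val, y_val))
        return float(np.mean(errs))

    def power_law(n, a, b, c):
        return a * n ** (-b) + c

    # sizes = np.array([50, 100, 200, 400])
    # errors = np.array([error_at_size(X, y, n) for n in sizes])
    # params, _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.5, 0.1], maxfev=10000)
    # predicted_error_at_5000 = power_law(5000, *params)
    ```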

  15. Word2Vec inversion and traditional text classifiers for phenotyping lupus.

    Science.gov (United States)

    Turner, Clayton A; Jacobs, Alexander D; Marques, Cassios K; Oates, James C; Kamen, Diane L; Anderson, Paul E; Obeid, Jihad S

    2017-08-22

    Identifying patients with certain clinical criteria based on manual chart review of doctors' notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method. We obtained clinical notes for patients with SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOWs) and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set. We calculated the accuracy of the ICD-9 billing codes as a baseline to be 90.00% with an AUC of 0.900, the shallow neural network with CUIs to be 92.10% with an AUC of 0.970, the random forest with BOWs to be 95.25% with an AUC of 0.994, the random forest with CUIs to be 95.00% with an AUC of 0.979, and the Word2Vec inversion to be 90.03% with an AUC of 0.905. Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is

  16. A Gene Expression Classifier of Node-Positive Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Paul F. Meeh

    2009-10-01

    Full Text Available We used digital long serial analysis of gene expression to discover gene expression differences between node-negative and node-positive colorectal tumors and developed a multigene classifier able to discriminate between these two tumor types. We prepared and sequenced long serial analysis of gene expression libraries from one node-negative and one node-positive colorectal tumor, sequenced to a depth of 26,060 unique tags, and identified 262 tags significantly differentially expressed between these two tumors (P < 2 x 10^-6). We confirmed the tag-to-gene assignments and differential expression of 31 genes by quantitative real-time polymerase chain reaction, 12 of which were elevated in the node-positive tumor. We analyzed the expression levels of these 12 upregulated genes in a validation panel of 23 additional tumors and developed an optimized seven-gene logistic regression classifier. The classifier discriminated between node-negative and node-positive tumors with 86% sensitivity and 80% specificity. Receiver operating characteristic analysis of the classifier revealed an area under the curve of 0.86. Experimental manipulation of the function of one classification gene, Fibronectin, caused profound effects on invasion and migration of colorectal cancer cells in vitro. These results suggest that the development of node-positive colorectal cancer occurs in part through elevated epithelial FN1 expression and suggest novel strategies for the diagnosis and treatment of advanced disease.

  17. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    Science.gov (United States)

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R^2 = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R^2 = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using

  18. Ensembles of novelty detection classifiers for structural health monitoring using guided waves

    Science.gov (United States)

    Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita

    2018-01-01

    Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
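
    A compact sketch of the two-level voting scheme, assuming scikit-learn's one-class SVM: one novelty detector per signal segment, trained on baseline (damage-free) signals only, with a majority vote across segments at test time. The synthetic signals, segment count, nu value, and the simple energy/amplitude features are placeholders rather than the authors' feature set.

      import numpy as np
      from sklearn.svm import OneClassSVM

      def segment_features(signal, n_segments=8):
          # toy per-segment features: energy and peak amplitude
          return [np.array([np.sum(s ** 2), np.max(np.abs(s))])
                  for s in np.array_split(signal, n_segments)]

      rng = np.random.default_rng(1)
      baselines = [np.sin(np.linspace(0, 40, 1024)) + 0.05 * rng.standard_normal(1024)
                   for _ in range(20)]

      # one one-class SVM per segment, trained only on baseline observations
      per_segment = list(zip(*[segment_features(b) for b in baselines]))
      detectors = [OneClassSVM(nu=0.1, gamma="scale").fit(np.vstack(feats))
                   for feats in per_segment]

      def is_damaged(signal):
          votes = [det.predict(f.reshape(1, -1))[0] == -1    # -1 means novelty
                   for det, f in zip(detectors, segment_features(signal))]
          return sum(votes) > len(votes) / 2                 # majority vote

      noisy = np.sin(np.linspace(0, 40, 1024)) + 0.6 * rng.standard_normal(1024)
      print("damage flagged:", is_damaged(noisy))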

  19. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Science.gov (United States)

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...

  20. Comparing hair-morphology and molecular methods to identify fecal samples from Neotropical felids.

    Directory of Open Access Journals (Sweden)

    Carlos C Alberts

    Full Text Available To avoid certain problems encountered with more-traditional and invasive methods in behavioral-ecology studies of mammalian predators, such as felids, molecular approaches have been employed to identify feces found in the field. However, this method requires a complete molecular biology laboratory, and usually also requires very fresh fecal samples to avoid DNA degradation. Both conditions are normally absent in the field. To address these difficulties, identification based on morphological characters (length, color, banding, scales and medullar patterns) of hairs found in feces could be employed as an alternative. In this study we constructed a morphological identification key for guard hairs of eight Neotropical felids (jaguar, oncilla, Geoffroy's cat, margay, ocelot, Pampas cat, puma and jaguarundi) and compared its efficiency to that of a molecular identification method, using the ATP6 region as a marker. For this molecular approach, we simulated some field conditions by postponing sample-conservation procedures. A blind test of the identification key obtained a nearly 70% overall success rate, which we considered equivalent to or better than the results of some molecular methods (probably due to DNA degradation) found in other studies. The jaguar, puma and jaguarundi could be unequivocally discriminated from any other Neotropical felid. On a scale ranging from inadequate to excellent, the key proved poor only for the margay, with only 30% of its hairs successfully identified using this key; success rates for the remaining species, the oncilla, Geoffroy's cat, ocelot and Pampas cat, were intermediate. Complementary information about the known distributions of felid populations may be necessary to substantially improve the results obtained with the key. Our own molecular results were even better, since all blind-tested samples were correctly identified. Part of these identifications were made from samples kept in suboptimal

  1. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    Science.gov (United States)

    2017-01-01

    Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Conclusions: Machine learning algorithms can classify open-text feedback

  2. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high

  3. Biomarker Analysis of Samples Visually Identified as Microbial in the Eocene Green River Formation: An Analogue for Mars.

    Science.gov (United States)

    Olcott Marshall, Alison; Cestari, Nicholas A

    2015-09-01

    One of the major exploration targets for current and future Mars missions is lithofacies suggestive of biotic activity. Although such lithofacies are not confirmation of biotic activity, they provide a way to identify samples for further analyses. To test the efficacy of this approach, we identified carbonate samples from the Eocene Green River Formation as "microbial" or "non-microbial" based on the macroscale morphology of their laminations. These samples were then crushed and analyzed by gas chromatography/mass spectrometry (GC/MS) to determine their lipid biomarker composition. GC/MS analysis revealed that carbonates visually identified as "microbial" contained a higher concentration of more diverse biomarkers than those identified as "non-microbial," suggesting that this could be a viable detection strategy for selecting samples for further analysis or caching on Mars.

  4. A predictive toxicogenomics signature to classify genotoxic versus non-genotoxic chemicals in human TK6 cells

    Directory of Open Access Journals (Sweden)

    Andrew Williams

    2015-12-01

    Full Text Available Genotoxicity testing is a critical component of chemical assessment. The use of integrated approaches in genetic toxicology, including the incorporation of gene expression data to determine the DNA damage response pathways involved in response, is becoming more common. In companion papers previously published in Environmental and Molecular Mutagenesis, Li et al. (2015) [6] developed a dose optimization protocol that was based on evaluating expression changes in several well-characterized stress-response genes using quantitative real-time PCR in human lymphoblastoid TK6 cells in culture. This optimization approach was applied to the analysis of TK6 cells exposed to one of 14 genotoxic or 14 non-genotoxic agents, with sampling 4 h post-exposure. Microarray-based transcriptomic analyses were then used to develop a classifier for genotoxicity using the nearest shrunken centroids method. A panel of 65 genes was identified that could accurately classify toxicants as genotoxic or non-genotoxic. In Buick et al. (2015) [1], the utility of the biomarker for chemicals that require metabolic activation was evaluated. In this study, TK6 cells were exposed to increasing doses of four chemicals (two genotoxic chemicals that require metabolic activation and two non-genotoxic chemicals) in the presence of rat liver S9 to demonstrate that S9 does not impair the ability to classify genotoxicity using this genomic biomarker in TK6 cells.

  5. 15 CFR 4.8 - Classified Information.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  6. Hybrid Radar Emitter Recognition Based on Rough k-Means Classifier and Relevance Vector Machine

    Science.gov (United States)

    Yang, Zhutian; Wu, Zhilu; Yin, Zhendong; Quan, Taifan; Sun, Hongjian

    2013-01-01

    Due to the increasing complexity of electromagnetic signals, there exists a significant challenge for recognizing radar emitter signals. In this paper, a hybrid recognition approach is presented that classifies radar emitter signals by exploiting the different separability of samples. The proposed approach comprises two steps, namely the primary signal recognition and the advanced signal recognition. In the former step, a novel rough k-means classifier, which comprises three regions, i.e., certain area, rough area and uncertain area, is proposed to cluster the samples of radar emitter signals. In the latter step, the samples within the rough boundary are used to train the relevance vector machine (RVM). Then RVM is used to recognize the samples in the uncertain area; therefore, the classification accuracy is improved. Simulation results show that, for recognizing radar emitter signals, the proposed hybrid recognition approach is more accurate, and presents lower computational complexity than traditional approaches. PMID:23344380
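
    A rough sketch of the two-step idea, assuming scikit-learn: k-means provides confident labels where a sample is clearly closest to one centre, and the remaining ambiguous samples are handed to a second-stage kernel classifier. scikit-learn has no relevance vector machine, so an SVC stands in for the RVM here, and the 0.8 distance-ratio threshold is an illustrative choice.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_blobs
      from sklearn.svm import SVC

      X, _ = make_blobs(n_samples=300, centers=3, cluster_std=2.5, random_state=0)
      km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

      # distance of every sample to every cluster centre
      d = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
      d.sort(axis=1)
      certain = d[:, 0] / d[:, 1] < 0.8        # clearly closer to one centre
      print("certain:", int(certain.sum()), "ambiguous:", int((~certain).sum()))

      # second stage: train on confidently clustered samples, refine the rest
      stage2 = SVC(kernel="rbf", gamma="scale").fit(X[certain], km.labels_[certain])
      refined = stage2.predict(X[~certain])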

  7. Treatment response in psychotic patients classified according to social and clinical needs, drug side effects, and previous treatment; a method to identify functional remission

    DEFF Research Database (Denmark)

    Alenius, Malin; Hammarlund-Udenaes, Margareta; Honoré, Per Gustaf Hartvig

    2009-01-01

    BACKGROUND: Various approaches have been made over the years to classify psychotic patients according to inadequate treatment response, using terms such as treatment resistant or treatment refractory. Existing classifications have been criticized for overestimating positive symptoms; underestimating residual symptoms, negative symptoms, and side effects; or being too open for individual interpretation. The aim of this study was to present and evaluate a new method of classification according to treatment response and, thus, to identify patients in functional remission. METHOD: A naturalistic...... fewer psychotic symptoms, and a higher rate of workers than those with the worst treatment outcome. CONCLUSION: In the evaluation, CANSEPT showed validity in discriminating the patients of interest and was well tolerated by the patients. CANSEPT could secure inclusion of correct patients in the clinic...

  8. Planning schistosomiasis control: investigation of alternative sampling strategies for Schistosoma mansoni to target mass drug administration of praziquantel in East Africa.

    Science.gov (United States)

    Sturrock, Hugh J W; Gething, Pete W; Ashton, Ruth A; Kolaczinski, Jan H; Kabatereine, Narcis B; Brooker, Simon

    2011-09-01

    In schistosomiasis control, there is a need to geographically target treatment to populations at high risk of morbidity. This paper evaluates alternative sampling strategies for surveys of Schistosoma mansoni to target mass drug administration in Kenya and Ethiopia. Two main designs are considered: lot quality assurance sampling (LQAS) of children from all schools; and a geostatistical design that samples a subset of schools and uses semi-variogram analysis and spatial interpolation to predict prevalence in the remaining unsurveyed schools. Computerized simulations are used to investigate the performance of sampling strategies in correctly classifying schools according to treatment needs and their cost-effectiveness in identifying high prevalence schools. LQAS performs better than geostatistical sampling in correctly classifying schools, but at a higher cost per high-prevalence school correctly classified. It is suggested that the optimal surveying strategy for S. mansoni needs to take into account the goals of the control programme and the financial and drug resources available.
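
    A short sketch of how an LQAS decision rule of this kind can be simulated, assuming only NumPy: sample n children per school and flag the school for mass treatment when the number of positives reaches a decision value d. The values n=15 and d=3 are illustrative, not the sampling plan used in the study.

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 15, 3          # assumed sample size and decision value per school

      def prob_flagged(true_prevalence, n_sim=100000):
          positives = rng.binomial(n, true_prevalence, size=n_sim)
          return np.mean(positives >= d)     # chance the school is classified high

      for p in (0.02, 0.10, 0.25, 0.50):
          print(f"prevalence {p:.0%}: flagged in {prob_flagged(p):.1%} of simulations")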

  9. Fingerprint prediction using classifier ensembles

    CSIR Research Space (South Africa)

    Molale, P

    2011-11-01

    Full Text Available ...; logistic discrimination (LgD), k-nearest neighbour (k-NN), artificial neural network (ANN), association rules (AR), decision tree (DT), naive Bayes classifier (NBC) and the support vector machine (SVM). The performance of several multiple classifier systems...

  10. Defining and Classifying Interest Groups

    DEFF Research Database (Denmark)

    Baroni, Laura; Carroll, Brendan; Chalmers, Adam

    2014-01-01

    The interest group concept is defined in many different ways in the existing literature and a range of different classification schemes are employed. This complicates comparisons between different studies and their findings. One of the important tasks faced by interest group scholars engaged...... in large-N studies is therefore to define the concept of an interest group and to determine which classification scheme to use for different group types. After reviewing the existing literature, this article sets out to compare different approaches to defining and classifying interest groups with a sample...... in the organizational attributes of specific interest group types. As expected, our comparison of coding schemes reveals a closer link between group attributes and group type in narrower classification schemes based on group organizational characteristics than those based on a behavioral definition of lobbying....

  11. Classifier Directed Data Hybridization for Geographic Sample Supervised Segment Generation

    Directory of Open Access Journals (Sweden)

    Christoff Fourie

    2014-11-01

    Full Text Available Quality segment generation is a well-known challenge and research objective within Geographic Object-based Image Analysis (GEOBIA). Although methodological avenues within GEOBIA are diverse, segmentation commonly plays a central role in most approaches, influencing and being influenced by surrounding processes. A general approach using supervised quality measures, specifically user provided reference segments, suggests casting the parameters of a given segmentation algorithm as a multidimensional search problem. In such a sample supervised segment generation approach, spatial metrics observing the user provided reference segments may drive the search process. The search is commonly performed by metaheuristics. A novel sample supervised segment generation approach is presented in this work, where the spectral content of provided reference segments is queried. A one-class classification process using spectral information from inside the provided reference segments is used to generate a probability image, which in turn is employed to direct a hybridization of the original input imagery. Segmentation is performed on such a hybrid image. These processes are adjustable, interdependent and form a part of the search problem. Results are presented detailing the performances of four method variants compared to the generic sample supervised segment generation approach, under various conditions in terms of resultant segment quality, required computing time and search process characteristics. Multiple metrics, metaheuristics and segmentation algorithms are tested with this approach. Using the spectral data contained within user provided reference segments to tailor the output generally improves the results in the investigated problem contexts, but at the expense of additional required computing time.

  12. An integrated multi-label classifier with chemical-chemical interactions for prediction of chemical toxicity effects.

    Science.gov (United States)

    Liu, Tao; Chen, Lei; Pan, Xiaoyong

    2018-05-31

    Chemical toxicity effect is one of the major reasons for declining candidate drugs. Detecting the toxicity effects of all chemicals can accelerate the procedures of drug discovery. However, it is time-consuming and expensive to identify the toxicity effects of a given chemical through traditional experiments. Designing quick, reliable and non-animal-involved computational methods is an alternative way. In this study, a novel integrated multi-label classifier was proposed. First, based on five types of chemical-chemical interactions retrieved from STITCH, each of which is derived from one aspect of chemicals, five individual classifiers were built. Then, several integrated classifiers were built by integrating some or all individual classifiers. By testing the integrated classifiers on a dataset with chemicals and their toxicity effects in Accelrys Toxicity database and non-toxic chemicals with their performance evaluated by jackknife test, an optimal integrated classifier was selected as the proposed classifier, which provided quite high prediction accuracies and wide applications. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  13. Application of classification-tree methods to identify nitrate sources in ground water

    Science.gov (United States)

    Spruill, T.B.; Showers, W.J.; Howe, S.S.

    2002-01-01

    A study was conducted to determine if nitrate sources in ground water (fertilizer on crops, fertilizer on golf courses, irrigation spray from hog (Sus scrofa) wastes, and leachate from poultry litter and septic systems) could be classified with 80% or greater success. Two statistical classification-tree models were devised from 48 water samples containing nitrate from five source categories. Model I was constructed by evaluating 32 variables and selecting four primary predictor variables (δ15N, nitrate to ammonia ratio, sodium to potassium ratio, and zinc) to identify nitrate sources. A δ15N value of nitrate plus potassium 18.2 indicated inorganic or soil organic N. A nitrate to ammonia ratio 575 indicated nitrate from golf courses. A sodium to potassium ratio 3.2 indicated spray or poultry wastes. A value for zinc 2.8 indicated poultry wastes. Model 2 was devised by using all variables except δ15N. This model also included four variables (sodium plus potassium, nitrate to ammonia ratio, calcium to magnesium ratio, and sodium to potassium ratio) to distinguish categories. Both models were able to distinguish all five source categories with better than 80% overall success and with 71 to 100% success in individual categories using the learning samples. Seventeen water samples that were not used in model development were tested using Model 2 for three categories, and all were correctly classified. Classification-tree models show great potential in identifying sources of contamination and variables important in the source-identification process.
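
    A minimal sketch of the Model 2 idea, assuming scikit-learn: a classification tree over simple chemical ratios, trained on labelled samples and then inspected for its decision thresholds. The tiny table below is fabricated for illustration only; the study's thresholds came from 48 real ground-water samples.

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier, export_text

      features = ["Na_plus_K", "NO3_to_NH4", "Ca_to_Mg", "Na_to_K"]
      X = np.array([[12.0, 900.0, 2.1, 4.0],      # golf-course fertilizer
                    [ 3.0,  40.0, 1.2, 1.5],      # septic leachate
                    [25.0, 300.0, 3.5, 6.0],      # hog-waste spray
                    [ 5.0,  60.0, 0.9, 1.1],      # poultry litter
                    [ 8.0, 120.0, 2.8, 2.2]])     # fertilizer on crops
      y = ["golf", "septic", "spray", "poultry", "crops"]

      tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
      print(export_text(tree, feature_names=features))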

  14. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    Directory of Open Access Journals (Sweden)

    Julia Jezmir

    Full Text Available To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly-resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.

  15. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    Science.gov (United States)

    Jezmir, Julia; Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E Jane

    2016-01-01

    To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.
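
    A brief sketch of how the operating characteristics of a static two-way LQAS plan like this can be checked with the binomial distribution, assuming SciPy. The sample size n=105 matches the abstract; the decision value d=3 is an assumed illustration, not the plan actually used.

      from scipy.stats import binom

      n, d = 105, 3
      p_low, p_high = 0.01, 0.05        # classification thresholds from the plan

      alpha = 1 - binom.cdf(d - 1, n, p_low)   # truly low region wrongly called high
      beta = binom.cdf(d - 1, n, p_high)       # truly high region wrongly called low
      print(f"risk of misclassifying a 1% region as high: {alpha:.3f}")
      print(f"risk of misclassifying a 5% region as low:  {beta:.3f}")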

  16. Identifying Effectiveness Criteria for Internet Payment Systems.

    Science.gov (United States)

    Shon, Tae-Hwan; Swatman, Paula M. C.

    1998-01-01

    Examines Internet payment systems (IPS): third-party, card, secure Web server, electronic token, financial electronic data interchange (EDI), and micropayment based. Reports the results of a Delphi survey of experts identifying and classifying IPS effectiveness criteria and classifying types of IPS providers. Includes the survey invitation letter…

  17. Interface Prostheses With Classifier-Feedback-Based User Training.

    Science.gov (United States)

    Fang, Yinfeng; Zhou, Dalin; Li, Kairu; Liu, Honghai

    2017-11-01

    It is evident that user training significantly affects performance of pattern-recognition-based myoelectric prosthetic device control. Despite plausible classification accuracy on offline datasets, online accuracy usually suffers from the changes in physiological conditions and electrode displacement. The user ability in generating consistent electromyographic (EMG) patterns can be enhanced via proper user training strategies in order to improve online performance. This study proposes a clustering-feedback strategy that provides real-time feedback to users by means of a visualized online EMG signal input as well as the centroids of the training samples, whose dimensionality is reduced to minimal number by dimension reduction. Clustering feedback provides a criterion that guides users to adjust motion gestures and muscle contraction forces intentionally. The experiment results have demonstrated that hand motion recognition accuracy increases steadily along the progress of the clustering-feedback-based user training, while conventional classifier-feedback methods, i.e., label feedback, hardly achieve any improvement. The result concludes that the use of proper classifier feedback can accelerate the process of user training, and implies prosperous future for the amputees with limited or no experience in pattern-recognition-based prosthetic device manipulation.

  18. Classifying Sluice Occurrences in Dialogue

    DEFF Research Database (Denmark)

    Baird, Austin; Hamza, Anissa; Hardt, Daniel

    2018-01-01

    perform manual annotation with acceptable inter-coder agreement. We build classifier models with Decision Trees and Naive Bayes, with an accuracy of 67%. We deploy a classifier to automatically classify sluice occurrences in OpenSubtitles, resulting in a corpus with 1.7 million occurrences. This will support.... Despite this, the corpus can be of great use in research on sluicing and development of systems, and we are making the corpus freely available on request. Furthermore, we are in the process of improving the accuracy of sluice identification and annotation for the purpose of creating a subsequent version...

  19. 48 CFR 952.223-76 - Conditional payment of fee or profit-safeguarding restricted data and other classified...

    Science.gov (United States)

    2010-10-01

    ... disclosure of Top Secret Restricted Data or other information classified as Top Secret, any classification... result in the loss, compromise, or unauthorized disclosure of Top Secret Restricted Data, or other information classified as Top Secret, any classification level of information in a SAP, information identified...

  20. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights
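
    A condensed sketch of the kind of comparison described above, assuming scikit-learn and synthetic stand-in data: univariate feature selection followed by either a decision tree or a Gaussian naive Bayes model, compared by cross-validated accuracy.

      from sklearn.datasets import make_classification
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.pipeline import make_pipeline
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=300, n_features=40, n_informative=6,
                                 random_state=0)
      models = [("decision tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
                ("naive Bayes", GaussianNB())]
      for name, model in models:
          pipe = make_pipeline(SelectKBest(f_classif, k=8), model)
          print(name, round(cross_val_score(pipe, X, y, cv=5).mean(), 3))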

  1. Finding needles in a haystack: a methodology for identifying and sampling community-based youth smoking cessation programs.

    Science.gov (United States)

    Emery, Sherry; Lee, Jungwha; Curry, Susan J; Johnson, Tim; Sporer, Amy K; Mermelstein, Robin; Flay, Brian; Warnecke, Richard

    2010-02-01

    Surveys of community-based programs are difficult to conduct when there is virtually no information about the number or locations of the programs of interest. This article describes the methodology used by the Helping Young Smokers Quit (HYSQ) initiative to identify and profile community-based youth smoking cessation programs in the absence of a defined sample frame. We developed a two-stage sampling design, with counties as the first-stage probability sampling units. The second stage used snowball sampling to saturation, to identify individuals who administered youth smoking cessation programs across three economic sectors in each county. Multivariate analyses modeled the relationship between program screening, eligibility, and response rates and economic sector and stratification criteria. Cumulative logit models analyzed the relationship between the number of contacts in a county and the number of programs screened, eligible, or profiled in a county. The snowball process yielded 9,983 unique and traceable contacts. Urban and high-income counties yielded significantly more screened program administrators; urban counties produced significantly more eligible programs, but there was no significant association between the county characteristics and program response rate. There is a positive relationship between the number of informants initially located and the number of programs screened, eligible, and profiled in a county. Our strategy to identify youth tobacco cessation programs could be used to create a sample frame for other nonprofit organizations that are difficult to identify due to a lack of existing directories, lists, or other traditional sample frames.

  2. Joint Feature Extraction and Classifier Design for ECG-Based Biometric Recognition.

    Science.gov (United States)

    Gutta, Sandeep; Cheng, Qi

    2016-03-01

    Traditional biometric recognition systems often utilize physiological traits such as fingerprint, face, iris, etc. Recent years have seen a growing interest in electrocardiogram (ECG)-based biometric recognition techniques, especially in the field of clinical medicine. In existing ECG-based biometric recognition methods, feature extraction and classifier design are usually performed separately. In this paper, a multitask learning approach is proposed, in which feature extraction and classifier design are carried out simultaneously. Weights are assigned to the features within the kernel of each task. We decompose the matrix consisting of all the feature weights into sparse and low-rank components. The sparse component determines the features that are relevant to identify each individual, and the low-rank component determines the common feature subspace that is relevant to identify all the subjects. A fast optimization algorithm is developed, which requires only the first-order information. The performance of the proposed approach is demonstrated through experiments using the MIT-BIH Normal Sinus Rhythm database.

  3. Classification of bacterial samples as negative or positive for a UTI and antibiogram using surface enhanced Raman spectroscopy

    Science.gov (United States)

    Kastanos, Evdokia; Hadjigeorgiou, Katerina; Kyriakides, Alexandros; Pitris, Costas

    2011-03-01

    Urinary tract infection (UTI) diagnosis requires an overnight culture to identify a sample as positive or negative for a UTI. Additional cultures are required to identify the pathogen responsible for the infection and to test its sensitivity to antibiotics. A rise in ineffective treatments, chronic infections, rising health care costs and antibiotic resistance are some of the consequences of this prolonged waiting period of UTI diagnosis. In this work, Surface Enhanced Raman Spectroscopy (SERS) is used for classifying bacterial samples as positive or negative for UTI. SERS spectra of serial dilutions of E. coli bacteria, isolated from a urine culture, were classified as positive (10^5-10^8 cells/ml) or negative (10^3-10^4 cells/ml) for UTI after mixing samples with gold nanoparticles. A leave-one-out cross validation was performed using the first two principal components resulting in the correct classification of 82% of all samples. Sensitivity of classification was 88% and specificity was 67%. Antibiotic sensitivity testing was also done using SERS spectra of various species of gram negative bacteria collected 4 hours after exposure to antibiotics. Spectral analysis revealed clear separation between the spectra of samples exposed to ciprofloxacin (sensitive) and amoxicillin (resistant). This study can become the basis for identifying urine samples as positive or negative for a UTI and determining their antibiogram without requiring an overnight culture.
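
    A small sketch of the analysis pattern described above, assuming scikit-learn: each spectrum is reduced to its first two principal components and classified under leave-one-out cross-validation. The random spectra are stand-ins for SERS measurements, and linear discriminant analysis is an assumed choice of classifier, since the abstract does not name one.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import LeaveOneOut, cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      n_pos, n_neg, n_wavenumbers = 20, 14, 600
      spectra = np.vstack([rng.normal(1.0, 0.1, (n_pos, n_wavenumbers)),
                           rng.normal(0.8, 0.1, (n_neg, n_wavenumbers))])
      labels = np.array([1] * n_pos + [0] * n_neg)    # 1 = positive for UTI

      model = make_pipeline(PCA(n_components=2), LinearDiscriminantAnalysis())
      acc = cross_val_score(model, spectra, labels, cv=LeaveOneOut()).mean()
      print(f"leave-one-out accuracy: {acc:.2f}")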

  4. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha

    2013-11-25

    Based on dynamic programming approach we design algorithms for sequential optimization of exact and approximate decision rules relative to the length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification-exact or approximate; and (ii) which order of optimization gives better results of classifier work: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than the ordinary optimization (length or coverage).

  5. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha M.; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata

    2013-01-01

    Based on dynamic programming approach we design algorithms for sequential optimization of exact and approximate decision rules relative to the length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification-exact or approximate; and (ii) which order of optimization gives better results of classifier work: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than the ordinary optimization (length or coverage).

  6. A support vector machine (SVM) based voltage stability classifier

    Energy Technology Data Exchange (ETDEWEB)

    Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of); Lee, B. [Korea Univ., Seoul (Korea, Republic of)

    2007-07-01

    Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to fully employ existing transmission infrastructure. The economic pressure on electricity markets forces the operation of power systems and components to their limit of capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day-to-day system operations and an increase in the number of potential components for system disturbances, potentially resulting in voltage instability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; moving window for feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer an excellent performance in classification with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.
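
    A simplified sketch of the ingredients listed above, assuming scikit-learn: feature vectors built from a moving window over local voltage and load active-power measurements, fed to an SVM that labels each window as long-term stable or unstable. The synthetic trajectories, window length, and SVM settings are illustrative, not the 11-bus case study data.

      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      window = 10                                # samples per feature vector

      def trajectory(stable, length=60):
          t = np.arange(length)
          v = 1.0 - (0.0005 if stable else 0.004) * t + 0.005 * rng.standard_normal(length)
          p = 0.8 + (0.001 if stable else 0.006) * t + 0.01 * rng.standard_normal(length)
          return np.column_stack([v, p])         # local voltage and active power

      def windows(traj, label):
          feats = [traj[i:i + window].ravel() for i in range(len(traj) - window)]
          return np.array(feats), np.full(len(feats), label)

      Xs, ys = zip(*[windows(trajectory(stable=s), int(s))
                     for s in [True] * 10 + [False] * 10])
      X, y = np.vstack(Xs), np.concatenate(ys)   # 1 = long-term stable

      clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
      print("training accuracy:", clf.score(X, y))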

  7. A cascade of classifiers for extracting medication information from discharge summaries

    Directory of Open Access Journals (Sweden)

    Halgrim Scott

    2011-07-01

    Full Text Available Background: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. Methods: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. Results: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. Conclusions: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.

  8. Classifying indicators of quality: a collaboration between Dutch and English regulators.

    Science.gov (United States)

    Mears, Alex; Vesseur, Jan; Hamblin, Richard; Long, Paul; Den Ouden, Lya

    2011-12-01

    Many approaches to measuring quality in healthcare exist, generally employing indicators or metrics. While there are important differences, most of these approaches share three key areas of measurement: safety, effectiveness and patient experience. The European Partnership for Supervisory Organisations in Health Services and Social Care (EPSO) exists as a working group and discussion forum for European regulators. This group undertook to identify a common framework within which European approaches to indicators could be compared. A framework was developed to classify indicators, using four sets of criteria: conceptualization of quality, Donabedian definition (structure, process, outcome), data type (derivable, collectable from routine sources, special collections, samples) and data use (judgement (singular or part of framework), benchmarking, risk assessment). Indicators from English and Dutch hospital measurement programmes were put into the framework, showing areas of agreement and levels of comparability. In the first instance, results are only illustrative. The EPSO has been a powerful driver for undertaking cross-European research, and this project is the first of many to take advantage of the access to international expertise. It has shown that through development of a framework that deconstructs national indicators, commonalities can be identified. Future work will attempt to incorporate other nations' indicators, and attempt cross-national comparison.

  9. Discovery and validation of a colorectal cancer classifier in a new blood test with improved performance for high-risk subjects

    DEFF Research Database (Denmark)

    Croner, Lisa J; Dillon, Roslyn; Kao, Athit

    2017-01-01

    BACKGROUND: The aim was to improve upon an existing blood-based colorectal cancer (CRC) test directed to high-risk symptomatic patients, by developing a new CRC classifier to be used with a new test embodiment. The new test uses a robust assay format (electrochemiluminescence immunoassays) to quantify protein concentrations. The aim was achieved by building and validating a CRC classifier using concentration measures from a large sample set representing a true intent-to-test (ITT) symptomatic population. METHODS: 4435 patient samples were drawn from the Endoscopy II sample set. Samples were...... the indeterminate rate of the new panel was 23.2%, sensitivity/specificity was 0.80/0.83, PPV was 36.5%, and NPV was 97.1%. CONCLUSIONS: The validated classifier serves as the basis of a new blood-based CRC test for symptomatic patients. The improved performance, resulting from robust concentration measures across...

  10. 76 FR 34761 - Classified National Security Information

    Science.gov (United States)

    2011-06-14

    ... MARINE MAMMAL COMMISSION Classified National Security Information [Directive 11-01] AGENCY: Marine... Commission's (MMC) policy on classified information, as directed by Information Security Oversight Office... of Executive Order 13526, ``Classified National Security Information,'' and 32 CFR part 2001...

  11. Identification of flooded area from satellite images using Hybrid Kohonen Fuzzy C-Means sigma classifier

    Directory of Open Access Journals (Sweden)

    Krishna Kant Singh

    2017-06-01

    Full Text Available A novel neuro fuzzy classifier Hybrid Kohonen Fuzzy C-Means-σ (HKFCM-σ) is proposed in this paper. The proposed classifier is a hybridization of Kohonen Clustering Network (KCN) with FCM-σ clustering algorithm. The network architecture of HKFCM-σ is similar to simple KCN network having only two layers, i.e., input and output layer. However, the selection of winner neuron is done based on FCM-σ algorithm. Thus, embedding the features of both, a neural network and a fuzzy clustering algorithm in the classifier. This hybridization results in a more efficient, less complex and faster classifier for classifying satellite images. HKFCM-σ is used to identify the flooding that occurred in Kashmir area in September 2014. The HKFCM-σ classifier is applied on pre and post flooding Landsat 8 OLI images of Kashmir to detect the areas that were flooded due to the heavy rainfalls of September, 2014. The classifier is trained using the mean values of the various spectral indices like NDVI, NDWI, NDBI and first component of Principal Component Analysis. The error matrix was computed to test the performance of the method. The method yields high producer's accuracy, consumer's accuracy and kappa coefficient value indicating that the proposed classifier is highly effective and efficient.

  12. Error minimizing algorithms for nearest neighbor classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory; Hush, Don [Los Alamos National Laboratory; Zimmer, G. Beate [TEXAS A&M

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.

  13. Aggregation Operator Based Fuzzy Pattern Classifier Design

    DEFF Research Database (Denmark)

    Mönks, Uwe; Larsen, Henrik Legind; Lohweg, Volker

    2009-01-01

    This paper presents a novel modular fuzzy pattern classifier design framework for intelligent automation systems, developed on the base of the established Modified Fuzzy Pattern Classifier (MFPC) and allows designing novel classifier models which are hardware-efficiently implementable....... The performances of novel classifiers using substitutes of MFPC's geometric mean aggregator are benchmarked in the scope of an image processing application against the MFPC to reveal classification improvement potentials for obtaining higher classification rates....

  14. Patients on weaning trials classified with support vector machines

    International Nuclear Information System (INIS)

    Garde, Ainara; Caminal, Pere; Giraldo, Beatriz F; Schroeder, Rico; Voss, Andreas; Benito, Salvador

    2010-01-01

    The process of discontinuing mechanical ventilation is called weaning and is one of the most challenging problems in intensive care. An unnecessary delay in the discontinuation process and an early weaning trial are undesirable. This study aims to characterize the respiratory pattern through features that permit the identification of patients' conditions in weaning trials. Three groups of patients have been considered: 94 patients with successful weaning trials, who could maintain spontaneous breathing after 48 h (GSucc); 39 patients who failed the weaning trial (GFail) and 21 patients who had successful weaning trials, but required reintubation in less than 48 h (GRein). Patients are characterized by their cardiorespiratory interactions, which are described by joint symbolic dynamics (JSD) applied to the cardiac interbeat and breath durations. The most discriminating features in the classification of the different groups of patients (GSucc, GFail and GRein) are identified by support vector machines (SVMs). The SVM-based feature selection algorithm has an accuracy of 81% in classifying GSucc versus the rest of the patients, 83% in classifying GRein versus GSucc patients and 81% in classifying GRein versus the rest of the patients. Moreover, a good balance between sensitivity and specificity is achieved in all classifications
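
    A minimal sketch of the joint symbolic dynamics (JSD) featurization, assuming NumPy: the cardiac interbeat and breath-duration series are turned into binary symbols (increase or decrease), and the relative frequencies of joint words become the feature vector that a support vector machine would later select from. The word length of 3 and the toy series are assumptions for illustration.

      import numpy as np

      def jsd_features(rr, breath, word_len=3):
          s1 = (np.diff(rr) > 0).astype(int)       # 1 = lengthening interbeat interval
          s2 = (np.diff(breath) > 0).astype(int)   # 1 = lengthening breath duration
          counts = np.zeros(4 ** word_len)         # joint alphabet {00, 01, 10, 11}
          for i in range(len(s1) - word_len + 1):
              code = 0
              for k in range(word_len):
                  code = code * 4 + (2 * s1[i + k] + s2[i + k])
              counts[code] += 1
          return counts / counts.sum()             # relative joint-word frequencies

      rng = np.random.default_rng(0)
      rr = 0.8 + 0.05 * rng.standard_normal(300)      # toy interbeat series (s)
      breath = 3.0 + 0.4 * rng.standard_normal(300)   # toy breath-duration series (s)
      print(jsd_features(rr, breath).shape)           # one 64-dimensional vector per patient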

  15. Clean-up progress at the SNL/NM Classified Waste Landfill

    International Nuclear Information System (INIS)

    Slavin, P.J.; Galloway, R.B.

    1999-01-01

    The Sandia National Laboratories/New Mexico (SNL/NM) Environmental Restoration Project is currently excavating the Classified Waste Landfill in Technical Area II, a disposal area for weapon components for approximately 40 years until it closed in 1987. Many different types of classified parts were disposed in unlined trenches and pits throughout the course of the landfill's history. A percentage of the parts contain explosives and/or radioactive components or contamination. The excavation has progressed backward chronologically from the last trenches filled through to the earlier pits. Excavation commenced in March 1998, and approximately 75 percent of the site (as defined by geophysical anomalies) has been completed as of November 1999. The material excavated consists primarily of classified weapon assemblies and related components, so disposition must include demilitarization and sanitization. This has resulted in substantial waste minimization and cost avoidance for the project as upwards of 90 percent of the classified materials are being demilitarized and recycled. The project is using field screening and lab analysis in conjunction with preliminary and in-process risk assessments to characterize soil and make waste determinations in as timely a fashion as possible. Challenges in waste management have prompted the adoption of innovative solutions. The hand-picked crew (both management and field staff) and the ability to quickly adapt to changing conditions have ensured the success of the project. The current schedule is to complete excavation in July 2000, with follow-on verification sampling, demilitarization, and waste management activities following

  16. Using multivariate analyses and GIS to identify pollutants and their spatial patterns in urban soils in Galway, Ireland

    International Nuclear Information System (INIS)

    Zhang Chaosheng

    2006-01-01

    Galway is a small but rapidly growing tourism city in western Ireland. To evaluate its environmental quality, a total of 166 surface soil samples (0-10 cm depth) were collected from parks and grasslands at the density of 1 sample per 0.25 km² at the end of 2004. All samples were analysed using ICP-AES for the near-total concentrations of 26 chemical elements. Multivariate statistics and GIS techniques were applied to classify the elements and to identify elements influenced by human activities. Cluster analysis (CA) and principal component analysis (PCA) classified the elements into two groups: the first group predominantly derived from natural sources, the second being influenced by human activities. GIS mapping is a powerful tool in identifying the possible sources of pollutants. Relatively high concentrations of Cu, Pb and Zn were found in the city centre, old residential areas, and along major traffic routes, showing significant effects of traffic pollution. The element As is enriched in soils of the old built-up areas, which can be attributed to coal and peat combustion for home heating. Such significant spatial patterns of pollutants displayed by urban soils may imply a potential health threat to residents of the contaminated areas of the city. - Multivariate statistics and GIS are useful tools to identify pollutants in urban soils

  17. Composite Classifiers for Automatic Target Recognition

    National Research Council Canada - National Science Library

    Wang, Lin-Cheng

    1998-01-01

    ...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...

  18. 48 CFR 952.204-76 - Conditional payment of fee or profit-safeguarding restricted data and other classified information.

    Science.gov (United States)

    2010-10-01

    ... disclosure of Top Secret Restricted Data or other information classified as Top Secret, any classification... result in the loss, compromise, or unauthorized disclosure of Top Secret Restricted Data, or other information classified as Top Secret, any classification level of information in a SAP, information identified...

  19. Localization and Recognition of Dynamic Hand Gestures Based on Hierarchy of Manifold Classifiers

    Science.gov (United States)

    Favorskaya, M.; Nosov, A.; Popov, A.

    2015-05-01

    Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures which do not fit the current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results" including 393 dynamic hand-gestures was chosen. The proposed method yielded 84-91% recognition accuracy, on average, for a restricted set of dynamic gestures.

  20. LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

    Directory of Open Access Journals (Sweden)

    M. Favorskaya

    2015-05-01

    Full Text Available Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures which do not fit the current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results” including 393 dynamic hand-gestures was chosen. The proposed method yielded 84–91% recognition accuracy, on average, for a restricted set of dynamic gestures.

  1. Classifying Internet Pathological Users: Their Usage, Internet Sensation Seeking, and Perceptions.

    Science.gov (United States)

    Lin, Sunny S. J.

    A study was conducted to identify pathological Internet users and to reveal their psychological features and problematic usage patterns. One thousand and fifty Taiwanese undergraduates were selected. An Internet Addiction Scale was adopted to classify 648 students into 4 clusters. The 146 users in the 4th cluster, who reported significantly higher…

  2. Super resolution reconstruction of infrared images based on classified dictionary learning

    Science.gov (United States)

    Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

    2018-05-01

    Infrared images always suffer from low-resolution problems resulting from limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction and, therefore, more satisfactory results are achieved without an increase in computational complexity and time cost. Experiments and results demonstrate that it is a viable method for infrared image reconstruction since it improves image resolution and recovers detailed information of targets.

  3. Distance and Density Similarity Based Enhanced k-NN Classifier for Improving Fault Diagnosis Performance of Bearings

    Directory of Open Access Journals (Sweden)

    Sharif Uddin

    2016-01-01

    Full Text Available An enhanced k-nearest neighbor (k-NN) classification algorithm is presented, which uses a density based similarity measure in addition to a distance based similarity measure to improve the diagnostic performance in bearing fault diagnosis. Due to its use of distance based similarity measure alone, the classification accuracy of traditional k-NN deteriorates in case of overlapping samples and outliers and is highly susceptible to the neighborhood size, k. This study addresses these limitations by proposing the use of both distance and density based measures of similarity between training and test samples. The proposed k-NN classifier is used to enhance the diagnostic performance of a bearing fault diagnosis scheme, which classifies different fault conditions based upon hybrid feature vectors extracted from acoustic emission (AE) signals. Experimental results demonstrate that the proposed scheme, which uses the enhanced k-NN classifier, yields better diagnostic performance and is more robust to variations in the neighborhood size, k.
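
    A minimal sketch of the general idea of weighting a k-NN vote by both distance and local density; the specific density estimate (inverse mean distance from a training point to its own k neighbors) and the way the two measures are combined are assumptions for illustration, not the authors' exact formulation.

        import numpy as np

        def knn_distance_density(X_train, y_train, x, k=5):
            # Vote among the k nearest training points, weighting each neighbor by its
            # closeness to the query and by the local density around that neighbor.
            d = np.linalg.norm(X_train - x, axis=1)
            nearest = np.argsort(d)[:k]
            density = []
            for i in nearest:
                di = np.linalg.norm(X_train - X_train[i], axis=1)
                density.append(1.0 / (np.sort(di)[1:k + 1].mean() + 1e-12))
            weights = np.asarray(density) / (d[nearest] + 1e-12)
            classes = np.unique(y_train)
            votes = [weights[y_train[nearest] == c].sum() for c in classes]
            return classes[int(np.argmax(votes))]

        # Toy usage with two Gaussian clusters
        X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 3.0])
        y = np.array([0] * 30 + [1] * 30)
        print(knn_distance_density(X, y, np.array([2.5, 2.5])))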

  4. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    Science.gov (United States)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.

  5. The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Xinguo Lu

    2018-01-01

    Full Text Available With advances in next-generation sequencing (NGS) technologies, a large number of multiple types of high-throughput genomics data are available. A great challenge in exploring cancer progression is to identify the driver genes from the variant genes by analyzing and integrating multiple types of genomics data. Breast cancer is known as a heterogeneous disease. The identification of subtype-specific driver genes is critical to guide the diagnosis, assessment of prognosis and treatment of breast cancer. We developed an integrated framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and utilized a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were acquired by pairwise comparison between subtypes. To validate the specificity of the driver genes, the gene expression data of these genes were applied to classify the patient samples with 10-fold cross validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify the potential driver genes, and the classifier built with these genes achieved better performance than with genes identified by other methods.

  6. Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

    Directory of Open Access Journals (Sweden)

    Bogdan Gliwa

    2011-01-01

    Full Text Available The paper presents a hybrid neuro-fuzzy classifier, based on the NEFCLASS model, which was modified. The presented classifier was compared to popular classifiers – neural networks and k-nearest neighbours. The efficiency of the modifications in the classifier was compared with the methods used in the original NEFCLASS model (learning methods). The accuracy of the classifier was tested using 3 datasets from the UCI Machine Learning Repository: iris, wine and breast cancer Wisconsin. Moreover, the influence of ensemble classification methods on classification accuracy was presented.

  7. Verification of Representative Sampling in RI waste

    International Nuclear Information System (INIS)

    Ahn, Hong Joo; Song, Byung Cheul; Sohn, Se Cheul; Song, Kyu Seok; Jee, Kwang Yong; Choi, Kwang Seop

    2009-01-01

    For evaluating the radionuclide inventories of RI wastes, representative sampling is one of the most important parts of the process of radiochemical assay. Sampling to characterize RI waste conditions has typically been based on judgment or convenience sampling of individual drums or groups. However, it is difficult to obtain a representative sample from among the numerous drums. In addition, RI waste drums may be classified as heterogeneous wastes because they contain cotton, glass, vinyl, gloves, etc. In order to obtain representative samples, the sample to be analyzed must be collected from every selected drum. Considering the expense and time of analysis, however, the number of samples has to be minimized. In this study, RI waste drums were classified by various conditions such as half-life, surface dose, acceptance date, waste form, generator, etc. A sample for radiochemical assay was obtained by mixing samples from each drum. The sample has to be prepared for radiochemical assay, and although the sample should be reasonably uniform, it is rare that a completely homogeneous material is received. Every sample is shredded into pieces of approximately 1 ∼ 2 cm², and a representative aliquot is taken for the required analysis. For verification of representative sampling, every classified group is tested to evaluate 'selection of a representative drum in a group' and 'representative sampling in a drum'

  8. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
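
    The workflow that the GUI wraps can be sketched in a few lines; the example below uses scikit-learn's LIBSVM-backed SVC with an RBF kernel on synthetic two-class expression data, since the original tool is a Java interface to LIBSVM and its data set is not reproduced here.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-in for a two-class microarray data set (samples x genes)
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0.0, 1.0, (20, 500)), rng.normal(0.5, 1.0, (20, 500))])
        y = np.array([0] * 20 + [1] * 20)

        # RBF-kernel SVM with standardized features, assessed by cross-validation
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
        print(cross_val_score(clf, X, y, cv=5).mean())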

  9. FT-Raman and chemometric tools for rapid determination of quality parameters in milk powder: Classification of samples for the presence of lactose and fraud detection by addition of maltodextrin.

    Science.gov (United States)

    Rodrigues Júnior, Paulo Henrique; de Sá Oliveira, Kamila; de Almeida, Carlos Eduardo Rocha; De Oliveira, Luiz Fernando Cappa; Stephani, Rodrigo; Pinto, Michele da Silva; de Carvalho, Antônio Fernandes; Perrone, Ítalo Tuler

    2016-04-01

    FT-Raman spectroscopy has been explored as a quick screening method to evaluate the presence of lactose and identify milk powder samples adulterated with maltodextrin (2.5-50% w/w). Raman measurements can easily differentiate samples of milk powder, without the need for sample preparation, while traditional quality control methods, including high performance liquid chromatography, are cumbersome and slow. FT-Raman spectra were obtained from samples of whole lactose and low-lactose milk powder, both without and with addition of maltodextrin. Differences observed between the spectra allowed identification of samples with low lactose content, as well as adulterated samples. Exploratory data analysis using Raman spectroscopy and multivariate analysis was also developed to classify samples with PCA and PLS-DA. The PLS-DA models obtained allowed all samples to be correctly classified. These results demonstrate the utility of FT-Raman spectroscopy in combination with chemometrics to assess the quality of milk powder. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. An improved early detection method of type-2 diabetes mellitus using multiple classifier system

    KAUST Repository

    Zhu, Jia

    2015-01-01

    The specific causes of complex diseases such as Type-2 Diabetes Mellitus (T2DM) have not yet been identified. Nevertheless, many medical science researchers believe that complex diseases are caused by a combination of genetic, environmental, and lifestyle factors. Detection of such diseases becomes an issue because it is not free from false presumptions and is accompanied by unpredictable effects. Given the greatly increased amount of data gathered in medical databases, data mining has been used widely in recent years to detect and improve the diagnosis of complex diseases. However, past research showed that no single classifier can be considered optimal for all problems. Therefore, in this paper, we focus on employing multiple classifier systems to improve the accuracy of detection for complex diseases, such as T2DM. We proposed a dynamic weighted voting scheme called multiple factors weighted combination for classifiers' decision combination. This method considers not only the local and global accuracy but also the diversity among classifiers and localized generalization error of each classifier. We evaluated our method on two real T2DM data sets and other medical data sets. The favorable results indicated that our proposed method significantly outperforms individual classifiers and other fusion methods.
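
    A minimal sketch of a weighted soft-voting combination of heterogeneous classifiers; the weights here are plain validation accuracies rather than the paper's multiple weighting factors (local and global accuracy, diversity, localized generalization error), so it illustrates only the combination step.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=400, n_features=10, random_state=0)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

        members = [LogisticRegression(max_iter=1000), GaussianNB(), DecisionTreeClassifier()]
        weights = []
        for m in members:
            m.fit(X_tr, y_tr)
            weights.append(m.score(X_val, y_val))   # weight = validation accuracy (simplification)
        weights = np.array(weights) / sum(weights)

        # Weighted soft vote: combine the members' class probabilities
        proba = sum(w * m.predict_proba(X_val) for w, m in zip(weights, members))
        y_hat = proba.argmax(axis=1)
        print((y_hat == y_val).mean())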

  11. 36 CFR 1256.46 - National security-classified information.

    Science.gov (United States)

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...

  12. Classifying the mechanisms of electrochemical shock in ion-intercalation materials

    OpenAIRE

    Woodford, William; Carter, W. Craig; Chiang, Yet-Ming

    2014-01-01

    “Electrochemical shock” – the electrochemical cycling-induced fracture of materials – contributes to impedance growth and performance degradation in ion-intercalation batteries, such as lithium-ion. Using a combination of micromechanical models and acoustic emission experiments, the mechanisms of electrochemical shock are identified, classified, and modeled in targeted model systems with different composition and microstructure. A particular emphasis is placed on mechanical degradation occurr...

  13. Class-specific Error Bounds for Ensemble Classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.
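
    For reference, the classical equal-cost form of the bound referred to above is Breiman's (2001) result for an ensemble whose base classifiers have mean pairwise correlation \bar{\rho} and strength s; the class-specific ROC bound described in this abstract extends the same idea to unequal error costs:

        PE^{*} \;\le\; \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}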

  14. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    Science.gov (United States)

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality problem that negatively reflects on the reliability of both traditional rejection models and also more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since in the definition of a rejection model tuning of the involved parameters is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional
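
    A minimal illustration of this kind of rejection rule, written in Python for consistency with the other sketches in this collection rather than in R: a sample is rejected when the classifier's top posterior probability falls below a threshold. The threshold value and the classifier are placeholders; the paper's empirical rules and their evolutionary tuning are not reproduced.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                                   n_classes=3, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X, y)

        def predict_with_reject(model, X_new, threshold=0.7):
            # Return class labels, or -1 ("rejected") when the top posterior is too low
            proba = model.predict_proba(X_new)
            labels = proba.argmax(axis=1)
            labels[proba.max(axis=1) < threshold] = -1
            return labels

        pred = predict_with_reject(clf, X, threshold=0.7)
        print("rejected fraction:", (pred == -1).mean())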

  15. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  16. Deconvolution When Classifying Noisy Data Involving Transformations.

    Science.gov (United States)

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  17. Just-in-time classifiers for recurrent concepts.

    Science.gov (United States)

    Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

    2013-04-01

    Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both inputs and classes distributions.

  18. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-01-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  19. Spectroscopic properties for identifying sapphire samples from Ban Bo Kaew, Phrae Province, Thailand

    Science.gov (United States)

    Mogmued, J.; Monarumit, N.; Won-in, K.; Satitkune, S.

    2017-09-01

    The gemstone trade is a high-revenue business for Thailand, especially in ruby and sapphire. Moreover, Phrae is a potential gem field located in the northern part of Thailand. Spectroscopic properties are studied mainly to identify gemstones using advanced techniques (e.g. UV-Vis-NIR spectrophotometry, FTIR spectrometry and Raman spectroscopy). Typically, UV-Vis-NIR spectrophotometry is a technique to study the cause of color in gemstones. FTIR spectrometry is a technique to study the functional groups in gem materials. Raman patterns can be applied to identify the mineral inclusions in gemstones. In this study, the natural sapphires from Ban Bo Kaew were divided into two groups based on color: blue and green. The samples were analyzed by UV-Vis-NIR spectrophotometer, FTIR spectrometer and Raman spectroscope to study their spectroscopic properties. According to the UV-Vis-NIR spectra, the blue sapphires show higher Fe3+/Ti4+ and Fe2+/Fe3+ absorption peaks than the green sapphires. In contrast, green sapphires display higher Fe3+/Fe3+ absorption peaks than blue sapphires. The FTIR spectra of both blue and green sapphire samples show the absorption peaks of -OH, -CH and CO2. Mineral inclusions such as ferrocolumbite and rutile in sapphires from this area were observed by Raman spectroscope. The spectroscopic properties of sapphire samples from Ban Bo Kaew, Phrae Province, Thailand, can thus serve as specific evidence for gemstone identification.

  20. WEB-BASED ADAPTIVE TESTING SYSTEM (WATS FOR CLASSIFYING STUDENTS ACADEMIC ABILITY

    Directory of Open Access Journals (Sweden)

    Jaemu LEE,

    2012-08-01

    Full Text Available Computer Adaptive Testing (CAT) has been highlighted as a promising assessment method to fulfill two testing purposes: estimating student academic ability and classifying student academic level. In this paper, we introduced the Web-based Adaptive Testing System (WATS), developed to support a cost-effective assessment for classifying students’ ability into different academic levels. Instead of using a traditional paper and pencil test, the WATS is expected to serve as an alternate method to promptly diagnose and identify underachieving students through Web-based testing. The WATS can also help provide students with appropriate learning contents and necessary academic support in time. In this paper, the theoretical background and structure of WATS, the item construction process based upon item response theory, and the user interfaces of WATS were discussed.
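
    A minimal sketch of the item-selection logic that an IRT-based adaptive test relies on, assuming the two-parameter logistic (2PL) model; the item bank and ability value are invented, and the actual WATS implementation is not reproduced.

        import numpy as np

        def p_correct(theta, a, b):
            # 2PL probability of a correct response at ability theta
            return 1.0 / (1.0 + np.exp(-a * (theta - b)))

        def item_information(theta, a, b):
            # Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)
            p = p_correct(theta, a, b)
            return a ** 2 * p * (1.0 - p)

        # Hypothetical item bank: discrimination a, difficulty b
        a = np.array([0.8, 1.2, 1.5, 0.9, 2.0])
        b = np.array([-1.0, -0.2, 0.3, 1.0, 0.1])

        theta_hat = 0.25                            # current ability estimate
        info = item_information(theta_hat, a, b)
        print("next item:", int(np.argmax(info)))   # administer the most informative item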

  1. EVALUATING A COMPUTER BASED SKILLS ACQUISITION TRAINER TO CLASSIFY BADMINTON PLAYERS

    Directory of Open Access Journals (Sweden)

    Minh Vu Huynh

    2011-09-01

    Full Text Available The aim of the present study was to compare the statistical ability of both neural networks and discriminant function analysis on the newly developed SATB program. Using these statistical tools, we identified the accuracy of the SATB in classifying badminton players into different skill level groups. Forty-one participants, classified as advanced, intermediate, or beginner skill level, participated in this study. Results indicated neural networks are more effective in predicting group membership, and displayed higher predictive validity when compared to discriminant analysis. Using these outcomes, in conjunction with the physiological and biomechanical variables of the participants, we assessed the authenticity and accuracy of the SATB and commented on the overall effectiveness of the visual-based training approach to training badminton athletes

  2. Robust species taxonomy assignment algorithm for 16S rRNA NGS reads: application to oral carcinoma samples

    Directory of Open Access Journals (Sweden)

    Nezar Noor Al-Hebshi

    2015-09-01

    Full Text Available Background: Usefulness of next-generation sequencing (NGS) in assessing bacteria associated with oral squamous cell carcinoma (OSCC) has been undermined by inability to classify reads to the species level. Objective: The purpose of this study was to develop a robust algorithm for species-level classification of NGS reads from oral samples and to pilot test it for profiling bacteria within OSCC tissues. Methods: Bacterial 16S V1-V3 libraries were prepared from three OSCC DNA samples and sequenced using 454's FLX chemistry. High-quality, well-aligned, and non-chimeric reads ≥350 bp were classified using a novel, multi-stage algorithm that involves matching reads to reference sequences in revised versions of the Human Oral Microbiome Database (HOMD), HOMD extended (HOMDEXT), and Greengene Gold (GGG) at alignment coverage and percentage identity ≥98%, followed by assignment to species level based on top hit reference sequences. Priority was given to hits in HOMD, then HOMDEXT and finally GGG. Unmatched reads were subject to operational taxonomic unit analysis. Results: Nearly 92.8% of the reads were matched to updated-HOMD 13.2, 1.83% to trusted-HOMDEXT, and 1.36% to modified-GGG. Of all matched reads, 99.6% were classified to species level. A total of 228 species-level taxa were identified, representing 11 phyla; the most abundant were Proteobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Actinobacteria. Thirty-five species-level taxa were detected in all samples. On average, Prevotella oris, Neisseria flava, Neisseria flavescens/subflava, Fusobacterium nucleatum ss polymorphum, Aggregatibacter segnis, Streptococcus mitis, and Fusobacterium periodontium were the most abundant. Bacteroides fragilis, a species rarely isolated from the oral cavity, was detected in two samples. Conclusion: This multi-stage algorithm maximizes the fraction of reads classified to the species level while ensuring reliable classification by giving priority to the
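
    A minimal sketch of the priority logic of the multi-stage assignment described above (HOMD first, then HOMDEXT, then GGG, each at ≥98% identity and coverage); the data structure holding the top hits is invented for illustration, and the alignment step itself is outside the sketch.

        def assign_species(read_hits, min_identity=98.0, min_coverage=98.0):
            # read_hits: dict mapping database name -> (species, identity %, coverage %)
            # of the top hit for one read, or absent if the read had no hit in that database
            for db in ("HOMD", "HOMDEXT", "GGG"):          # priority order
                hit = read_hits.get(db)
                if hit is None:
                    continue
                species, identity, coverage = hit
                if identity >= min_identity and coverage >= min_coverage:
                    return species, db
            return None, None                              # left for OTU analysis

        # Hypothetical read: weak HOMD hit, good HOMDEXT hit
        hits = {"HOMD": ("Prevotella oris", 96.5, 99.0),
                "HOMDEXT": ("Prevotella oris", 99.1, 99.4)}
        print(assign_species(hits))   # ('Prevotella oris', 'HOMDEXT')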

  3. Comparing classifiers for pronunciation error detection

    NARCIS (Netherlands)

    Strik, H.; Truong, K.; Wet, F. de; Cucchiarini, C.

    2007-01-01

    Providing feedback on pronunciation errors in computer assisted language learning systems requires that pronunciation errors be detected automatically. In the present study we compare four types of classifiers that can be used for this purpose: two acoustic-phonetic classifiers (one of which employs

  4. Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier

    Science.gov (United States)

    Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar

    2015-02-01

    In this paper, a new approach for personal identification using finger vein image is presented. Finger vein is an emerging type of biometrics that attracts attention of researchers in biometrics area. As compared to other biometric traits such as face, fingerprint and iris, finger vein is more secured and hard to counterfeit since the features are inside the human body. So far, most of the researchers focus on how to extract robust features from the captured vein images. Not much research was conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database which was collected from 492 fingers shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.

  5. Classifier Fusion With Contextual Reliability Evaluation.

    Science.gov (United States)

    Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

    2018-05-01

    Classifier fusion is an efficient strategy to improve the classification performance for the complex pattern recognition problem. In practice, the multiple classifiers to combine can have different reliabilities and the proper reliability evaluation plays an important role in the fusion process for getting the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the k-nearest neighbors of the object. A cautious discounting rule is developed under belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure which makes it possible to reduce the level of conflict between the classifiers by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of the classification reliability. The discounted classification results are combined with Dempster-Shafer's rule for the final class decision making support. The performance of CF-CRE has been evaluated and compared with those of main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to the changes of the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for the applications.

  6. Hierarchical mixtures of naive Bayes classifiers

    NARCIS (Netherlands)

    Wiering, M.A.

    2002-01-01

    Naive Bayes classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayes classifiers by using the hierarchical

  7. Estimates of phytomass and net primary productivity in terrestrial ecosystems of the former Soviet Union identified by classified Global Vegetation Index

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, G.G.; Kolchugina, T.P. [Oregon State Univ., Corvallis, OR (United States)

    1995-12-01

    Forty-two regions with similar vegetation and landcover were identified in the former Soviet Union (FSU) by classifying Global Vegetation Index (GVI) images. Image classes were described in terms of vegetation and landcover. Image classes appear to provide more accurate and precise descriptions for most ecosystems when compared to general thematic maps. The area of forest lands was estimated at 1,330 Mha and the actual area of forest ecosystems at 875 Mha. Arable lands were estimated to be 211 Mha. The area of the tundra biome was estimated at 261 Mha. The areas of the forest-tundra/dwarf forest, taiga, mixed-deciduous forest and forest-steppe biomes were estimated at 153, 882, 196, and 144 Mha, respectively. The areas of the desert-semidesert biome and of arable land with irrigated land and meadows were estimated at 126 and 237 Mha, respectively. Vegetation and landcover types were associated with the Bazilevich database of phytomass and NPP for vegetation in the FSU. The phytomass in the FSU was estimated at 97.1 Gt C, with 86.8 in forest vegetation, 9.7 in natural non-forest and 0.6 Gt C in arable lands. The NPP was estimated at 8.6 Gt C/yr, with 3.2, 4.8, and 0.6 Gt C/yr from forest, natural non-forest, and arable ecosystems, respectively. The phytomass estimates for forests were greater than previous assessments which considered the age-class distribution of forest stands in the FSU. The NPP of natural ecosystems estimated in this study was 23% greater than previous estimates which used thematic maps to identify ecosystems. 47 refs., 4 figs., 2 tabs.

  8. A Supervised Multiclass Classifier for an Autocoding System

    Directory of Open Access Journals (Sweden)

    Yukako Toko

    2017-11-01

    Full Text Available Classification is often required in various contexts, including in the field of official statistics. In the previous study, we have developed a multiclass classifier that can classify short text descriptions with high accuracy. The algorithm borrows the concept of the naïve Bayes classifier and is so simple that its structure is easily understandable. The proposed classifier has the following two advantages. First, the processing times for both learning and classifying are extremely practical. Second, the proposed classifier yields high-accuracy results for a large portion of a dataset. We have previously developed an autocoding system for the Family Income and Expenditure Survey in Japan that has a better performing classifier. While the original system was developed in Perl in order to improve the efficiency of the coding process of short Japanese texts, the proposed system is implemented in the R programming language in order to explore versatility and is modified to make the system easily applicable to English text descriptions, in consideration of the increasing number of R users in the field of official statistics. We are planning to publish the proposed classifier as an R-package. The proposed classifier would be generally applicable to other classification tasks including coding activities in the field of official statistics, and it would contribute greatly to improving their efficiency.
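
    A minimal sketch of naive-Bayes-style text autocoding of short descriptions; it uses scikit-learn's multinomial naive Bayes on bag-of-words features and toy expenditure descriptions, whereas the authors implement their own variant in R, so this shows only the general idea.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Toy expenditure descriptions and their (hypothetical) classification codes
        texts = ["fresh apple", "apple juice", "bus fare", "train ticket",
                 "rice 5kg", "bread loaf"]
        codes = ["fruit", "beverage", "transport", "transport",
                 "staple food", "staple food"]

        coder = make_pipeline(CountVectorizer(), MultinomialNB())
        coder.fit(texts, codes)
        print(coder.predict(["apple pie", "train fare"]))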

  9. Vibrio parahaemolyticus Strains of Pandemic Serotypes Identified from Clinical and Environmental Samples from Jiangsu, China

    Directory of Open Access Journals (Sweden)

    Jingjiao eLi

    2016-05-01

    Full Text Available Vibrio parahaemolyticus has emerged as a major foodborne pathogen in China, Japan, Thailand and other Asian countries. In this study, 72 strains of V. parahaemolyticus were isolated from clinical and environmental samples between 2006 and 2014 in Jiangsu, China. The serotypes and six virulence genes, including the thermostable direct hemolysin (TDH) and TDH-related hemolysin (TRH) genes, were assessed among the isolates. Twenty five serotypes were identified and O3:K6 was one of the dominant serotypes. The genetic diversity was assessed by multilocus sequence typing (MLST) analysis, and 48 sequence types (STs) were found, suggesting this V. parahaemolyticus group is widely dispersed and undergoing rapid evolution. A total of 25 strains of pandemic serotypes such as O3:K6, O5:K17 and O1:KUT were identified. It is worth noting that the pandemic serotypes were not exclusively identified from clinical samples; rather, nine strains were also isolated from environmental samples, and some of these strains harbored several virulence genes, which may give those strains pathogenic potential. Therefore, the emergence of these environmental pandemic V. parahaemolyticus strains may pose a new threat to public health in China. Furthermore, six novel serotypes and 34 novel STs were identified among the 72 isolates, indicating that V. parahaemolyticus is widely distributed and fast evolving in the environment in Jiangsu, China. The findings of this study provide new insight into the phylogenetic relationship between V. parahaemolyticus strains of pandemic serotypes from clinical and environmental sources and enhance the MLST database; and our proposed possible O- and K-antigen evolution paths of V. parahaemolyticus may help explain how the serotypes of this dispersed bacterial population evolve.

  10. Classification Identification of Acoustic Emission Signals from Underground Metal Mine Rock by ICIMF Classifier

    Directory of Open Access Journals (Sweden)

    Hongyan Zuo

    2014-01-01

    Full Text Available To overcome the drawback that fuzzy classifiers are sensitive to noise and outliers, a Mamdani fuzzy classifier based on an improved chaos immune algorithm was developed, in which the bilateral Gaussian membership function parameters were set as constraint conditions, and the index of fuzzy classification effectiveness and the number of correctly classified samples were taken as the subgoals of the fitness function. Moreover, the Iris database was used for simulation experiments and for classification and recognition of acoustic emission signals and interference signals from the stope wall rock of underground metal mines. The results showed that the Mamdani fuzzy classifier based on the improved chaos immune algorithm could effectively improve the prediction accuracy of classification of data sets with noise and outliers, and that the classification accuracy for acoustic emission signals and interference signals from stope wall rock of underground metal mines was 90.00%. It is thus evident that the improved chaos immune Mamdani fuzzy (ICIMF) classifier is useful for accurate diagnosis of acoustic emission signals and interference signals from stope wall rock of underground metal mines.

  11. Logarithmic learning for generalized classifier neural network.

    Science.gov (United States)

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

    The generalized classifier neural network is introduced as an efficient classifier among others. Unless the initial smoothing parameter value is close to the optimal one, the generalized classifier neural network suffers from a convergence problem and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses a logarithmic cost function instead of the squared error. Minimization of this cost function reduces the number of iterations needed to reach the minimum. The proposed method is tested on 15 different data sets and the performance of the logarithmic learning generalized classifier neural network is compared with that of the standard one. Thanks to the operating range of the radial basis function included in the generalized classifier neural network, the proposed logarithmic approach and its derivative have continuous values. This makes it possible to exploit the fast convergence of the logarithmic cost function in the proposed learning method. Due to this fast convergence, training time is decreased by up to 99.2%. In addition to the decrease in training time, classification performance may also be improved by up to 60%. According to the test results, while the proposed method provides a solution for the time requirement problem of the generalized classifier neural network, it may also improve the classification accuracy. The proposed method can be considered an efficient way of reducing the time requirement problem of the generalized classifier neural network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  12. Training set optimization and classifier performance in a top-down diabetic retinopathy screening system

    Science.gov (United States)

    Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.

    2013-03-01

    Diabetic retinopathy (DR) affects more than 4.4 million Americans age 40 and over. Automatic screening for DR has shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (kNN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and k-NN had the highest average AUC. Lower standard deviation and a flattening of the AUC curve gives evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.

  13. Fusion of classifiers for REIS-based detection of suspicious breast lesions

    Science.gov (United States)

    Lederman, Dror; Wang, Xingwei; Zheng, Bin; Sumkin, Jules H.; Tublin, Mitchell; Gur, David

    2011-03-01

    After developing a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system aimed at detecting women with breast abnormalities that may indicate a developing breast cancer, we have been conducting a prospective clinical study to explore the feasibility of applying this REIS system to classifying younger women by their risk of having or developing breast cancer. The system comprises one central probe placed in contact with the nipple, and six additional probes uniformly distributed along an outside circle to be placed in contact with six points on the outer breast skin surface. In this preliminary study, we selected an initial set of 174 examinations on participants that have completed REIS examinations and have clinical status verification. Among these, 66 examinations were recommended for biopsy due to findings of a highly suspicious breast lesion ("positives"), and 108 were determined as negative during imaging based procedures ("negatives"). A set of REIS-based features, extracted using a mirror-matched approach, was computed and fed into five machine learning classifiers. A genetic algorithm was used to select an optimal subset of features for each of the five classifiers. Three fusion rules, namely the sum rule, weighted sum rule and weighted median rule, were used to combine the results of the classifiers. Performance evaluation was performed using a leave-one-case-out cross-validation method. The results indicated that REIS may provide a new technology to identify younger women with higher than average risk of having or developing breast cancer. Furthermore, it was shown that fusion rules, such as a weighted median fusion rule and a weighted sum fusion rule, may improve performance as compared with the highest performing single classifier.

  14. SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Motivation Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with RBF kernel of SVM. Conclusion We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/. PMID:17217518

  15. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    International Nuclear Information System (INIS)

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.

  16. Improved Classification by Non Iterative and Ensemble Classifiers in Motor Fault Diagnosis

    Directory of Open Access Journals (Sweden)

    PANIGRAHY, P. S.

    2018-02-01

    Full Text Available A data-driven approach to multi-class fault diagnosis of induction motors using MCSA at steady-state conditions is a complex pattern classification problem. This investigation has exploited the built-in ensemble process of non-iterative classifiers to resolve the most challenging issues in this area, including bearing and stator fault detection. Non-iterative techniques exhibit, on average, a 15% increase in fault classification accuracy over their iterative counterparts. In particular, RF has shown outstanding performance even with a small number of training samples and a noisy feature space because of its distributive feature model. The robustness of the results, backed by experimental verification, shows that non-iterative individual classifiers such as RF are the optimum choice in the area of automatic fault diagnosis of induction motors.

  17. Classifying threats with a 14-MeV neutron interrogation system.

    Science.gov (United States)

    Strellis, Dan; Gozani, Tsahi

    2005-01-01

    SeaPODDS (Sea Portable Drug Detection System) is a non-intrusive tool for detecting concealed threats in hidden compartments of maritime vessels. This system consists of an electronic neutron generator, a gamma-ray detector, a data acquisition computer, and a laptop computer user-interface. Although initially developed to detect narcotics, recent algorithm developments have shown that the system is capable of correctly classifying a threat into one of four distinct categories: narcotic, explosive, chemical weapon, or radiological dispersion device (RDD). Detection of narcotics, explosives, and chemical weapons is based on gamma-ray signatures unique to the chemical elements. Elements are identified by their characteristic prompt gamma-rays induced by fast and thermal neutrons. Detection of RDD is accomplished by detecting gamma-rays emitted by common radioisotopes and nuclear reactor fission products. The algorithm phenomenology for classifying threats into the proper categories is presented here.

  18. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    Science.gov (United States)

    S K, Somasundaram; P, Alli

    2017-11-09

    The main complication of diabetes is diabetic retinopathy (DR), a retinal vascular disease that leads to blindness. Regular screening for early detection of DR is considered a labor- and resource-intensive task; therefore, automatic detection of DR using computational techniques is a promising solution. An automatic method is more reliable for determining the presence of an abnormality in fundus images (FI), but the classification process is often poorly performed. Recently, a few research works have been designed for analyzing texture discrimination capacity in FI to distinguish the healthy images. However, the feature extraction (FE) process was not performed well due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection using machine learning and ensemble classification, a method called Machine Learning Bagging Ensemble Classifier (ML-BEC) is designed. The ML-BEC method comprises two stages. The first stage in the ML-BEC method comprises extraction of candidate objects from Retinal Images (RI). The candidate objects, or features, for DR disease diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying a machine learning technique called t-distributed Stochastic Neighbor Embedding (t-SNE). Besides, t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs. Then, t-SNE describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence between the two distributions with regard to the locations of the points on the map. The second stage comprises application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection
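
    A minimal sketch pairing a t-SNE embedding with a bagging ensemble, in the spirit of the two-stage ML-BEC pipeline described above; the synthetic feature matrix stands in for extracted retinal features, and the use of scikit-learn's TSNE and BaggingClassifier is an assumption about tooling, not the authors' code.

        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.manifold import TSNE
        from sklearn.model_selection import cross_val_score

        # Synthetic stand-in for extracted fundus-image features (samples x features)
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0.0, 1.0, (60, 50)), rng.normal(0.8, 1.0, (60, 50))])
        y = np.array([0] * 60 + [1] * 60)

        # Stage 1: t-SNE embeds the whole feature set (it has no out-of-sample transform)
        X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

        # Stage 2: a bagging ensemble (default base estimator: decision tree) on the embedding
        bag = BaggingClassifier(n_estimators=50, random_state=0)
        print(cross_val_score(bag, X_embedded, y, cv=5).mean())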

  19. 32 CFR 2400.28 - Dissemination of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Dissemination of classified information. 2400.28... SECURITY PROGRAM Safeguarding § 2400.28 Dissemination of classified information. Heads of OSTP offices... originating official may prescribe specific restrictions on dissemination of classified information when...

  20. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.

    Science.gov (United States)

    Li, Qiang; Gu, Yu; Jia, Jing

    2017-01-30

    Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.
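
    A minimal sketch of the MDS-then-SVM recognition chain described above; the sensor-response matrix is synthetic and the MDS and SVM settings are illustrative, not those tuned for the QCM e-nose.

        import numpy as np
        from sklearn.manifold import MDS
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        # Synthetic QCM sensor responses: 10 liquor brands x 12 repeats x 8 sensors
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(loc=brand, scale=0.5, size=(12, 8)) for brand in range(10)])
        y = np.repeat(np.arange(10), 12)

        # Reduce dimensionality with metric MDS, then classify the embedding with an RBF SVM
        X_mds = MDS(n_components=3, random_state=0).fit_transform(X)
        print(cross_val_score(SVC(kernel="rbf", C=10.0), X_mds, y, cv=5).mean())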

  1. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier

    Directory of Open Access Journals (Sweden)

    Qiang Li

    2017-01-01

    Full Text Available Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  2. Utility of functional status for classifying community versus institutional discharges after inpatient rehabilitation for stroke.

    Science.gov (United States)

    Reistetter, Timothy A; Graham, James E; Deutsch, Anne; Granger, Carl V; Markello, Samuel; Ottenbacher, Kenneth J

    2010-03-01

    To evaluate the ability of patient functional status to differentiate between community and institutional discharges after rehabilitation for stroke. Retrospective cross-sectional design. Inpatient rehabilitation facilities contributing to the Uniform Data System for Medical Rehabilitation. Patients (N=157,066) receiving inpatient rehabilitation for stroke in 2006 and 2007. Not applicable. Discharge FIM rating and discharge setting (community vs institutional). Approximately 71% of the sample was discharged to the community. Receiver operating characteristic curve analyses revealed that FIM total performed as well as or better than FIM motor and FIM cognition subscales in differentiating discharge settings. Area under the curve for FIM total was .85, indicating very good ability to identify persons discharged to the community. A FIM total rating of 78 was identified as the optimal cut point for distinguishing between positive (community) and negative (institution) tests. This cut point yielded balanced sensitivity and specificity (both=.77). Discharge planning is complex, involving many factors. Identifying a functional threshold for classifying discharge settings can provide important information to assist in this process. Additional research is needed to determine if the risks and benefits of classification errors justify shifting the cut point to weight either sensitivity or specificity of FIM ratings. Copyright 2010 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
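
    A cut point with balanced sensitivity and specificity, as described above, can be located from an ROC analysis along the following lines. The variable names are assumptions; the FIM ratings and discharge labels would come from the rehabilitation dataset.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# fim_total: discharge FIM ratings; community: 1 if discharged to the community, 0 if to an institution.
def balanced_cutpoint(fim_total, community):
    fpr, tpr, thresholds = roc_curve(community, fim_total)
    sensitivity = tpr
    specificity = 1.0 - fpr
    # Choose the threshold where sensitivity and specificity are most nearly equal.
    best = np.argmin(np.abs(sensitivity - specificity))
    auc = roc_auc_score(community, fim_total)
    return thresholds[best], sensitivity[best], specificity[best], auc
```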

  3. Validation of mercury tip-switch and accelerometer activity sensors for identifying resting and active behavior in bears

    Science.gov (United States)

    Jasmine Ware,; Rode, Karyn D.; Pagano, Anthony M.; Bromaghin, Jeffrey F.; Robbins, Charles T.; Joy Erlenbach,; Shannon Jensen,; Amy Cutting,; Nicole Nicassio-Hiskey,; Amy Hash,; Owen, Megan A.; Heiko Jansen,

    2015-01-01

    Activity sensors are often included in wildlife transmitters and can provide information on the behavior and activity patterns of animals remotely. However, interpreting activity-sensor data relative to animal behavior can be difficult if animals cannot be continuously observed. In this study, we examined the performance of a mercury tip-switch and a tri-axial accelerometer housed in collars to determine whether sensor data can be accurately classified as resting and active behaviors and whether data are comparable for the 2 sensor types. Five captive bears (3 polar [Ursus maritimus] and 2 brown [U. arctos horribilis]) were fitted with a collar specially designed to internally house the sensors. The bears’ behaviors were recorded, classified, and then compared with sensor readings. A separate tri-axial accelerometer that sampled continuously at a higher frequency and provided raw acceleration values from 3 axes was also mounted on the collar to compare with the lower resolution sensors. Both accelerometers more accurately identified resting and active behaviors at time intervals ranging from 1 minute to 1 hour (≥91.1% accuracy) compared with the mercury tip-switch (range = 75.5–86.3%). However, mercury tip-switch accuracy improved when sampled at longer intervals (e.g., 30–60 min). Data from the lower resolution accelerometer, but not the mercury tip-switch, accurately predicted the percentage of time spent resting during an hour. Although the number of bears available for this study was small, our results suggest that these activity sensors can remotely identify resting versus active behaviors across most time intervals. We recommend that investigators consider both study objectives and the variation in accuracy of classifying resting and active behaviors reported here when determining sampling interval.

  4. NMD Classifier: A reliable and systematic classification tool for nonsense-mediated decay events.

    Directory of Open Access Journals (Sweden)

    Min-Kung Hsu

    Full Text Available Nonsense-mediated decay (NMD) degrades mRNAs that include premature termination codons to avoid the translation and accumulation of truncated proteins. This mechanism has been found to participate in gene regulation and a wide spectrum of biological processes. However, the evolutionary and regulatory origins of NMD-targeted transcripts (NMDTs) have been less studied, partly because of the complexity in analyzing NMD events. Here we report NMD Classifier, a tool for systematic classification of NMD events for either annotated or de novo assembled transcripts. This tool is based on the assumption of minimal evolution/regulation, i.e., an event that leads to the least change is the most likely to occur. Our simulation results indicate that NMD Classifier can correctly identify an average of 99.3% of the NMD-causing transcript structural changes, particularly exon inclusions/exclusions and exon boundary alterations. Researchers can apply NMD Classifier to evolutionary and regulatory studies by comparing NMD events of different biological conditions or in different organisms.

  5. Detecting Dutch political tweets : A classifier based on voting system using supervised learning

    NARCIS (Netherlands)

    de Mello Araújo, Eric Fernandes; Ebbelaar, Dave

    The task of classifying political tweets has been shown to be very difficult, with controversial results in many works and with non-replicable methods. Most of the works with this goal use rule-based methods to identify political tweets. We propose here two methods, being one rule-based approach,

  6. Classified study and clinical value of the phase imaging features

    International Nuclear Information System (INIS)

    Dang Yaping; Ma Aiqun; Zheng Xiaopu; Yang Aimin; Xiao Jiang; Gao Xinyao

    2000-01-01

    445 patients with various heart diseases were examined by gated cardiac blood pool imaging, and the phase images were classified into seven types. The relationship between these seven types and left ventricular function indices, clinical heart function, different heart diseases, as well as the electrocardiograph was studied. The results showed that the phase image classification matched clinical heart function: it can visually, directly and accurately indicate clinical heart function and can be used to aid the diagnosis of heart disease.

  7. Neural Network Classifiers for Local Wind Prediction.

    Science.gov (United States)

    Kretzschmar, Ralf; Eckert, Pierre; Cattani, Daniel; Eggimann, Fritz

    2004-05-01

    This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved day time, yearday, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. The neural network classifiers exhibited superior quality when compared with persistence judged on a specific performance measure, hit and false-alarm rates.

  8. 3D Bayesian contextual classifiers

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    2000-01-01

    We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.

  9. Minimal methylation classifier (MIMIC): A novel method for derivation and rapid diagnostic detection of disease-associated DNA methylation signatures.

    Science.gov (United States)

    Schwalbe, E C; Hicks, D; Rafiee, G; Bashton, M; Gohlke, H; Enshaei, A; Potluri, S; Matthiesen, J; Mather, M; Taleongpong, P; Chaston, R; Silmon, A; Curtis, A; Lindsey, J C; Crosier, S; Smith, A J; Goschzik, T; Doz, F; Rutkowski, S; Lannering, B; Pietsch, T; Bailey, S; Williamson, D; Clifford, S C

    2017-10-18

    Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.

  10. A bench-top hyperspectral imaging system to classify beef from Nellore cattle based on tenderness

    Science.gov (United States)

    Nubiato, Keni Eduardo Zanoni; Mazon, Madeline Rezende; Antonelo, Daniel Silva; Calkins, Chris R.; Naganathan, Govindarajan Konda; Subbiah, Jeyamkondan; da Luz e Silva, Saulo

    2018-03-01

    The aim of this study was to evaluate the accuracy of classification of Nellore beef aged for 0, 7, 14, or 21 days and classification based on tenderness and aging period using a bench-top hyperspectral imaging system. A hyperspectral imaging system (λ = 928-2524 nm) was used to collect hyperspectral images of the Longissimus thoracis et lumborum (aging n = 376 and tenderness n = 345) of Nellore cattle. The image processing steps included selection of the region of interest, extraction of spectra, and identification and evaluation of selected wavelengths for classification. Six linear discriminant models were developed to classify samples based on tenderness and aging period. The model using the first derivative of partial absorbance spectra (give wavelength range spectra) was able to classify steaks based on tenderness with an overall accuracy of 89.8%. The model using the first derivative of full absorbance spectra was able to classify steaks based on aging period with an overall accuracy of 84.8%. The results demonstrate that the hyperspectral imaging system may be a viable technology for classifying beef based on tenderness and aging period.
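
    A compact sketch of the classification step described above (first-derivative absorbance spectra fed to a linear discriminant model) might look like the following; the array names and cross-validation scheme are assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# spectra: (n_steaks, n_wavelengths) absorbance spectra; labels: tenderness or aging class (assumed arrays).
def classify_first_derivative(spectra, labels):
    # First derivative of the absorbance spectra along the wavelength axis.
    d1 = np.gradient(spectra, axis=1)
    # Linear discriminant model on the derivative spectra, scored by 5-fold cross-validation.
    lda = LinearDiscriminantAnalysis()
    return cross_val_score(lda, d1, labels, cv=5).mean()
```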

  11. A CLASSIFIER SYSTEM USING SMOOTH GRAPH COLORING

    Directory of Open Access Journals (Sweden)

    JORGE FLORES CRUZ

    2017-01-01

    Full Text Available Unsupervised classifiers allow clustering methods with little or no human intervention. Therefore it is desirable to group the set of items with less data processing. This paper proposes an unsupervised classifier system using the model of soft graph coloring. This method was tested on some classic instances from the literature and the results obtained were compared with classifications made with human intervention, yielding results as good as or better than supervised classifiers, sometimes providing alternative classifications that consider additional information that humans did not consider.

  12. Improved Collaborative Representation Classifier Based on l2-Regularized for Human Action Recognition

    Directory of Open Access Journals (Sweden)

    Shirui Huo

    2017-01-01

    Full Text Available Human action recognition is an important recent challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features are discriminant descriptor features to characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC) based on l2-regularized for human action recognition is presented to maximize the likelihood that a test sample belongs to each class, then theoretical investigation into ICRC shows that it obtains a final classification by computing the likelihood for each class. Coupled with the DMMs and DCNN features, experiments on depth image-based action recognition, including MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach successfully using a distance-based representation classifier achieves superior performance over the state-of-the-art methods, including SRC, CRC, and SVM.
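
    For readers unfamiliar with collaborative representation, the sketch below shows the generic l2-regularized collaborative representation classifier (CRC) decision rule: a test sample is coded over the pooled training set by ridge regression and assigned to the class whose portion of the code gives the smallest coefficient-normalized reconstruction residual. This is the standard CRC scheme, not the authors' improved variant; the names and the regularization value are assumptions.

```python
import numpy as np

def crc_predict(X_train, y_train, x_test, lam=0.01):
    """l2-regularized collaborative representation: code x_test over all training
    samples, then pick the class whose part of the code reconstructs it best."""
    D = X_train.T                                   # dictionary: columns are training samples
    # Ridge (l2-regularized least squares) coding: alpha = (D^T D + lam*I)^-1 D^T x
    G = D.T @ D + lam * np.eye(D.shape[1])
    alpha = np.linalg.solve(G, D.T @ x_test)
    residuals = {}
    for c in np.unique(y_train):
        mask = (y_train == c)
        recon = D[:, mask] @ alpha[mask]
        # Residual normalized by the coefficient energy of that class (a common CRC variant).
        residuals[c] = np.linalg.norm(x_test - recon) / (np.linalg.norm(alpha[mask]) + 1e-12)
    return min(residuals, key=residuals.get)
```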

  13. Rapid Land Cover Map Updates Using Change Detection and Robust Random Forest Classifiers

    Directory of Open Access Journals (Sweden)

    Konrad J. Wessels

    2016-10-01

    Full Text Available The paper evaluated the Landsat Automated Land Cover Update Mapping (LALCUM) system designed to rapidly update a land cover map to a desired nominal year using a pre-existing reference land cover map. The system uses the Iteratively Reweighted Multivariate Alteration Detection (IRMAD) to identify areas of change and no change. The system then automatically generates large amounts of training samples (n > 1 million) in the no-change areas as input to an optimized Random Forest classifier. Experiments were conducted in the KwaZulu-Natal Province of South Africa using a reference land cover map from 2008, a change mask between 2008 and 2011 and Landsat ETM+ data for 2011. The entire system took 9.5 h to process. We expected that the use of the change mask would improve classification accuracy by reducing the number of mislabeled training data caused by land cover change between 2008 and 2011. However, this was not the case due to exceptional robustness of Random Forest classifier to mislabeled training samples. The system achieved an overall accuracy of 65%–67% using 22 detailed classes and 72%–74% using 12 aggregated national classes. “Water”, “Plantations”, “Plantations—clearfelled”, “Orchards—trees”, “Sugarcane”, “Built-up/dense settlement”, “Cultivation—Irrigated” and “Forest (indigenous)” had user's accuracies above 70%. Other detailed classes (e.g., “Low density settlements”, “Mines and Quarries”, and “Cultivation, subsistence, drylands”), which are required for operational, provincial-scale land use planning and are usually mapped using manual image interpretation, could not be mapped using Landsat spectral data alone. However, the system was able to map the 12 national classes, at a sufficiently high level of accuracy for national scale land cover monitoring. This update approach and the highly automated, scalable LALCUM system can improve the efficiency and update rate of regional land
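
    The core training step described above (sampling labels from the no-change areas of the reference map and fitting a random forest to the new imagery) can be sketched as follows; the per-pixel array layout, sample size, and forest settings are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# features_2011: (n_pixels, n_bands) Landsat features for the new nominal year;
# labels_2008: reference-map label per pixel; no_change: boolean change-detection mask
# (all assumed arrays, flattened to one row per pixel).
def update_land_cover(features_2011, labels_2008, no_change, n_train=100_000, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.where(no_change)[0]
    # Draw training samples only where the reference labels are still expected to be valid.
    train_idx = rng.choice(candidates, size=min(n_train, candidates.size), replace=False)
    clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=seed)
    clf.fit(features_2011[train_idx], labels_2008[train_idx])
    return clf.predict(features_2011)   # updated land cover map for the new year
```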

  14. Exploring Land Use and Land Cover of Geotagged Social-Sensing Images Using Naive Bayes Classifier

    Directory of Open Access Journals (Sweden)

    Asamaporn Sitthi

    2016-09-01

    Full Text Available Online social media crowdsourced photos contain a vast amount of visual information about the physical properties and characteristics of the earth’s surface. Flickr is an important online social media platform for users seeking this information. Each day, users generate crowdsourced geotagged digital imagery containing an immense amount of information. In this paper, geotagged Flickr images are used for automatic extraction of low-level land use/land cover (LULC) features. The proposed method uses a naive Bayes classifier with color, shape, and color index descriptors. The classified images are mapped using a majority filtering approach. The classifier performance in overall accuracy, kappa coefficient, precision, recall, and f-measure was 87.94%, 82.89%, 88.20%, 87.90%, and 88%, respectively. Labeled-crowdsourced images were filtered into a spatial tile of a 30 m × 30 m resolution using the majority voting method to reduce geolocation uncertainty from the crowdsourced data. These tile datasets were used as training and validation samples to classify Landsat TM5 images. The supervised maximum likelihood method was used for the LULC classification. The results show that the geotagged Flickr images can classify LULC types with reasonable accuracy and that the proposed approach improves LULC classification efficiency if a sufficient spatial distribution of crowdsourced data exists.
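
    A minimal sketch of the two steps described above (a naive Bayes classifier over image descriptors, followed by a per-tile majority vote) is given below; the descriptor layout and tile indexing are assumptions for illustration.

```python
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB

# descriptors: (n_photos, n_features) color/shape/color-index descriptors;
# lulc_labels: LULC training labels; tile_ids: 30 m tile index per photo (assumed arrays).
def classify_and_majority_filter(descriptors, lulc_labels, tile_ids):
    clf = GaussianNB().fit(descriptors, lulc_labels)
    photo_pred = clf.predict(descriptors)
    # Majority vote within each tile reduces geolocation noise in the crowdsourced photos.
    tile_label = {t: Counter(photo_pred[tile_ids == t]).most_common(1)[0][0]
                  for t in np.unique(tile_ids)}
    return tile_label
```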

  15. Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel

    Directory of Open Access Journals (Sweden)

    Jianfeng Hu

    2017-01-01

    Full Text Available Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue has major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four types of entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were deployed for the analysis of the original EEG signal and compared using ten state-of-the-art classifiers. Results indicate that the optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and the Random Forest (RF) classifier. The highest accuracy can be up to 96.6%, which meets the needs of real applications. The best combination of channel + feature + classifier is subject-specific. In this work, the accuracy of FE as the feature is far greater than that of the other features. The accuracy using the RF classifier is the best, while that of the SVM classifier with a linear kernel is the worst. Channel selection has a large impact on accuracy, as the performance of the various channels differs considerably.
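
    Of the four entropy measures named above, spectral entropy is the simplest to sketch; the snippet below computes it from windowed single-channel EEG and feeds it to a random forest. The sampling rate, window layout, and hyperparameters are assumptions, and the other entropies (SE, FE, AE) would be computed analogously.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def spectral_entropy(window, fs=250):
    """Shannon entropy of the normalized power spectral density of one EEG window."""
    freqs, psd = welch(window, fs=fs, nperseg=min(256, len(window)))
    p = psd / psd.sum()
    return -np.sum(p * np.log2(p + 1e-12))

# windows: (n_windows, n_samples) segments from a single channel (e.g. CP4);
# states: 0 = alert, 1 = fatigued (assumed arrays).
def evaluate(windows, states, fs=250):
    feats = np.array([[spectral_entropy(w, fs)] for w in windows])
    clf = RandomForestClassifier(n_estimators=300)
    return cross_val_score(clf, feats, states, cv=10).mean()
```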

  16. The edge-preservation multi-classifier relearning framework for the classification of high-resolution remotely sensed imagery

    Science.gov (United States)

    Han, Xiaopeng; Huang, Xin; Li, Jiayi; Li, Yansheng; Yang, Michael Ying; Gong, Jianya

    2018-04-01

    In recent years, the availability of high-resolution imagery has enabled more detailed observation of the Earth. However, it is imperative to simultaneously achieve accurate interpretation and preserve the spatial details for the classification of such high-resolution data. To this aim, we propose the edge-preservation multi-classifier relearning framework (EMRF). This multi-classifier framework is made up of support vector machine (SVM), random forest (RF), and sparse multinomial logistic regression via variable splitting and augmented Lagrangian (LORSAL) classifiers, considering their complementary characteristics. To better characterize complex scenes of remote sensing images, relearning based on landscape metrics is proposed, which iteratively quantizes both the landscape composition and spatial configuration by the use of the initial classification results. In addition, a novel tri-training strategy is proposed to solve the over-smoothing effect of relearning by means of automatic selection of training samples with low classification certainties, which always distribute in or near the edge areas. Finally, EMRF flexibly combines the strengths of relearning and tri-training via the classification certainties calculated by the probabilistic output of the respective classifiers. It should be noted that, in order to achieve an unbiased evaluation, we assessed the classification accuracy of the proposed framework using both edge and non-edge test samples. The experimental results obtained with four multispectral high-resolution images confirm the efficacy of the proposed framework, in terms of both edge and non-edge accuracy.

  17. Carbon classified?

    DEFF Research Database (Denmark)

    Lippert, Ingmar

    2012-01-01

    Using an actor-network theory (ANT) framework, the aim is to investigate the actors who bring together the elements needed to classify their carbon emission sources and unpack the heterogeneous relations drawn on. Based on an ethnographic study of corporate agents of ecological modernisation over a period of 13 months, this paper provides an exploration of three cases of enacting classification. Drawing on ANT, we problematise the silencing of a range of possible modalities of consumption facts and point to the ontological ethics involved in such performances. In a context of global warming...

  18. Intestinal parasites and genotyping of Giardia duodenalis in children: first report of genotype B in isolates from human clinical samples in Mexico

    Directory of Open Access Journals (Sweden)

    Julio César Torres-Romero

    2014-06-01

    Full Text Available Giardia duodenalis is one of the most prevalent enteroparasites in children. This parasite produces several clinical manifestations. The aim of this study was to determine the prevalence of genotypes of G. duodenalis causing infection in a region of southeastern Mexico. G. duodenalis cysts were isolated (33/429) from stool samples of children and molecular genotyping was performed by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis, targeting the triosephosphate isomerase (tpi) and glutamate dehydrogenase (gdh) genes. The tpi gene was amplified in all of the cyst samples, either for assemblage A (27 samples) or assemblage B (6 samples). RFLP analysis classified the 27 tpi-A amplicons in assemblage A, subgenotype I. Samples classified as assemblage B were further analysed using PCR-RFLP of the gdh gene and identified as assemblage B, subgenotype III. To our knowledge, this is the first report of assemblage B of G. duodenalis in human clinical samples from Mexico.

  19. Feature extraction using convolutional neural network for classifying breast density in mammographic images

    Science.gov (United States)

    Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.

    2017-03-01

    Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, the main constraint of machine-learning algorithms relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, it is a complex task to select an initial set during the design of a classifier. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density with a usual machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. Then, these features are fed forward to a single hidden layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, presenting only 1.6% misclassification. However, the small set of samples and memory constraints required the reuse of data in both the CNN and the MLP-NN, therefore overfitting might have influenced the results even though we cross-validated the network. Thus, although we presented a promising method for extracting features and classifying breast density, a greater database is
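
    The feature-extraction idea described above (train a small CNN, then reuse the activations of a deep fully connected layer as input to a separate shallow classifier) can be sketched with tf.keras and scikit-learn as follows. The layer sizes, optimizer, and training schedule are assumptions; only the 260x200 input size and the 8-neuron feature layer follow the abstract.

```python
import tensorflow as tf
from sklearn.neural_network import MLPClassifier

def build_cnn(input_shape=(260, 200, 1), n_classes=4):
    inp = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(16, 5, activation="relu")(inp)
    x = tf.keras.layers.MaxPooling2D(4)(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(4)(x)
    x = tf.keras.layers.Flatten()(x)
    feat = tf.keras.layers.Dense(8, activation="relu", name="feat")(x)  # 8-neuron feature layer
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(feat)
    return tf.keras.Model(inp, out)

# images: (n, 260, 200, 1) mammograms; density: breast-density class 0-3 (assumed arrays).
def extract_and_classify(images, density):
    cnn = build_cnn()
    cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    cnn.fit(images, density, epochs=10, batch_size=16, verbose=0)
    # Reuse the activations of the deep fully connected layer as features.
    extractor = tf.keras.Model(cnn.input, cnn.get_layer("feat").output)
    feats = extractor.predict(images, verbose=0)
    # Single-hidden-layer network trained on the extracted features.
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(feats, density)
    return mlp.score(feats, density)
```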

  20. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.

  1. Final Validation of the ProMisE Molecular Classifier for Endometrial Carcinoma in a Large Population-based Case Series.

    Science.gov (United States)

    Kommoss, S; McConechy, M K; Kommoss, F; Leung, S; Bunz, A; Magrill, J; Britton, H; Kommoss, F; Grevenkamp, F; Karnezis, A; Yang, W; Lum, A; Krämer, B; Taran, F; Staebler, A; Lax, S; Brucker, S Y; Huntsman, D G; Gilks, C B; McAlpine, J N; Talhouk, A

    2018-02-07

    Based on The Cancer Genome Atlas, we previously developed and confirmed a pragmatic molecular classifier for endometrial cancers; ProMisE (Proactive Molecular Risk Classifier for Endometrial Cancer). ProMisE identifies four prognostically distinct molecular subtypes, and can be applied to diagnostic specimens (biopsy/curettings), enabling earlier informed decision-making. We have strictly adhered to the Institute of Medicine (IOM) guidelines for the development of genomic biomarkers, and herein present the final validation step of a locked-down classifier prior to clinical application. We assessed a retrospective cohort of women from the Tübingen University Women's Hospital treated for endometrial carcinoma between 2003-13. Primary outcomes of overall, disease-specific and progression-free survival were evaluated for clinical, pathological, and molecular features. Complete clinical and molecular data were evaluable from 452 women. Patient age ranged from 29 - 93 (median 65) years, and 87.8% cases were endometrioid histotype. Grade distribution included 282 (62.4%) G1, 75 (16.6%) G2, and 95 (21.0%) G3 tumors. 276 (61.1%) patients had stage IA disease, with the remaining stage IB (89 (19.7%)), stage II (26 (5.8%)), and stage III/IV (61 (13.5%)). ProMisE molecular classification yielded 127 (28.1%) MMR-D, 42 (9.3%) POLE, 55 (12.2%) p53abn, and 228 (50.4%) p53wt. ProMisE was a prognostic marker for progression-free (P=0.001) and disease-specific (P=0.03) survival even after adjusting for known risk factors. Concordance between diagnostic and surgical specimens was highly favorable; accuracy 0.91, kappa 0.88. We have developed, confirmed and now validated a pragmatic molecular classification tool (ProMisE) that provides consistent categorization of tumors and identifies four distinct prognostic molecular subtypes. ProMisE can be applied to diagnostic samples and thus could be used to inform surgical procedure(s) and/or need for adjuvant therapy. Based on the IOM

  2. Identifying hidden voice and video streams

    Science.gov (United States)

    Fan, Jieyan; Wu, Dapeng; Nucci, Antonio; Keralapura, Ram; Gao, Lixin

    2009-04-01

    Given the rising popularity of voice and video services over the Internet, accurately identifying voice and video traffic that traverse their networks has become a critical task for Internet service providers (ISPs). As the number of proprietary applications that deliver voice and video services to end users increases over time, the search for the one methodology that can accurately detect such services while being application independent still remains open. This problem becomes even more complicated when voice and video service providers like Skype, Microsoft, and Google bundle their voice and video services with other services like file transfer and chat. For example, a bundled Skype session can contain both voice stream and file transfer stream in the same layer-3/layer-4 flow. In this context, traditional techniques to identify voice and video streams do not work. In this paper, we propose a novel self-learning classifier, called VVS-I, that detects the presence of voice and video streams in flows with minimum manual intervention. Our classifier works in two phases: training phase and detection phase. In the training phase, VVS-I first extracts the relevant features, and subsequently constructs a fingerprint of a flow using the power spectral density (PSD) analysis. In the detection phase, it compares the fingerprint of a flow to the existing fingerprints learned during the training phase, and subsequently classifies the flow. Our classifier is not only capable of detecting voice and video streams that are hidden in different flows, but is also capable of detecting different applications (like Skype, MSN, etc.) that generate these voice/video streams. We show that our classifier can achieve close to 100% detection rate while keeping the false positive rate to less than 1%.
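
    The fingerprinting step described above can be illustrated with a minimal PSD-based sketch: a flow's byte-count time series is turned into a normalized power spectral density, and a new flow is assigned to the class of the nearest stored fingerprint. The sampling rate, series layout, and nearest-fingerprint matching rule are illustrative assumptions rather than the VVS-I implementation.

```python
import numpy as np
from scipy.signal import welch

def psd_fingerprint(byte_series, fs=50, nperseg=128):
    """Power-spectral-density 'fingerprint' of a flow's byte-count time series."""
    _, psd = welch(byte_series, fs=fs, nperseg=min(nperseg, len(byte_series)))
    return psd / (np.linalg.norm(psd) + 1e-12)       # normalize for comparison

def classify_flow(byte_series, fingerprints, fs=50, nperseg=128):
    """fingerprints: dict mapping a class name (e.g. 'voice', 'video') to a stored PSD
    fingerprint computed with the same fs/nperseg during the training phase."""
    f = psd_fingerprint(byte_series, fs=fs, nperseg=nperseg)
    # Nearest fingerprint in Euclidean distance wins.
    return min(fingerprints, key=lambda c: np.linalg.norm(f - fingerprints[c]))
```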

  3. Selectivity in Genetic Association with Sub-classified Migraine in Women

    Science.gov (United States)

    Chasman, Daniel I.; Anttila, Verneri; Buring, Julie E.; Ridker, Paul M.; Schürks, Markus; Kurth, Tobias

    2014-01-01

    Migraine can be sub-classified not only according to presence of migraine aura (MA) or absence of migraine aura (MO), but also by additional features accompanying migraine attacks, e.g. photophobia, phonophobia, nausea, etc. all of which are formally recognized by the International Classification of Headache Disorders. It remains unclear how aura status and the other migraine features may be related to underlying migraine pathophysiology. Recent genome-wide association studies (GWAS) have identified 12 independent loci at which single nucleotide polymorphisms (SNPs) are associated with migraine. Using a likelihood framework, we explored the selective association of these SNPs with migraine, sub-classified according to aura status and the other features in a large population-based cohort of women including 3,003 active migraineurs and 18,108 free of migraine. Five loci met stringent significance for association with migraine, among which four were selective for sub-classified migraine, including rs11172113 (LRP1) for MO. The number of loci associated with migraine increased to 11 at suggestive significance thresholds, including five additional selective associations for MO but none for MA. No two SNPs showed similar patterns of selective association with migraine characteristics. At one extreme, SNPs rs6790925 (near TGFBR2) and rs2274316 (MEF2D) were not associated with migraine overall, MA, or MO but were selective for migraine sub-classified by the presence of one or more of the additional migraine features. In contrast, SNP rs7577262 (TRPM8) was associated with migraine overall and showed little or no selectivity for any of the migraine characteristics. The results emphasize the multivalent nature of migraine pathophysiology and suggest that a complete understanding of the genetic influence on migraine may benefit from analyses that stratify migraine according to both aura status and the additional diagnostic features used for clinical characterization of

  4. Local-global classifier fusion for screening chest radiographs

    Science.gov (United States)

    Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

    2017-03-01

    Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.
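
    The linear fusion step described above can be sketched as follows; the specific confidence weighting used here (distance of each probability from 0.5) is an assumption for illustration, standing in for the confidence probabilities produced by the two classifiers.

```python
import numpy as np

def fuse_local_global(p_local, p_global):
    """Linear fusion of two classifiers' abnormality probabilities, weighting each
    decision by its confidence (here: distance from the uninformative 0.5 point)."""
    p_local, p_global = np.asarray(p_local), np.asarray(p_global)
    w_local = np.abs(p_local - 0.5)
    w_global = np.abs(p_global - 0.5)
    w_sum = w_local + w_global + 1e-12
    # Weighted average of the two decisions, per chest x-ray.
    return (w_local * p_local + w_global * p_global) / w_sum
```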

  5. Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers

    NARCIS (Netherlands)

    Bolt, J.H.; van der Gaag, L.C.

    Multi-dimensional Bayesian network classifiers are Bayesian networks of restricted topological structure, which are tailored to classifying data instances into multiple dimensions. Like more traditional classifiers, multi-dimensional classifiers are typically learned from data and may include

  6. Assessment of fractures classified as non-mineralised in the Sicada database

    Energy Technology Data Exchange (ETDEWEB)

    Claesson Liljedahl, Lillemor; Munier, Raymond (Swedish Nuclear Fuel and Waste Management Co., Stockholm (Sweden)); Sandstroem, Bjoern (WSP Sverige AB, Goeteborg (Sweden)); Drake, Henrik (Isochron GeoConsulting, Varberg (Sweden)); Tullborg, Eva-Lena (Terralogica AB, Graabo (Sweden))

    2011-03-15

    The general objective of this report was to describe the results of the investigation of fractures classified as non-mineralised in Sicada. Such fractures exist at Forsmark and at Laxemar. The main aims of the investigation of these fractures were to: - Quantify the number of non-mineralised fractures (i.e. fractures lacking mineral coating) in Sicada (table: p_fract_core_eshi). - Closely examine a selection of fractures recorded as non-mineralised in Sicada. - Outline possible reasons for the existence of non-mineralised fractures. The work has involved extraction of fracture data from Sicada and subsequent statistical analysis. Since several thousand fractures are classified as non-mineralised in Sicada, it was not practical to include all of them in this study; instead, we examined one fracture sub-set from each site. We investigated a sample of 204 of these fractures in detail (see Sections 1.1 and 2.4). Rock mechanical differences between Forsmark and Laxemar and kinematic analysis of fracture surfaces are not discussed in this report

  7. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.

  8. Zooniverse: Combining Human and Machine Classifiers for the Big Survey Era

    Science.gov (United States)

    Fortson, Lucy; Wright, Darryl; Beck, Melanie; Lintott, Chris; Scarlata, Claudia; Dickinson, Hugh; Trouille, Laura; Willi, Marco; Laraia, Michael; Boyer, Amy; Veldhuis, Marten; Zooniverse

    2018-01-01

    Many analyses of astronomical data sets, ranging from morphological classification of galaxies to identification of supernova candidates, have relied on humans to classify data into distinct categories. Crowdsourced galaxy classifications via the Galaxy Zoo project provided a solution that scaled visual classification for extant surveys by harnessing the combined power of thousands of volunteers. However, the much larger data sets anticipated from upcoming surveys will require a different approach. Automated classifiers using supervised machine learning have improved considerably over the past decade but their increasing sophistication comes at the expense of needing ever more training data. Crowdsourced classification by human volunteers is a critical technique for obtaining these training data. But several improvements can be made on this zeroth order solution. Efficiency gains can be achieved by implementing a “cascade filtering” approach whereby the task structure is reduced to a set of binary questions that are more suited to simpler machines while demanding lower cognitive loads for humans. Intelligent subject retirement based on quantitative metrics of volunteer skill and subject label reliability also leads to dramatic improvements in efficiency. We note that human and machine classifiers may retire subjects differently leading to trade-offs in performance space. Drawing on work with several Zooniverse projects including Galaxy Zoo and Supernova Hunter, we will present recent findings from experiments that combine cohorts of human and machine classifiers. We show that the most efficient system results when appropriate subsets of the data are intelligently assigned to each group according to their particular capabilities. With sufficient online training, simple machines can quickly classify “easy” subjects, leaving more difficult (and discovery-oriented) tasks for volunteers. We also find humans achieve higher classification purity while samples

  9. Using Neural Networks to Classify Digitized Images of Galaxies

    Science.gov (United States)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of Galaxies into Hubble types is of paramount importance to study the large scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we will summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and Support Vector Machine Classifier. The performance of any machine classifier is dependent on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment-based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, and removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Matthews coefficients for the galaxy classification community. Matthews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.

  10. Classifier fusion for VoIP attacks classification

    Science.gov (United States)

    Safarik, Jakub; Rezac, Filip

    2017-05-01

    SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementation rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of various machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of classifier fusion method leading to a potential decrease in classification error rate. Use of classifier combination makes a more robust solution without difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on the network of honeypot nodes. These honeypots run in several networks spread in different locations. Separation of honeypots allows us to gain an independent and trustworthy attack information.
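
    A classifier-fusion setup of the kind described above can be put together with scikit-learn's VotingClassifier, as sketched below; the particular base learners, the soft-voting rule, and the feature matrix are assumptions rather than the authors' configuration.

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# X: features extracted from SIP messages captured by the honeypots;
# y: attack category labels (assumed arrays).
def fused_attack_classifier(X, y):
    fusion = VotingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=200)),
                    ("svm", SVC(probability=True)),
                    ("nb", GaussianNB())],
        voting="soft")              # soft voting averages the predicted probabilities
    return cross_val_score(fusion, X, y, cv=5).mean()
```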

  11. Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

    OpenAIRE

    Hernández Rodríguez, Selene

    2010-01-01

    The k nearest neighbor (k-NN) classifier has been extensively used in Pattern Recognition because of its simplicity and its good performance. However, in large datasets applications, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison f...

  12. Deep convolutional neural network for classifying Fusarium wilt of radish from unmanned aerial vehicles

    Science.gov (United States)

    Ha, Jin Gwan; Moon, Hyeonjoon; Kwak, Jin Tae; Hassan, Syed Ibrahim; Dang, Minh; Lee, O. New; Park, Han Yong

    2017-10-01

    Recently, unmanned aerial vehicles (UAVs) have gained much attention. In particular, there is a growing interest in utilizing UAVs for agricultural applications such as crop monitoring and management. We propose a computerized system that is capable of detecting Fusarium wilt of radish with high accuracy. The system adopts computer vision and machine learning techniques, including deep learning, to process the images captured by UAVs at low altitudes and to identify the infected radish. The whole radish field is first segmented into three distinctive regions (radish, bare ground, and mulching film) via a softmax classifier and K-means clustering. Then, the identified radish regions are further classified into healthy radish and Fusarium wilt of radish using a deep convolutional neural network (CNN). In identifying radish, bare ground, and mulching film from a radish field, we achieved an accuracy of ≥97.4%. In detecting Fusarium wilt of radish, the CNN obtained an accuracy of 93.3%. It also outperformed the standard machine learning algorithm, obtaining 82.9% accuracy. Therefore, UAVs equipped with computational techniques are promising tools for improving the quality and efficiency of agriculture today.

  13. Feature extraction for dynamic integration of classifiers

    NARCIS (Netherlands)

    Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Patterson, D.W.

    2007-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique

  14. Classifying Returns as Extreme

    DEFF Research Database (Denmark)

    Christiansen, Charlotte

    2014-01-01

    I consider extreme returns for the stock and bond markets of 14 EU countries using two classification schemes: One, the univariate classification scheme from the previous literature that classifies extreme returns for each market separately, and two, a novel multivariate classification scheme tha...

  15. The Protection of Classified Information: The Legal Framework

    National Research Council Canada - National Science Library

    Elsea, Jennifer K

    2006-01-01

    Recent incidents involving leaks of classified information have heightened interest in the legal framework that governs security classification, access to classified information, and penalties for improper disclosure...

  16. Classifying risk status of non-clinical adolescents using psychometric indicators for psychosis spectrum disorders.

    Science.gov (United States)

    Fonseca-Pedrero, Eduardo; Gooding, Diane C; Ortuño-Sierra, Javier; Pflum, Madeline; Paino, Mercedes; Muñiz, José

    2016-09-30

    This study is an attempt to evaluate extant psychometric indicators using latent profile analysis for classifying community-derived individuals based on a set of clinical, behavioural, and personality traits considered risk markers for psychosis spectrum disorders. The present investigation included four hundred and forty-nine high-school students between the ages of 12 and 19. We used the following to assess risk: the Prodromal Questionnaire-Brief (PQ-B), Oviedo Schizotypy Assessment Questionnaire (ESQUIZO-Q), Anticipatory and Consummatory Interpersonal Pleasure Scale-Adolescent version (ACIPS-A), and General Health Questionnaire 12 (GHQ-12). Using latent profile analysis, six latent classes (LC) were identified: participants in class 1 (LC1) displayed little or no symptoms and accounted for 38.53% of the sample; class 2 (LC2), who accounted for 28.06%, also produced low mean scores across most measures though they expressed somewhat higher levels of subjective distress; LC3, a positive schizotypy group (10.24%); LC4 (13.36%), a psychosis high-risk group; LC5, a high positive and negative schizotypy group (4.45%); and LC6, a very high distress, severe clinical high-risk group, comprised 5.34% of the sample. The current research indicates that different latent classes of individuals at early risk can be empirically defined in adolescent community samples using psychometric indicators for psychosis spectrum disorders. These findings may have implications for early detection and prevention strategies in psychosis spectrum disorders. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
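
    Latent profile analysis is closely related to fitting a Gaussian mixture model over the standardized questionnaire scores and choosing the number of classes by an information criterion. The sketch below illustrates that generic approach with scikit-learn; it is not the authors' software, and the score matrix, covariance structure, and maximum class count are assumptions.

```python
from sklearn.mixture import GaussianMixture

# scores: (n_adolescents, n_indicators) matrix of PQ-B, ESQUIZO-Q, ACIPS-A and GHQ-12
# scores (assumed to be standardized).
def fit_latent_profiles(scores, max_classes=8, seed=0):
    models = [GaussianMixture(n_components=k, covariance_type="diag", random_state=seed)
              .fit(scores) for k in range(1, max_classes + 1)]
    best = min(models, key=lambda m: m.bic(scores))    # lowest BIC selects the class count
    return best.n_components, best.predict(scores)     # number of classes and assignments
```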

  17. Energy-Efficient Neuromorphic Classifiers.

    Science.gov (United States)

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

  18. Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals

    Directory of Open Access Journals (Sweden)

    Jianfeng Hu

    2017-08-01

    Full Text Available Purpose: Driving fatigue has become one of the important causes of road accidents, and many studies have analyzed driver fatigue. EEG is becoming increasingly useful in measuring the fatigue state. Manual interpretation of EEG signals is impossible, so an effective method for automatic detection of EEG signals is crucially needed. Method: In order to evaluate the complex, unstable, and non-linear characteristics of EEG signals, four feature sets were computed from EEG signals, in which fuzzy entropy (FE), sample entropy (SE), approximate entropy (AE), spectral entropy (PE), and combined entropies (FE + SE + AE + PE) were included. All these feature sets were used as the input vectors of an AdaBoost classifier, a boosting method which is fast and highly accurate. To assess our method, several experiments including parameter setting and classifier comparison were conducted on 28 subjects. For comparison, Decision Tree (DT), Support Vector Machine (SVM) and Naive Bayes (NB) classifiers were used. Results: The proposed method (combination of FE and AdaBoost) yields superior performance compared with the other schemes. Using the FE feature extractor, AdaBoost achieves an improved area under the receiver operating curve (AUC) of 0.994, error rate (ERR) of 0.024, Precision of 0.969, Recall of 0.984, F1 score of 0.976, and Matthews correlation coefficient (MCC) of 0.952, compared to SVM (ERR of 0.035, Precision of 0.957, Recall of 0.974, F1 score of 0.966, and MCC of 0.930, with AUC of 0.990), DT (ERR of 0.142, Precision of 0.857, Recall of 0.859, F1 score of 0.966, and MCC of 0.716, with AUC of 0.916) and NB (ERR of 0.405, Precision of 0.646, Recall of 0.434, F1 score of 0.519, and MCC of 0.203, with AUC of 0.606). It shows that the FE feature set and the combined feature set outperform the other feature sets. AdaBoost seems to have better robustness against changes in the ratio of test samples to all samples and in the number of subjects, which might therefore aid in the real-time detection of driver
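
    A generic comparison of AdaBoost against SVM, DT, and NB on entropy features, in the spirit of the evaluation above, could be set up as in the sketch below; the feature matrix, split, and hyperparameters are assumptions, and the metrics mirror those reported (AUC, MCC).

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, matthews_corrcoef

# X: entropy features (e.g. FE) per EEG epoch; y: 0 = alert, 1 = fatigued (assumed arrays).
def compare_classifiers(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                              random_state=seed)
    models = {"AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=seed),
              "SVM": SVC(probability=True, random_state=seed),
              "DT": DecisionTreeClassifier(random_state=seed),
              "NB": GaussianNB()}
    results = {}
    for name, clf in models.items():
        clf.fit(X_tr, y_tr)
        prob = clf.predict_proba(X_te)[:, 1]
        pred = clf.predict(X_te)
        results[name] = {"AUC": roc_auc_score(y_te, prob),
                         "MCC": matthews_corrcoef(y_te, pred)}
    return results
```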

  19. Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

    Science.gov (United States)

    Forkert, Nils Daniel; Fiehler, Jens

    2015-03-01

    The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require a previous training and, until now, it is ambiguous how many subjects are required for this, which is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from using all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 if using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and effort required for data acquisition and preparation.
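
    The experiment described above (repeatedly shrinking the training set drawn from the other patients and scoring the voxelwise prediction with the Dice coefficient) can be sketched roughly as follows; the per-patient data structure, subset sizes, and forest settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def dice(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + 1e-12)

# patients: list of (voxel_features, voxel_outcome) tuples, one entry per patient (assumed).
def training_size_curve(patients, sizes=(1, 2, 5, 10, 22), seed=0):
    rng = np.random.default_rng(seed)
    scores = {k: [] for k in sizes}
    for i, (X_test, y_test) in enumerate(patients):
        others = [p for j, p in enumerate(patients) if j != i]
        for k in sizes:
            # Random subset of the other patients defines the training set.
            chosen = rng.choice(len(others), size=min(k, len(others)), replace=False)
            X_tr = np.vstack([others[j][0] for j in chosen])
            y_tr = np.concatenate([others[j][1] for j in chosen])
            clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
            scores[k].append(dice(clf.predict(X_test).astype(bool), y_test.astype(bool)))
    return {k: float(np.mean(v)) for k, v in scores.items()}
```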

  20. Performance of the BioPlex 2200 HIV Ag-Ab assay for identifying acute HIV infection.

    Science.gov (United States)

    Eshleman, Susan H; Piwowar-Manning, Estelle; Sivay, Mariya V; Debevec, Barbara; Veater, Stephanie; McKinstry, Laura; Bekker, Linda-Gail; Mannheimer, Sharon; Grant, Robert M; Chesney, Margaret A; Coates, Thomas J; Koblin, Beryl A; Fogel, Jessica M

    Assays that detect HIV antigen (Ag) and antibody (Ab) can be used to screen for HIV infection. The aim of this study was to compare the performance of the BioPlex 2200 HIV Ag-Ab assay and two other Ag/Ab combination assays for detection of acute HIV infection. Samples were obtained from 24 individuals (18 from the US, 6 from South Africa); these individuals were classified as having acute infection based on the following criteria: positive qualitative RNA assay; two negative rapid tests; negative discriminatory test. The samples were tested with the BioPlex assay, the ARCHITECT HIV Ag/Ab Combo test, the Bio-Rad GS HIV Combo Ag-Ab EIA test, and a viral load assay. Twelve (50.0%) of the 24 samples had RNA detected only (>40 to 13,476 copies/mL). Ten (43.5%) samples had reactive results with all three Ag/Ab assays, one sample was reactive with the ARCHITECT and Bio-Rad assays, and one sample was reactive with the Bio-Rad and BioPlex assays. The 11 samples that were reactive with the BioPlex assay had viral loads from 83,010 to >750,000 copies/mL; 9/11 samples were classified as Ag positive/Ab negative by the BioPlex assay. Detection of acute HIV infection was similar for the BioPlex assay and the two other Ag/Ab assays. All three tests were less sensitive than a qualitative RNA assay and only detected HIV Ag when the viral load was high. The BioPlex assay detected acute infection in about half of the cases, and identified most of those infections as Ag positive/Ab negative. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. MSEBAG: a dynamic classifier ensemble generation based on 'minimum-sufficient ensemble' and bagging

    Science.gov (United States)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
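
    The selection and aggregation steps in MSEBAG are more involved than can be reproduced here; the following is only a greatly simplified sketch of the 'over-generation and selection' idea on a bagged pool of shallow trees: a forward search for a small ensemble with maximal in-sample fitness, a backward stepwise pass that generates a nested collection of ensembles, and simple averaging of the collection's decisions. All data and parameters are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Over-generate a pool of base classifiers (bagged shallow trees).
rng = np.random.default_rng(0)
pool = []
for _ in range(25):
    idx = rng.choice(len(X_tr), len(X_tr), replace=True)
    pool.append(DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[idx], y_tr[idx]))

def ensemble_acc(members, X, y):
    votes = np.mean([m.predict(X) for m in members], axis=0)
    return np.mean((votes >= 0.5) == y)

# Forward search for a small ensemble with maximal in-sample fitness
# (a rough stand-in for the 'minimum-sufficient ensemble').
ensemble, remaining, best = [], list(pool), 0.0
while remaining:
    gains = [ensemble_acc(ensemble + [c], X_tr, y_tr) for c in remaining]
    i = int(np.argmax(gains))
    if gains[i] <= best:
        break
    best = gains[i]
    ensemble.append(remaining.pop(i))

# Backward stepwise pass: a nested collection of ensembles of decreasing size.
collection, current = [list(ensemble)], list(ensemble)
while len(current) > 1:
    losses = [ensemble_acc([c for j, c in enumerate(current) if j != i], X_tr, y_tr)
              for i in range(len(current))]
    current.pop(int(np.argmax(losses)))      # drop the member whose removal hurts least
    collection.append(list(current))

# Aggregate the decisions of the generated ensembles (plain averaging here).
avg_votes = np.mean([np.mean([m.predict(X_te) for m in e], axis=0) for e in collection], axis=0)
print("test accuracy:", np.mean((avg_votes >= 0.5) == y_te))
```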

  2. Application of Monte Carlo cross-validation to identify pathway cross-talk in neonatal sepsis.

    Science.gov (United States)

    Zhang, Yuxia; Liu, Cui; Wang, Jingna; Li, Xingxia

    2018-03-01

    To explore genetic pathway cross-talk in neonates with sepsis, an integrated approach was used in this paper. Gene expression profiling and biological signaling pathway data were first integrated to explore the potential relationships between pathways and the genes differentially expressed between normal uninfected neonates and neonates with sepsis. For the different pathways, scores were obtained from the gene expression data by quantitatively analyzing the pathway cross-talk. The paired pathways with high cross-talk were identified by random forest classification. The purpose of the work was to find the best pairs of pathways able to discriminate sepsis samples from normal samples. Ten pairs of pathways were found that were probably able to discriminate neonates with sepsis from normal uninfected neonates. Among them, the best two paired pathways were identified according to an extensive analysis of the literature. Impact statement To find the best pairs of pathways able to discriminate sepsis samples from normal samples, an RF classifier, the DS obtained from differentially expressed genes (DEGs) of significantly associated paired pathways, and Monte Carlo cross-validation were applied in this paper. Ten pairs of pathways were probably able to discriminate neonates with sepsis from normal uninfected neonates. Among them, the best two paired pathways ((7) IL-6 Signaling and Phospholipase C Signaling (PLC); (8) Glucocorticoid Receptor (GR) Signaling and Dendritic Cell Maturation) were identified according to an extensive analysis of the literature.
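
    A minimal sketch of Monte Carlo cross-validation of a random forest on pathway-pair scores, using many random train/test splits rather than fixed folds; the feature matrix below is a synthetic placeholder for the pathway cross-talk scores, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 10))      # samples x pathway-pair scores (placeholder)
y = rng.integers(0, 2, 60)             # 1 = sepsis, 0 = control (placeholder labels)

# Monte Carlo cross-validation: 100 random 70/30 splits.
mccv = ShuffleSplit(n_splits=100, test_size=0.3, random_state=0)
scores = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0),
                         X, y, cv=mccv, scoring="roc_auc")
print(f"mean AUC over {len(scores)} random splits: {scores.mean():.3f}")
```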

  3. Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers

    Directory of Open Access Journals (Sweden)

    Ting Shu

    2017-01-01

    Full Text Available At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is commonly detected using blood tests, electrocardiograms, cardiac computerized tomography scans, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is rooted in Traditional Chinese Medicine. A new dataset consisting of 581 heart disease and 581 healthy samples was used to evaluate the proposed method. In order to optimize the Probabilistic Collaborative Representation Based Classifier, an analysis of its parameters was performed. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is proven to be effective at heart disease detection.
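
    The probabilistic variant used in the paper is not reproduced here; the following is a minimal sketch of a plain collaborative representation based classifier (ridge-regularized coding of a test vector over the training dictionary, class assignment by smallest class-wise residual), which conveys the underlying idea. The data are random placeholders for the facial key block color features.

```python
import numpy as np

def crc_predict(X_train, y_train, x_test, lam=0.01):
    """Collaborative representation classification of a single test vector.

    X_train: (n_samples, n_features), y_train: class labels, x_test: (n_features,).
    """
    D = X_train.T                                        # dictionary: features x samples
    # Ridge-regularized coding vector: (D^T D + lam*I)^-1 D^T x
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ x_test)
    residuals = {}
    for c in np.unique(y_train):
        mask = (y_train == c)
        residuals[c] = np.linalg.norm(x_test - D[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)             # class with smallest residual

# Toy usage with random feature vectors (placeholder data).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 12))
y = np.array([0] * 20 + [1] * 20)        # 0 = healthy, 1 = heart disease (placeholder)
print(crc_predict(X, y, rng.standard_normal(12)))
```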

  4. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification, which is a decision-making process, is artificial immune systems. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with a supervised/unsupervised artificial immune system, a negative selection classifier, and a resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed new method.

  5. Intelligent Garbage Classifier

    Directory of Open Access Journals (Sweden)

    Ignacio Rodríguez Novelle

    2008-12-01

    Full Text Available IGC (Intelligent Garbage Classifier) is a system for visual classification and separation of solid waste products. Currently, an important part of the separation effort is based on manual work, from household separation to industrial waste management. Taking advantage of the technologies currently available, a system has been built that can analyze images from a camera and control a robot arm and conveyor belt to automatically separate different kinds of waste.

  6. Correlation Dimension-Based Classifier

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2014-01-01

    Roč. 44, č. 12 (2014), s. 2253-2263 ISSN 2168-2267 R&D Projects: GA MŠk(CZ) LG12020 Institutional support: RVO:67985807 Keywords : classifier * multidimensional data * correlation dimension * scaling exponent * polynomial expansion Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 3.469, year: 2014

  7. An ensemble classifier to predict track geometry degradation

    International Nuclear Information System (INIS)

    Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

    2017-01-01

    Railway operations are inherently complex and a source of several problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach which entails the construction of an ensemble classifier to forecast the degradation of track geometry. Our classifier is constructed by solving the problem from three different perspectives: deterioration, regression and classification. We considered a different model from each perspective and our results show that using an ensemble method improves the predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.

  8. Methods of generalizing and classifying layer structures of a special form

    Energy Technology Data Exchange (ETDEWEB)

    Viktorova, N P

    1981-09-01

    An examination is made of the problem of classifying structures represented by weighted multilayer graphs of special form with connections between the vertices of each layer. The classification of structures of such a form is based on the construction of resolving sets of graphs as a result of generalization of the elements of the training sample of each class and the testing of whether an input object is isomorphic (with allowance for the weights) to the structures of the resolving set or not. 4 references.

  9. Data characteristics that determine classifier performance

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2006-11-01

    Full Text Available available at [11]. The kNN uses a LinearNN nearest neighbour search algorithm with a Euclidean distance metric [8]. The optimal k value is determined by performing 10-fold cross-validation. An optimal k value between 1 and 10 is used for Experiments 1... classifiers. 10-fold cross-validation is used to evaluate and compare the performance of the classifiers on the different data sets. 3.1. Artificial data generation Multivariate Gaussian distributions are used to generate artificial data sets. We use d...
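
    A minimal sketch of the model-selection step described in the excerpt (choosing k between 1 and 10 for a Euclidean-distance kNN by 10-fold cross-validation); the dataset is a generated placeholder, not the artificial Gaussian data of the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)  # placeholder data

search = GridSearchCV(
    KNeighborsClassifier(metric="euclidean"),
    param_grid={"n_neighbors": range(1, 11)},   # candidate k values 1..10
    cv=10,                                      # 10-fold cross-validation, as in the excerpt
)
search.fit(X, y)
print("optimal k:", search.best_params_["n_neighbors"], "CV accuracy:", round(search.best_score_, 3))
```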

  10. Disassembly and Sanitization of Classified Matter

    International Nuclear Information System (INIS)

    Stockham, Dwight J.; Saad, Max P.

    2008-01-01

    The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition by using recycling and waste minimization measures. This process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have caused a dramatic increase in the need for information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components were required to have long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units due to classification issues and they may also contain radiological and/or hazardous components. Since no disposal options exist for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic, requiring a secured storage area, monitoring, auditing, and presenting the potential for loss or theft of the material. Overall recycling rates for materials sent through the DSO process have enabled 70 to 80% of these components to be recycled. These components are made of high quality materials and once this material has been sanitized, the demand for the component metals for recycling efforts is very high. The DSO process for NGPF classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team and the DSO is responding to these requests by expanding its scope to include Work-for-Other projects. For example

  11. Evaluation of immunization coverage by lot quality assurance sampling compared with 30-cluster sampling in a primary health centre in India.

    Science.gov (United States)

    Singh, J; Jain, D C; Sharma, R S; Verghese, T

    1996-01-01

    The immunization coverage of infants, children and women residing in a primary health centre (PHC) area in Rajasthan was evaluated both by lot quality assurance sampling (LQAS) and by the 30-cluster sampling method recommended by WHO's Expanded Programme on Immunization (EPI). The LQAS survey was used to classify 27 mutually exclusive subunits of the population, defined as residents in health subcentre areas, on the basis of acceptable or unacceptable levels of immunization coverage among infants and their mothers. The LQAS results from the 27 subcentres were also combined to obtain an overall estimate of coverage for the entire population of the primary health centre, and these results were compared with the EPI cluster survey results. The LQAS survey did not identify any subcentre with a level of immunization among infants high enough to be classified as acceptable; only three subcentres were classified as having acceptable levels of tetanus toxoid (TT) coverage among women. The estimated overall coverage in the PHC population from the combined LQAS results showed that a quarter of the infants were immunized appropriately for their ages and that 46% of their mothers had been adequately immunized with TT. Although the age groups and the periods of time during which the children were immunized differed for the LQAS and EPI survey populations, the characteristics of the mothers were largely similar. About 57% (95% CI, 46-67) of them were found to be fully immunized with TT by 30-cluster sampling, compared with 46% (95% CI, 41-51) by stratified random sampling. The difference was not statistically significant. The field work to collect LQAS data took about three times longer, and cost 60% more than the EPI survey. The apparently homogeneous and low level of immunization coverage in the 27 subcentres makes this an impractical situation in which to apply LQAS, and the results obtained were therefore not particularly useful. However, if LQAS had been applied by local
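
    LQAS classifies each lot (here, a subcentre) as acceptable or unacceptable by counting "defectives" (unimmunized children) in a small sample and comparing the count against a decision threshold chosen from binomial error risks. The sketch below shows that generic decision rule only; the sample size, coverage targets, and risk levels are illustrative assumptions, not those used in the study.

```python
from scipy.stats import binom

def lqas_threshold(n, p_upper, p_lower, alpha=0.05, beta=0.10):
    """Smallest decision value d such that an acceptable lot (defective proportion
    <= p_upper) is rarely rejected and an unacceptable one (>= p_lower) is rarely
    accepted, with producer/consumer risks alpha and beta."""
    for d in range(n + 1):
        reject_good = 1 - binom.cdf(d, n, p_upper)   # P(> d defectives | acceptable lot)
        accept_bad = binom.cdf(d, n, p_lower)        # P(<= d defectives | unacceptable lot)
        if reject_good <= alpha and accept_bad <= beta:
            return d
    return None                                      # no threshold meets both risks for this n

# Illustrative numbers: sample 19 children per subcentre, call coverage acceptable
# at >= 80% (p_upper = 0.20) and unacceptable at <= 40% (p_lower = 0.60).
n = 19
d = lqas_threshold(n, 0.20, 0.60)
unimmunized_in_sample = 4
print("decision threshold d =", d)
print("lot acceptable" if unimmunized_in_sample <= d else "lot unacceptable")
```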

  12. Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes.

    Science.gov (United States)

    Luo, Yuan; Cheng, Yu; Uzuner, Özlem; Szolovits, Peter; Starren, Justin

    2018-01-01

    We propose Segment Convolutional Neural Networks (Seg-CNNs) for classifying relations from clinical notes. Seg-CNNs use only word-embedding features without manual feature engineering. Unlike typical CNN models, relations between 2 concepts are identified by simultaneously learning separate representations for text segments in a sentence: preceding, concept1, middle, concept2, and succeeding. We evaluate Seg-CNN on the i2b2/VA relation classification challenge dataset. We show that Seg-CNN achieves a state-of-the-art micro-average F-measure of 0.742 for overall evaluation, 0.686 for classifying medical problem-treatment relations, 0.820 for medical problem-test relations, and 0.702 for medical problem-medical problem relations. We demonstrate the benefits of learning segment-level representations. We show that medical domain word embeddings help improve relation classification. Seg-CNNs can be trained quickly for the i2b2/VA dataset on a graphics processing unit (GPU) platform. These results support the use of CNNs computed over segments of text for classifying medical relations, as they show state-of-the-art performance while requiring no manual feature engineering. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
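
    A minimal PyTorch-style sketch of the segment-level idea described above: a separate convolution and max-pooling branch over each of the five text segments (preceding, concept1, middle, concept2, succeeding), concatenated before the output layer. Embedding size, filter count, and the relation label count are illustrative assumptions; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class SegCNN(nn.Module):
    """Toy segment CNN: one conv + max-pool branch per text segment."""
    def __init__(self, vocab_size, embed_dim=100, n_filters=50, n_segments=5, n_relations=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution branch per segment.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1) for _ in range(n_segments)
        )
        self.out = nn.Linear(n_segments * n_filters, n_relations)

    def forward(self, segments):
        # segments: list of n_segments LongTensors of shape (batch, seg_len)
        pooled = []
        for seg, conv in zip(segments, self.convs):
            x = self.embed(seg).transpose(1, 2)          # (batch, embed_dim, seg_len)
            x = torch.relu(conv(x)).max(dim=2).values    # max-pool over time
            pooled.append(x)
        return self.out(torch.cat(pooled, dim=1))        # relation logits

# Toy forward pass with random word indices (placeholder vocabulary of 1000 tokens).
model = SegCNN(vocab_size=1000)
batch = [torch.randint(1, 1000, (4, 12)) for _ in range(5)]
print(model(batch).shape)    # -> torch.Size([4, 8])
```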

  13. Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study.

    Science.gov (United States)

    Broman, Karl W; Keller, Mark P; Broman, Aimee Teo; Kendziorski, Christina; Yandell, Brian S; Sen, Śaunak; Attie, Alan D

    2015-08-19

    In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual's eQTL genotype based on expression data alone. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes (although we further identified a small number of sample mix-ups in each of the six panels of gene expression microarrays). Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup. Copyright © 2015 Broman et al.
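
    A rough sketch of the underlying check: predict each individual's genotype at a set of large-effect local eQTL from the expression of the associated transcripts, then flag individuals whose predicted and recorded genotypes disagree at many eQTL. The R/lineup package implements the full procedure; the nearest-centroid predictor and synthetic data below are only illustrative.

```python
import numpy as np

def predict_genotype(expression, genotypes):
    """Nearest-centroid genotype prediction for one transcript/eQTL pair."""
    centroids = {g: expression[genotypes == g].mean() for g in np.unique(genotypes)}
    return np.array([min(centroids, key=lambda g: abs(e - centroids[g])) for e in expression])

# expr: individuals x strong local eQTL transcripts; geno: recorded eQTL genotypes (0/1/2).
rng = np.random.default_rng(0)
n_ind, n_eqtl = 200, 30
geno = rng.integers(0, 3, size=(n_ind, n_eqtl))
expr = geno + 0.3 * rng.standard_normal((n_ind, n_eqtl))   # expression tracks genotype

# Swap two individuals' genotype rows to simulate a sample mix-up.
geno[[5, 6]] = geno[[6, 5]]

mismatch = np.zeros(n_ind)
for j in range(n_eqtl):
    mismatch += predict_genotype(expr[:, j], geno[:, j]) != geno[:, j]

suspects = np.where(mismatch / n_eqtl > 0.5)[0]
print("suspected mix-ups:", suspects)     # expected to flag individuals 5 and 6
```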

  14. How well Can We Classify SWOT-derived Water Surface Profiles?

    Science.gov (United States)

    Frasson, R. P. M.; Wei, R.; Picamilh, C.; Durand, M. T.

    2015-12-01

    The upcoming Surface Water Ocean Topography (SWOT) mission will detect water bodies and measure water surface elevation throughout the globe. Within its continental high resolution mask, SWOT is expected to deliver measurements of river width, water elevation and slope of rivers wider than ~50 m. The definition of river reaches is an integral step of the computation of discharge based on SWOT's observables. As poorly defined reaches can negatively affect the accuracy of discharge estimations, we seek strategies to break up rivers into physically meaningful sections. In the present work, we investigate how accurately we can classify water surface profiles based on simulated SWOT observations. We assume that most river sections can be classified as either M1 (mild slope, with depth larger than the normal depth), or A1 (adverse slope with depth larger than the critical depth). This assumption allows the classification to be based solely on the second derivative of water surface profiles, with convex profiles being classified as A1 and concave profiles as M1. We consider a HEC-RAS model of the Sacramento River as a representation of the true state of the river. We employ the SWOT instrument simulator to generate a synthetic pass of the river, which includes our best estimates of height measurement noise and geolocation errors. We process the resulting point cloud of water surface heights with the RiverObs package, which delineates the river center line and draws the water surface profile. Next, we identify inflection points in the water surface profile and classify the sections between the inflection points. Finally, we compare our limited classification of simulated SWOT-derived water surface profile to the "exact" classification of the modeled Sacramento River. With this exercise, we expect to determine if SWOT observations can be used to find inflection points in water surface profiles, which would bring knowledge of flow regimes into the definition of river reaches.
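
    A minimal sketch of the classification rule described above: take the second derivative of the along-stream water surface profile, locate inflection points from its sign changes, and label each section A1 (convex) or M1 (concave). The synthetic profile and section-length filter are illustrative assumptions; real SWOT heights would be smoothed before this step.

```python
import numpy as np

def classify_profile(x, h, min_len=10):
    """Split a water surface profile h(x) at inflection points and label each
    section A1 (convex, h'' > 0) or M1 (concave, h'' < 0)."""
    d2 = np.gradient(np.gradient(h, x), x)            # second derivative of elevation
    breaks = np.where(np.diff(np.sign(d2)) != 0)[0] + 1
    sections = np.split(np.arange(len(x)), breaks)
    return [("A1" if d2[s].mean() > 0 else "M1", x[s[0]], x[s[-1]])
            for s in sections if len(s) >= min_len]   # drop tiny slivers at the breaks

# Synthetic profile: a concave (M1-like) reach joined to a convex (A1-like) reach.
x = np.linspace(0, 10_000, 500)                       # downstream distance, in metres
h = np.where(x < 5000,
             10.0 - 1e-7 * x**2,                                  # h'' = -2e-7 (concave)
             7.5 - 1e-3 * (x - 5000) + 5e-8 * (x - 5000) ** 2)    # h'' = +1e-7 (convex)
for label, x0, x1 in classify_profile(x, h):
    print(f"{label}: {x0:.0f} m to {x1:.0f} m")       # expect one M1 then one A1 section
```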

  15. Fault Diagnosis for Distribution Networks Using Enhanced Support Vector Machine Classifier with Classical Multidimensional Scaling

    Directory of Open Access Journals (Sweden)

    Ming-Yuan Cho

    2017-09-01

    Full Text Available In this paper, a new fault diagnosis technique based on the time domain reflectometry (TDR) method with a pseudo-random binary sequence (PRBS) stimulus and a support vector machine (SVM) classifier has been investigated to recognize the different types of fault in radial distribution feeders. This novel technique considers the amplitude of the reflected signals and the peaks of the cross-correlation (CCR) between the reflected and incident waves for generating the fault current dataset for the SVM. Furthermore, this multi-layer enhanced SVM classifier is combined with a classical multidimensional scaling (CMDS) feature extraction algorithm and kernel parameter optimization to increase training speed and improve overall classification accuracy. The proposed technique has been tested on a radial distribution feeder to identify ten different types of fault considering 12 input features generated by using Simulink software and the MATLAB Toolbox. The success rate of the SVM classifier is over 95%, which demonstrates the effectiveness and high accuracy of the proposed method.
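
    A minimal sketch of the feature path outlined above: cross-correlate the reflected TDR trace with the PRBS stimulus, take the largest reflection-peak magnitudes as features, and feed them to an SVM. The fault simulation itself is not reproduced, so the traces and labels below are random placeholders and the classifier will score near chance on them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
prbs = rng.choice([-1.0, 1.0], size=256)              # pseudo-random binary stimulus

def ccr_features(reflected, stimulus, n_peaks=6):
    """Largest cross-correlation magnitudes between reflected and incident waves."""
    ccr = np.correlate(reflected, stimulus, mode="full")
    return np.sort(np.abs(ccr))[-n_peaks:]

# Placeholder dataset: random 'reflected' traces with synthetic fault labels.
X = np.array([ccr_features(rng.standard_normal(256), prbs) for _ in range(200)])
y = rng.integers(0, 10, size=200)                     # ten fault classes (placeholder)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
print(cross_val_score(clf, X, y, cv=5).mean())        # chance-level on the random placeholders
```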

  16. Assessment of fractures classified as non-mineralised in the Sicada database

    International Nuclear Information System (INIS)

    Claesson Liljedahl, Lillemor; Munier, Raymond; Sandstroem, Bjoern; Drake, Henrik; Tullborg, Eva-Lena

    2011-03-01

    The general objective of this report was to describe the results of the investigation of fractures classified as non-mineralised in Sicada. Such fractures exist at Forsmark and at Laxemar. The main aims of the investigation of these fractures were to: - Quantify the number of non-mineralised fractures (i.e. fractures lacking mineral coating) in Sicada (table: p_fract_core_eshi). - Closely examine a selection of fractures recorded as non-mineralised in Sicada. - Outline possible reasons for the existence of non-mineralised fractures. The work has involved extraction of fracture data from Sicada and subsequent statistical analysis. Since several thousand fractures are classified as non-mineralised in Sicada, it was not practical to include all of these in this study; we examined one fracture sub-set from each site. We investigated a sample of 204 of these fractures in detail (see Sections 1.1 and 2.4). Rock mechanical differences between Forsmark and Laxemar and kinematic analysis of fracture surfaces are not discussed in this report.

  17. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples, such as the smaller blood volume required, storage at room temperature, and the ability to sample in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurring in the DBS samples affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  18. Exome sequencing identifies SUCO mutations in mesial temporal lobe epilepsy.

    Science.gov (United States)

    Sha, Zhiqiang; Sha, Longze; Li, Wenting; Dou, Wanchen; Shen, Yan; Wu, Liwen; Xu, Qi

    2015-03-30

    Mesial temporal lobe epilepsy (mTLE) is the main type and most common medically intractable form of epilepsy. Samples stratified by disease severity may help identify new disease-associated mutant genes. We analyzed mRNA expression profiles from patient hippocampal tissue. Three of the seven patients had severe mTLE, with generalized-onset convulsions and loss of consciousness that occurred over many years. We found that, compared with the other groups, patients with severe mTLE were classified into a distinct group. Whole-exome sequencing and Sanger sequencing validation in all seven patients identified three novel SUN domain-containing ossification factor (SUCO) mutations in severely affected patients. Furthermore, SUCO knockdown significantly reduced dendritic length in vitro. Our results indicate that mTLE defects may affect neuronal development, and suggest that neurons develop abnormally in the absence of SUCO, which may be a generalized-onset epilepsy-related gene. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  19. Classifying the Optical Morphology of Shocked POststarburst Galaxies

    Science.gov (United States)

    Stewart, Tess; SPOGs Team

    2018-01-01

    The Shocked POststarburst Galaxy Survey (SPOGS) is a sample of galaxies in transition from blue, star forming spirals to red, inactive ellipticals. These galaxies are earlier in the transition than classical poststarburst samples. We have classified the physical characteristics of the full sample of 1067 SPOGs in 7 categories, covering (1) their shape; (2) the relative prominence of their nuclei; (3) the uniformity of their optical color; (4) whether the outskirts of the galaxy were indicative of on-going star formation; (5) whether they are engaged in interactions with other galaxies, and if so, (6) the kinds of galaxies with which they are interacting; and (7) the presence of asymmetrical features, possibly indicative of recent interactions. We determined that a plurality of SPOGs are in elliptical galaxies, indicating that morphological transformations may tend to conclude before other indicators of transitions have faded. Further, early-type SPOGs also tend to have the brightest optical nuclei. Most galaxies do not show signs of current or recent interactions. We used these classifications to search for correlations between qualitative and quantitative characteristics of SPOGs using Sloan Digital Sky Survey and Wide-field Infrared Survey Explorer magnitudes. We find that relative optical nuclear brightness is not a good indicator of the presence of an active galactic nucleus and that galaxies with visible indications of active star formation also cluster in optical color and diagnostic line ratios.

  20. Classifying low-grade and high-grade bladder cancer using label-free serum surface-enhanced Raman spectroscopy and support vector machine

    Science.gov (United States)

    Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi

    2018-03-01

    This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.

  1. Identifying Shifts in Leaf-Litter Ant Assemblages (Hymenoptera: Formicidae) across Ecosystem Boundaries Using Multiple Sampling Methods.

    Directory of Open Access Journals (Sweden)

    Michal Wiezik

    Full Text Available Global or regional environmental changes in climate or land use have been increasingly implied in shifts in boundaries (ecotones) between adjacent ecosystems such as beech or oak-dominated forests and forest-steppe ecotones that frequently co-occur near the southern range limits of deciduous forest biome in Europe. Yet, our ability to detect changes in biological communities across these ecosystems, or to understand their environmental drivers, can be hampered when different sampling methods are required to characterize biological communities of the adjacent but ecologically different ecosystems. Ants (Hymenoptera: Formicidae) have been shown to be particularly sensitive to changes in temperature and vegetation and they require different sampling methods in closed vs. open habitats. We compared ant assemblages of closed-forests (beech- or oak-dominated) and open forest-steppe habitats in southwestern Carpathians using methods for closed-forest (litter sifting) and open habitats (pitfall trapping), and developed an integrated sampling approach to characterize changes in ant assemblages across these adjacent ecosystems. Using both methods, we collected 5,328 individual ant workers from 28 species. Neither method represented ant communities completely, but pitfall trapping accounted for more species (24) than litter sifting (16). Although pitfall trapping characterized differences in species richness and composition among the ecosystems better, with beech forest being most species poor and ecotone most species rich, litter sifting was more successful in identifying characteristic litter-dwelling species in oak-dominated forest. The integrated sampling approach using both methods yielded more accurate characterization of species richness and composition, and particularly so in species-rich forest-steppe habitat where the combined sample identified significantly higher number of species compared to either of the two methods on their own. Thus, an integrated

  2. Identifying Shifts in Leaf-Litter Ant Assemblages (Hymenoptera: Formicidae) across Ecosystem Boundaries Using Multiple Sampling Methods.

    Science.gov (United States)

    Wiezik, Michal; Svitok, Marek; Wieziková, Adela; Dovčiak, Martin

    2015-01-01

    Global or regional environmental changes in climate or land use have been increasingly implied in shifts in boundaries (ecotones) between adjacent ecosystems such as beech or oak-dominated forests and forest-steppe ecotones that frequently co-occur near the southern range limits of deciduous forest biome in Europe. Yet, our ability to detect changes in biological communities across these ecosystems, or to understand their environmental drivers, can be hampered when different sampling methods are required to characterize biological communities of the adjacent but ecologically different ecosystems. Ants (Hymenoptera: Formicidae) have been shown to be particularly sensitive to changes in temperature and vegetation and they require different sampling methods in closed vs. open habitats. We compared ant assemblages of closed-forests (beech- or oak-dominated) and open forest-steppe habitats in southwestern Carpathians using methods for closed-forest (litter sifting) and open habitats (pitfall trapping), and developed an integrated sampling approach to characterize changes in ant assemblages across these adjacent ecosystems. Using both methods, we collected 5,328 individual ant workers from 28 species. Neither method represented ant communities completely, but pitfall trapping accounted for more species (24) than litter sifting (16). Although pitfall trapping characterized differences in species richness and composition among the ecosystems better, with beech forest being most species poor and ecotone most species rich, litter sifting was more successful in identifying characteristic litter-dwelling species in oak-dominated forest. The integrated sampling approach using both methods yielded more accurate characterization of species richness and composition, and particularly so in species-rich forest-steppe habitat where the combined sample identified significantly higher number of species compared to either of the two methods on their own. Thus, an integrated sampling

  3. A joint latent class model for classifying severely hemorrhaging trauma patients.

    Science.gov (United States)

    Rahbar, Mohammad H; Ning, Jing; Choi, Sangbum; Piao, Jin; Hong, Chuan; Huang, Hanwen; Del Junco, Deborah J; Fox, Erin E; Rahbar, Elaheh; Holcomb, John B

    2015-10-24

    In trauma research, "massive transfusion" (MT), historically defined as receiving ≥10 units of red blood cells (RBCs) within 24 h of admission, has been routinely used as a "gold standard" for quantifying bleeding severity. Due to early in-hospital mortality, however, MT is subject to survivor bias and thus a poorly defined criterion to classify bleeding trauma patients. Using the data from a retrospective trauma transfusion study, we applied a latent-class (LC) mixture model to identify severely hemorrhaging (SH) patients. Based on the joint distribution of cumulative units of RBCs and binary survival outcome at 24 h of admission, we applied an expectation-maximization (EM) algorithm to obtain model parameters. Estimated posterior probabilities were used for patients' classification and compared with the MT rule. To evaluate predictive performance of the LC-based classification, we examined the role of six clinical variables as predictors using two separate logistic regression models. Out of 471 trauma patients, 211 (45 %) were MT, while our latent SH classifier identified only 127 (27 %) of patients as SH. The agreement between the two classification methods was 73 %. A non-ignorable portion of patients (17 out of 68, 25 %) who died within 24 h were not classified as MT but the SH group included 62 patients (91 %) who died during the same period. Our comparison of the predictive models based on MT and SH revealed significant differences between the coefficients of potential predictors of patients who may be in need of activation of the massive transfusion protocol. The traditional MT classification does not adequately reflect transfusion practices and outcomes during the trauma reception and initial resuscitation phase. Although we have demonstrated that joint latent class modeling could be used to correct for potential bias caused by misclassification of severely bleeding patients, improvement in this approach could be made in the presence of time to event

  4. 32 CFR 2400.30 - Reproduction of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be strictly...

  5. Robust Combining of Disparate Classifiers Through Order Statistics

    Science.gov (United States)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
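
    A minimal numpy sketch of the combiners analyzed above: given each base classifier's class-posterior estimates for one test pattern, combine them with the median, the maximum, a trimmed mean, or a spread-style average of the extreme order statistics, then pick the class with the largest combined score. The posteriors below are placeholders, and the "trim"/"spread" variants are simplified stand-ins for the paper's combiners.

```python
import numpy as np

def combine(posteriors, method="median", trim=1):
    """posteriors: (n_classifiers, n_classes) outputs for one test pattern."""
    ordered = np.sort(posteriors, axis=0)              # order statistics per class
    if method == "median":
        combined = np.median(posteriors, axis=0)
    elif method == "max":
        combined = ordered[-1]
    elif method == "trim":                             # trimmed mean: drop the extremes
        combined = ordered[trim:len(ordered) - trim].mean(axis=0)
    elif method == "spread":                           # average of the extreme order statistics
        combined = 0.5 * (ordered[0] + ordered[-1])
    else:
        raise ValueError(method)
    return int(np.argmax(combined))

# Five classifiers, three classes; one classifier (last row) is badly off.
p = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.8, 0.1, 0.1],
              [0.5, 0.4, 0.1],
              [0.1, 0.1, 0.8]])
for m in ("median", "max", "trim", "spread"):
    print(m, "->", combine(p, m))
```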

  6. Knowledge Uncertainty and Composed Classifier

    Czech Academy of Sciences Publication Activity Database

    Klimešová, Dana; Ocelíková, E.

    2007-01-01

    Roč. 1, č. 2 (2007), s. 101-105 ISSN 1998-0140 Institutional research plan: CEZ:AV0Z10750506 Keywords : Boosting architecture * contextual modelling * composed classifier * knowledge management, * knowledge * uncertainty Subject RIV: IN - Informatics, Computer Science

  7. Differential profiling of volatile organic compound biomarker signatures utilizing a logical statistical filter-set and novel hybrid evolutionary classifiers

    Science.gov (United States)

    Grigsby, Claude C.; Zmuda, Michael A.; Boone, Derek W.; Highlander, Tyler C.; Kramer, Ryan M.; Rizki, Mateen M.

    2012-06-01

    A growing body of discoveries in molecular signatures has revealed that volatile organic compounds (VOCs), the small molecules associated with an individual's odor and breath, can be monitored to reveal the identity and presence of a unique individual, as well as their overall physiological status. Given the analysis requirements for differential VOC profiling via gas chromatography/mass spectrometry, our group has developed a novel informatics platform, Metabolite Differentiation and Discovery Lab (MeDDL). In its current version, MeDDL is a comprehensive tool for time-series spectral registration and alignment, visualization, comparative analysis, and machine learning to facilitate the efficient analysis of multiple, large-scale biomarker discovery studies. The MeDDL toolset can therefore identify a large differential subset of registered peaks, where their corresponding intensities can be used as features for classification. This initial screening of peaks yields results sets that are typically too large for incorporation into a portable, electronic nose based system, in addition to including VOCs that are not amenable to classification; consequently, it is also important to identify an optimal subset of these peaks to increase classification accuracy and to decrease the cost of the final system. MeDDL's learning tools include a classifier similar to a K-nearest neighbor classifier used in conjunction with a genetic algorithm (GA) that simultaneously optimizes the classifier and the subset of features. The GA uses ROC curves to produce classifiers having maximal area under their ROC curve. Experimental results on over a dozen recognition problems show many examples of classifiers and feature sets that produce perfect ROC curves.
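
    A greatly simplified sketch of GA-driven feature (peak) subset selection with a k-nearest-neighbor classifier scored by ROC AUC, in the spirit of the approach described above. The dataset, population size, selection scheme, and mutation rate are all illustrative assumptions and do not reproduce the MeDDL implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Placeholder stand-in for registered GC/MS peak intensities (samples x peaks).
X, y = make_classification(n_samples=120, n_features=40, n_informative=6, random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    """Cross-validated ROC AUC of a k-NN classifier restricted to the selected peaks."""
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=5, scoring="roc_auc").mean()

# Simple generational GA over boolean feature masks.
pop = rng.random((20, X.shape[1])) < 0.2               # sparse initial masks
for gen in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]       # truncation selection
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)   # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.02                 # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print("selected peaks:", np.flatnonzero(best), "AUC:", round(fitness(best), 3))
```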

  8. Identifying thematic roles from neural representations measured by functional magnetic resonance imaging.

    Science.gov (United States)

    Wang, Jing; Cherkassky, Vladimir L; Yang, Ying; Chang, Kai-Min Kevin; Vargas, Robert; Diana, Nicholas; Just, Marcel Adam

    2016-01-01

    The generativity and complexity of human thought stem in large part from the ability to represent relations among concepts and form propositions. The current study reveals how a given object such as rabbit is neurally encoded differently and identifiably depending on whether it is an agent ("the rabbit punches the monkey") or a patient ("the monkey punches the rabbit"). Machine-learning classifiers were trained on functional magnetic resonance imaging (fMRI) data evoked by a set of short videos that conveyed agent-verb-patient propositions. When tested on a held-out video, the classifiers were able to reliably identify the thematic role of an object from its associated fMRI activation pattern. Moreover, when trained on one subset of the study participants, classifiers reliably identified the thematic roles in the data of a left-out participant (mean accuracy = .66), indicating that the neural representations of thematic roles were common across individuals.

  9. Representative Vector Machines: A Unified Framework for Classical Classifiers.

    Science.gov (United States)

    Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

    2016-08-01

    Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k -NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.

  10. 41 CFR 105-62.102 - Authority to originally classify.

    Science.gov (United States)

    2010-07-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is delegable...

  11. MicroRNA classifier and nomogram for metastasis prediction in colon cancer.

    Science.gov (United States)

    Goossens-Beumer, Inès J; Derr, Remco S; Buermans, Henk P J; Goeman, Jelle J; Böhringer, Stefan; Morreau, Hans; Nitsche, Ulrich; Janssen, Klaus-Peter; van de Velde, Cornelis J H; Kuppen, Peter J K

    2015-01-01

    Colon cancer prognosis and treatment are currently based on a classification system still showing large heterogeneity in clinical outcome, especially in TNM stages II and III. Prognostic biomarkers for metastasis risk are warranted as development of distant recurrent disease mainly accounts for the high lethality rates of colon cancer. miRNAs have been proposed as potential biomarkers for cancer. Furthermore, a verified standard for normalization of the amount of input material in PCR-based relative quantification of miRNA expression is lacking. A selection of frozen tumor specimens from two independent patient cohorts with TNM stage II-III microsatellite stable primary adenocarcinomas was used for laser capture microdissection. Next-generation sequencing was performed on small RNAs isolated from colorectal tumors from the Dutch cohort (N = 50). Differential expression analysis, comparing metastasized and nonmetastasized tumors, identified prognostic miRNAs. Validation was performed on colon tumors from the German cohort (N = 43) using quantitative PCR (qPCR). miR25-3p and miR339-5p were identified and validated as independent prognostic markers and used to construct a multivariate nomogram for metastasis risk prediction. The nomogram showed good probability prediction in validation. In addition, we recommend combination of miR16-5p and miR26a-5p as standard for normalization in qPCR of colon cancer tissue-derived miRNA expression. In this international study, we identified and validated a miRNA classifier in primary cancers, and propose a nomogram capable of predicting metastasis risk in microsatellite stable TNM stage II-III colon cancer. In conjunction with TNM staging, by means of a nomogram, this miRNA classifier may allow for personalized treatment decisions based on individual tumor characteristics. ©2014 American Association for Cancer Research.

  12. Diversity of reductive dehalogenase genes from environmental samples and enrichment cultures identified with degenerate primer PCR screens.

    Directory of Open Access Journals (Sweden)

    Laura Audrey Hug

    2013-11-01

    Full Text Available Reductive dehalogenases are the critical enzymes for anaerobic organohalide respiration, a microbial metabolic process that has been harnessed for bioremediation efforts to resolve chlorinated solvent contamination in groundwater and is implicated in the global halogen cycle. Reductive dehalogenase sequence diversity is informative for the dechlorination potential of the site or enrichment culture. A suite of degenerate PCR primers targeting a comprehensive curated set of reductive dehalogenase genes was designed and applied to twelve DNA samples extracted from contaminated and pristine sites, as well as six enrichment cultures capable of reducing chlorinated compounds to non-toxic end-products. The amplified gene products from four environmental sites and two enrichment cultures were sequenced using Illumina HiSeq, and the reductive dehalogenase complement of each sample determined. The results indicate that the diversity of the reductive dehalogenase gene family is much deeper than is currently accounted for: one-third of the translated proteins have less than 70% pairwise amino acid identity to database sequences. Approximately 60% of the sequenced reductive dehalogenase genes were broadly distributed, being identified in four or more samples, and often in previously sequenced genomes as well. In contrast, 17% of the sequenced reductive dehalogenases were unique, present in only a single sample and bearing less than 90% pairwise amino acid identity to any previously identified proteins. Many of the broadly distributed reductive dehalogenases are uncharacterized in terms of their substrate specificity, making these intriguing targets for further biochemical experimentation. Finally, comparison of samples from a contaminated site and an enrichment culture derived from the same site eight years prior allowed examination of the effect of the enrichment process.

  13. Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier

    Directory of Open Access Journals (Sweden)

    Nikjoo Mohammad S

    2011-11-01

    Full Text Available Abstract Background Swallowing accelerometry has been suggested as a potential non-invasive tool for bedside dysphagia screening. Various vibratory signal features and complementary measurement modalities have been put forth in the literature for the potential discrimination between safe and unsafe swallowing. To date, automatic classification of swallowing accelerometry has exclusively involved a single-axis of vibration although a second axis is known to contain additional information about the nature of the swallow. Furthermore, the only published attempt at automatic classification in adult patients has been based on a small sample of swallowing vibrations. Methods In this paper, a large corpus of dual-axis accelerometric signals were collected from 30 older adults (aged 65.47 ± 13.4 years, 15 male) referred to videofluoroscopic examination on the suspicion of dysphagia. We invoked a reputation-based classifier combination to automatically categorize the dual-axis accelerometric signals into safe and unsafe swallows, as labeled via videofluoroscopic review. From these participants, a total of 224 swallowing samples were obtained, 164 of which were labeled as unsafe swallows (swallows where the bolus entered the airway) and 60 as safe swallows. Three separate support vector machine (SVM) classifiers and eight different features were selected for classification. Results With selected time, frequency and information theoretic features, the reputation-based algorithm distinguished between safe and unsafe swallowing with promising accuracy (80.48 ± 5.0%), high sensitivity (97.1 ± 2%) and modest specificity (64 ± 8.8%). Interpretation of the most discriminatory features revealed that in general, unsafe swallows had lower mean vibration amplitude and faster autocorrelation decay, suggestive of decreased hyoid excursion and compromised coordination, respectively. Further, owing to its performance-based weighting of component classifiers, the static
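
    The reputation-based combiner weights each component classifier by its demonstrated performance; the following is a minimal sketch of that general idea (validation-accuracy-weighted voting of several SVMs trained on different feature groups) with placeholder data. The exact reputation update used by the authors is not reproduced, and the feature groups here are hypothetical stand-ins for the time, frequency, and information-theoretic feature sets.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Placeholder stand-in for time / frequency / information-theoretic feature groups.
X, y = make_classification(n_samples=224, n_features=8, n_informative=6, random_state=0)
feature_groups = [slice(0, 3), slice(3, 6), slice(6, 8)]

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

members, weights = [], []
for grp in feature_groups:
    clf = SVC(probability=True, random_state=0).fit(X_tr[:, grp], y_tr)
    members.append((grp, clf))
    weights.append(clf.score(X_val[:, grp], y_val))    # 'reputation' = validation accuracy

weights = np.array(weights) / np.sum(weights)

def predict(x_new):
    """Reputation-weighted probability of the 'unsafe swallow' class for one sample."""
    probs = [clf.predict_proba(x_new[grp].reshape(1, -1))[0, 1] for grp, clf in members]
    return int(np.dot(weights, probs) >= 0.5)

print(predict(X_val[0]), "true label:", y_val[0])
```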

  14. Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

    Science.gov (United States)

    Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

    2016-03-01

    Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

  15. Machine-learning identifies substance-specific behavioral markers for opiate and stimulant dependence.

    Science.gov (United States)

    Ahn, Woo-Young; Vassileva, Jasmin

    2016-04-01

    Recent animal and human studies reveal distinct cognitive and neurobiological differences between opiate and stimulant addictions; however, our understanding of the common and specific effects of these two classes of drugs remains limited due to the high rates of polysubstance-dependence among drug users. The goal of the current study was to identify multivariate substance-specific markers classifying heroin dependence (HD) and amphetamine dependence (AD), by using machine-learning approaches. Participants included 39 amphetamine mono-dependent, 44 heroin mono-dependent, 58 polysubstance dependent, and 81 non-substance dependent individuals. The majority of substance dependent participants were in protracted abstinence. We used demographic, personality (trait impulsivity, trait psychopathy, aggression, sensation seeking), psychiatric (attention deficit hyperactivity disorder, conduct disorder, antisocial personality disorder, psychopathy, anxiety, depression), and neurocognitive impulsivity measures (Delay Discounting, Go/No-Go, Stop Signal, Immediate Memory, Balloon Analogue Risk, Cambridge Gambling, and Iowa Gambling tasks) as predictors in a machine-learning algorithm. The machine-learning approach revealed substance-specific multivariate profiles that classified HD and AD in new samples with high degree of accuracy. Out of 54 predictors, psychopathy was the only classifier common to both types of addiction. Important dissociations emerged between factors classifying HD and AD, which often showed opposite patterns among individuals with HD and AD. These results suggest that different mechanisms may underlie HD and AD, challenging the unitary account of drug addiction. This line of work may shed light on the development of standardized and cost-efficient clinical diagnostic tests and facilitate the development of individualized prevention and intervention programs for HD and AD. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  16. A cardiorespiratory classifier of voluntary and involuntary electrodermal activity

    Directory of Open Access Journals (Sweden)

    Sejdic Ervin

    2010-02-01

    Full Text Available Abstract Background Electrodermal reactions (EDRs) can be attributed to many origins, including spontaneous fluctuations of electrodermal activity (EDA) and stimuli such as deep inspirations, voluntary mental activity and startling events. In fields that use EDA as a measure of psychophysiological state, the fact that EDRs may be elicited from many different stimuli is often ignored. This study attempts to classify observed EDRs as voluntary (i.e., generated from intentional respiratory or mental activity) or involuntary (i.e., generated from startling events or spontaneous electrodermal fluctuations). Methods Eight able-bodied participants were subjected to conditions that would cause a change in EDA: music imagery, startling noises, and deep inspirations. A user-centered cardiorespiratory classifier consisting of (1) an EDR detector, (2) a respiratory filter and (3) a cardiorespiratory filter was developed to automatically detect a participant's EDRs and to classify the origin of their stimulation as voluntary or involuntary. Results Detected EDRs were classified with a positive predictive value of 78%, a negative predictive value of 81% and an overall accuracy of 78%. Without the classifier, EDRs could only be correctly attributed as voluntary or involuntary with an accuracy of 50%. Conclusions The proposed classifier may enable investigators to form more accurate interpretations of electrodermal activity as a measure of an individual's psychophysiological state.

  17. Treatment response in psychotic patients classified according to social and clinical needs, drug side effects, and previous treatment; a method to identify functional remission.

    Science.gov (United States)

    Alenius, Malin; Hammarlund-Udenaes, Margareta; Hartvig, Per; Sundquist, Staffan; Lindström, Leif

    2009-01-01

    Various approaches have been made over the years to classify psychotic patients according to inadequate treatment response, using terms such as treatment resistant or treatment refractory. Existing classifications have been criticized for overestimating positive symptoms; underestimating residual symptoms, negative symptoms, and side effects; or being too open to individual interpretation. The aim of this study was to present and evaluate a new method of classification according to treatment response and, thus, to identify patients in functional remission. A naturalistic, cross-sectional study was performed using patient interviews and information from patient files. The new classification method CANSEPT, which combines the Camberwell Assessment of Need rating scale, the Udvalg for Kliniske Undersøgelser side effect rating scale (SE), and the patient's previous treatment history (PT), was used to group the patients according to treatment response. CANSEPT was evaluated by comparison of expected and observed results. In the patient population (n = 123), the patients in functional remission, as defined by CANSEPT, had higher quality of life, fewer hospitalizations, fewer psychotic symptoms, and a higher rate of employment than those with the worst treatment outcome. In the evaluation, CANSEPT showed validity in discriminating the patients of interest and was well tolerated by the patients. CANSEPT could help ensure inclusion of the appropriate patients in clinical practice or in research.

  18. Learning to classify wakes from local sensory information

    Science.gov (United States)

    Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team

    2017-11-01

    Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.

  19. Bayesian Classifier for Medical Data from Doppler Unit

    Directory of Open Access Journals (Sweden)

    J. Málek

    2006-01-01

    Full Text Available Nowadays, hand-held ultrasonic Doppler units (probes) are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow in time and blood pressures are measured at several positions on each lower limb. By listening to the acoustic signal generated by the device or by reading the signal displayed on screen, a specialist can detect peripheral arterial disease (PAD). This project aims to design software that will be able to analyze data from such a device and classify it into several diagnostic classes. At the Department of Functional Diagnostics at the Regional Hospital in Liberec a database of several hundred signals was collected. In cooperation with the specialist, the signals were manually classified into four classes. For each class, selected signal features were extracted and then used for training a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifier. A recognition rate of slightly above 84% of diagnostic states was recently achieved on the test data.
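
    A minimal sketch of training and testing a Bayesian classifier on extracted signal features (scikit-learn assumed; the features, labels and four-class setup below are synthetic placeholders, not the hospital data):

        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(1)
        n_signals, n_features, n_classes = 400, 6, 4      # four diagnostic classes
        X = rng.normal(size=(n_signals, n_features))      # placeholder extracted features
        y = rng.integers(0, n_classes, size=n_signals)    # placeholder expert labels

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=1, stratify=y)
        clf = GaussianNB().fit(X_tr, y_tr)                # Gaussian naive Bayes classifier
        print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))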

  20. An ensemble self-training protein interaction article classifier.

    Science.gov (United States)

    Chen, Yifei; Hou, Ping; Manderick, Bernard

    2014-01-01

    Protein-protein interaction (PPI) is essential to understand the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence it is desirable to automatically classify articles as PPI-related or not. In order to build interaction article classification systems, an annotated corpus is needed. However, it is usually the case that only a small number of labeled articles can be obtained manually. Meanwhile, a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A biological-background-based feature weighting strategy is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from the unlabeled data to further improve performance. Experimental results show that EST_IACer can classify PPI-related articles effectively and efficiently.
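
    A single-classifier sketch of the self-training idea (scikit-learn assumed; EST_IACer itself is an ensemble with its own feature weighting and instance-selection heuristic, which are omitted here):

        # Minimal self-training loop: train on the labelled articles, pseudo-label
        # the most confident unlabelled ones, and retrain until the pool is empty
        # or no prediction is confident enough.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def self_train(X_lab, y_lab, X_unlab, rounds=5, conf_thresh=0.95):
            X_lab, y_lab = X_lab.copy(), y_lab.copy()
            pool = X_unlab.copy()
            for _ in range(rounds):
                if len(pool) == 0:
                    break
                clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
                proba = clf.predict_proba(pool)
                confident = proba.max(axis=1) >= conf_thresh
                if not confident.any():
                    break
                pseudo = clf.classes_[proba[confident].argmax(axis=1)]
                X_lab = np.vstack([X_lab, pool[confident]])
                y_lab = np.concatenate([y_lab, pseudo])
                pool = pool[~confident]
            return LogisticRegression(max_iter=1000).fit(X_lab, y_lab)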

  1. Classifying Linear Canonical Relations

    OpenAIRE

    Lorand, Jonathan

    2015-01-01

    In this Master's thesis, we consider the problem of classifying, up to conjugation by linear symplectomorphisms, linear canonical relations (lagrangian correspondences) from a finite-dimensional symplectic vector space to itself. We give an elementary introduction to the theory of linear canonical relations and present partial results toward the classification problem. This exposition should be accessible to undergraduate students with a basic familiarity with linear algebra.

  2. Classified facilities for environmental protection

    International Nuclear Information System (INIS)

    Anon.

    1993-02-01

    The legislation of the classified facilities governs most of the dangerous or polluting industries or fixed activities. It rests on the law of 9 July 1976 concerning facilities classified for environmental protection and its application decree of 21 September 1977. This legislation, the general texts of which appear in this volume 1, aims to prevent all the risks and the harmful effects coming from an installation (air, water or soil pollutions, wastes, even aesthetic breaches). The polluting or dangerous activities are defined in a list called nomenclature which subjects the facilities to a declaration or an authorization procedure. The authorization is delivered by the prefect at the end of an open and contradictory procedure after a public survey. In addition, the facilities can be subjected to technical regulations fixed by the Environment Minister (volume 2) or by the prefect for facilities subjected to declaration (volume 3). (A.B.)

  3. Defending Malicious Script Attacks Using Machine Learning Classifiers

    Directory of Open Access Journals (Sweden)

    Nayeem Khan

    2017-01-01

    Full Text Available Web applications have become a primary target for cyber criminals, who inject malware, especially JavaScript, to perform malicious activities such as impersonation. Thus, it is imperative to detect such malicious code in real time, before any malicious activity is performed. This study proposes an efficient method of detecting previously unknown malicious JavaScript using an interceptor at the client side by classifying the key features of the malicious code. A feature subset was obtained by using a wrapper method for dimensionality reduction. Supervised machine learning classifiers were used on the dataset to achieve high accuracy. Experimental results show that our method can efficiently classify malicious code from benign code with promising results.
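
    A hedged sketch of wrapper-based feature selection followed by a supervised classifier (scikit-learn assumed; the script-derived feature matrix and labels are placeholders):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)
        X = rng.random((300, 20))            # placeholder features extracted from scripts
        y = rng.integers(0, 2, size=300)     # 1 = malicious, 0 = benign (placeholder labels)

        # Wrapper-style selection: greedily add features that improve the
        # classifier's cross-validated score.
        base = RandomForestClassifier(n_estimators=100, random_state=0)
        selector = SequentialFeatureSelector(base, n_features_to_select=8, cv=3)
        X_sel = selector.fit_transform(X, y)

        print("selected feature indices:", np.flatnonzero(selector.get_support()))
        print("cross-validated accuracy:", cross_val_score(base, X_sel, y, cv=5).mean())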

  4. Is it important to classify ischaemic stroke?

    LENUS (Irish Health Repository)

    Iqbal, M

    2012-02-01

    Thirty-five percent of all ischaemic events remain classified as cryptogenic. This study was conducted to ascertain the accuracy of diagnosis of ischaemic stroke based on information given in the medical notes. It was tested by applying the clinical information to the TOAST criteria. One hundred and five patients presented with acute stroke between Jan-Jun 2007. Data were collected on 90 patients. The male to female ratio was 39:51, with an age range of 47-93 years. Sixty (67%) patients had total/partial anterior circulation stroke; 5 (5.6%) had a lacunar stroke and in 25 (28%) the mechanism of stroke could not be identified. Four (4.4%) patients with small vessel disease were anticoagulated; 5 (5.6%) with atrial fibrillation received antiplatelet therapy and 2 (2.2%) patients with atrial fibrillation underwent CEA. This study revealed deficiencies in the clinical assessment of patients, and treatment was not tailored to the mechanism of stroke in some patients.

  5. Identifying potential surface water sampling sites for emerging chemical pollutants in Gauteng Province, South Africa

    OpenAIRE

    Petersen, F; Dabrowski, JM; Forbes, PBC

    2017-01-01

    Emerging chemical pollutants (ECPs) are defined as new chemicals which do not have a regulatory status, but which may have an adverse effect on human health and the environment. The occurrence and concentrations of ECPs in South African water bodies are largely unknown, so monitoring is required in order to determine the potential threat that these ECPs may pose. Relevant surface water sampling sites in the Gauteng Province of South Africa were identified utilising a geographic information sy...

  6. Fisher classifier and its probability of error estimation

    Science.gov (United States)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
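
    A brief sketch of estimating a Fisher-type classifier's error with the leave-one-out method (scikit-learn's linear discriminant analysis used here as a stand-in for the Fisher classifier; data are synthetic):

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import LeaveOneOut, cross_val_score

        rng = np.random.default_rng(3)
        # Two Gaussian classes with shifted means, 4 features each.
        X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(1.0, 1.0, (50, 4))])
        y = np.array([0] * 50 + [1] * 50)

        acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
        print("leave-one-out error estimate:", 1 - acc.mean())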

  7. Spatio-Spectral Method for Estimating Classified Regions with High Confidence using MODIS Data

    International Nuclear Information System (INIS)

    Katiyal, Anuj; Rajan, Dr K S

    2014-01-01

    In studies like change analysis, the availability of very high resolution (VHR)/high resolution (HR) imagery for a particular period and region is a challenge due to sensor revisit times and the high cost of acquisition. Therefore, most studies prefer lower resolution (LR) sensor imagery with frequent revisit times, in addition to their cost and computational advantages. Further, classification techniques provide us a global estimate of the class accuracy, which limits their utility if the accuracy is low. In this work, we focus on the sub-classification problem of LR images and estimate regions of higher confidence than the global classification accuracy within each classified region. The spectrally classified data were mined into spatially clustered regions and further refined and processed using statistical measures to arrive at local high confidence regions (LHCRs) for every class. Rabi season MODIS data of January 2006 and 2007 were used for this study, and the evaluation of the LHCRs was done using the APLULC 2005 classified data. For January 2007, the global class accuracies for water bodies (WB), forested regions (FR) and Kharif crops and barren lands (KB) were 89%, 71.7% and 71.23% respectively, while the respective LHCRs had accuracies of 96.67%, 89.4% and 80.9%, covering 46%, 29% and 14.5% of the initially classified areas. Though the areas are reduced, LHCRs with higher accuracies help in extracting more representative class regions. Identification of such regions can help improve classification time and processing for HR images when combined with the more frequently acquired LR imagery, isolate pure vs. mixed/impure pixels, and provide training sample locations for HR imagery.

  8. Use of information barriers to protect classified information

    International Nuclear Information System (INIS)

    MacArthur, D.; Johnson, M.W.; Nicholas, N.J.; Whiteson, R.

    1998-01-01

    This paper discusses the detailed requirements for an information barrier (IB) for use with verification systems that employ intrusive measurement technologies. The IB would protect classified information in a bilateral or multilateral inspection of classified fissile material. Such a barrier must strike a balance between providing the inspecting party the confidence necessary to accept the measurement while protecting the inspected party's classified information. The authors discuss the structure required of an IB as well as the implications of the IB on detector system maintenance. A defense-in-depth approach is proposed which would provide assurance to the inspected party that all sensitive information is protected and to the inspecting party that the measurements are being performed as expected. The barrier could include elements of physical protection (such as locks, surveillance systems, and tamper indicators), hardening of key hardware components, assurance of capabilities and limitations of hardware and software systems, administrative controls, validation and verification of the systems, and error detection and resolution. Finally, an unclassified interface could be used to display and, possibly, record measurement results. The introduction of an IB into an analysis system may result in many otherwise innocuous components (detectors, analyzers, etc.) becoming classified and unavailable for routine maintenance by uncleared personnel. System maintenance and updating will be significantly simplified if the classification status of as many components as possible can be made reversible (i.e. the component can become unclassified following the removal of classified objects)

  9. Consensus of sample-balanced classifiers for identifying ligand-binding residue by co-evolutionary physicochemical characteristics of amino acids

    KAUST Repository

    Chen, Peng

    2013-01-01

    Protein-ligand binding is an important mechanism for some proteins to perform their functions, and those binding sites are the residues of proteins that physically bind to ligands. So far, the state-of-the-art methods search for similar, known

  10. Classifying sows' activity types from acceleration patterns

    DEFF Research Database (Denmark)

    Cornou, Cecile; Lundbye-Christensen, Søren

    2008-01-01

    An automated method of classifying sow activity using acceleration measurements would allow the individual sow's behavior to be monitored throughout the reproductive cycle; applications for detecting behaviors characteristic of estrus and farrowing or to monitor illness and welfare can be foreseen. This article suggests a method of classifying five types of activity exhibited by group-housed sows. The method involves the measurement of acceleration in three dimensions. The five activities are: feeding, walking, rooting, lying laterally and lying sternally. Four time series of acceleration (the three

  11. Naive Bayesian classifiers for multinomial features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2007-11-01

    Full Text Available The authors investigate the use of naive Bayesian classifiers for multinomial feature spaces and derive error estimates for these classifiers. The error analysis is done by developing a mathematical model to estimate the probability density...

  12. Detection of herbaceous-plant pararetrovirus in lichen herbarium samples.

    Science.gov (United States)

    Petrzik, K; Koloniuk, I; Sarkisová, T; Číhal, L

    2016-06-01

    Cauliflower mosaic virus (CaMV) - a plant pararetrovirus that naturally causes diseases in Brassicaceae and Solanaceae plant hosts worldwide - has been detected by PCR for the first time in herbarium samples of Usnea sp. lichens. The virus's presence in these lichens did not result in any micro- or macromorphological changes, and the herbarium records were classified as representative for the distinct species. Sequence analyses classified all the detected viruses into one lineage of CaMV isolates. We have shown here that herbarium samples could be a good source for virus study, especially where a longer time span is involved.

  13. Parents' Experiences and Perceptions when Classifying their Children with Cerebral Palsy: Recommendations for Service Providers.

    Science.gov (United States)

    Scime, Natalie V; Bartlett, Doreen J; Brunton, Laura K; Palisano, Robert J

    2017-08-01

    This study investigated the experiences and perceptions of parents of children with cerebral palsy (CP) when classifying their children using the Gross Motor Function Classification System (GMFCS), the Manual Ability Classification System (MACS), and the Communication Function Classification System (CFCS). The second aim was to collate parents' recommendations for service providers on how to interact and communicate with families. A purposive sample of seven parents participating in the On Track study was recruited. Semi-structured interviews were conducted orally and were audiotaped, transcribed, and coded openly. A descriptive interpretive approach within a pragmatic perspective was used during analysis. Seven themes encompassing parents' experiences and perspectives reflect a process of increased understanding when classifying their children, with perceptions of utility evident throughout this process. Six recommendations for service providers emerged, including making the child a priority and being a dependable resource. Knowledge of parents' experiences when using the GMFCS, MACS, and CFCS can provide useful insight for service providers collaborating with parents to classify function in children with CP. Using the recommendations from these parents can facilitate family-provider collaboration for goal setting and intervention planning.

  14. Robust online tracking via adaptive samples selection with saliency detection

    Science.gov (United States)

    Yan, Jia; Chen, Xi; Zhu, QiuPing

    2013-12-01

    Online tracking has been shown to be successful in tracking previously unknown objects. However, two important factors lead to the drift problem in online tracking: one is how to select correctly labeled samples even when the target locations are inaccurate, and the other is how to handle confusors that have features similar to the target. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To avoid degrading the classifiers with misaligned samples, we introduce a saliency detection method into our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple yet effective saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as the negative samples, we propose a reasonable selection criterion, in which both the saliency confidence and similarity are considered, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before drift occurs. The tracking task is formulated as binary classification within an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.

  15. Textual and shape-based feature extraction and neuro-fuzzy classifier for nuclear track recognition

    Science.gov (United States)

    Khayat, Omid; Afarideh, Hossein

    2013-04-01

    Track counting algorithms, as one of the fundamental tools of nuclear science, have received increasing attention in recent years. Accurate measurement of nuclear tracks on solid-state nuclear track detectors is the aim of track counting systems. Commonly, track counting systems comprise a hardware system for the task of imaging and software for analysing the track images. In this paper, a track recognition algorithm based on 12 defined textual and shape-based features and a neuro-fuzzy classifier is proposed. Features are defined so as to discern the tracks from the background and small objects. Then, according to the defined features, tracks are detected using a trained neuro-fuzzy system. The features and the classifier are finally validated via 100 Alpha track images and 40 training samples. It is shown that the principal textual and shape-based features concomitantly yield a high rate of track detection compared with single-feature-based methods.

  16. Neural network classifier of attacks in IP telephony

    Science.gov (United States)

    Safarik, Jakub; Voznak, Miroslav; Mehic, Miralem; Partila, Pavol; Mikulec, Martin

    2014-05-01

    Various types of monitoring mechanisms allow us to detect and monitor the behavior of attackers in VoIP networks. Analysis of detected malicious traffic is crucial for further investigation and for hardening the network. This analysis is typically based on statistical methods, and the article brings a solution based on a neural network. The proposed algorithm is used as a classifier of attacks in a distributed monitoring network of independent honeypot probes. Information about attacks on these honeypots is collected on a centralized server and then classified. This classification is based on different mechanisms, one of which is a multilayer perceptron neural network. The article describes the inner structure of the neural network used and also information about the implementation of this network. The learning set for this neural network is based on real attack data collected from an IP telephony honeypot called Dionaea. We prepare the learning set from real attack data after collecting, cleaning and aggregating this information. After proper training, the neural network is capable of classifying 6 types of the most commonly used VoIP attacks. Using a neural network classifier brings more accurate attack classification in a distributed system of honeypots. With this approach it is possible to detect malicious behavior in different parts of networks, which are logically or geographically divided, and to use the information from one network to harden security in other networks. The centralized server for the distributed set of nodes serves not only as a collector and classifier of attack data, but also as a mechanism for generating precautionary steps against attacks.
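
    A minimal sketch of a multilayer-perceptron attack classifier (scikit-learn assumed; the honeypot-derived feature vectors and the six attack labels are placeholders):

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

        rng = np.random.default_rng(5)
        X = rng.random((600, 12))              # placeholder features from collected attack records
        y = rng.integers(0, 6, size=600)       # six common VoIP attack types (placeholder labels)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=5)
        mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=5)
        mlp.fit(X_tr, y_tr)                    # train the multilayer perceptron
        print(classification_report(y_te, mlp.predict(X_te)))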

  17. Efficiency of different techniques to identify changes in land use

    Science.gov (United States)

    Zornoza, Raúl; Mateix-Solera, Jorge; Gerrero, César

    2013-04-01

    The need for the development of sensitive and efficient methodologies for soil quality evaluation is increasing. The ability to assess soil quality and identify key soil properties that serve as indicators of soil function is complicated by the multiplicity of physical, chemical and biological factors that control soil processes. In the mountain region of the Mediterranean Basin of Spain, almond trees have been cultivated in terraced orchards for centuries. These crops are immersed in the Mediterranean forest scenery, configuring a mosaic landscape where orchards are integrated into the forest masses. In recent decades, almond orchards have been abandoned, leading to an increase in vegetation cover, since abandoned fields are naturally colonized by the surrounding natural vegetation. Soil processes and properties are expected to be associated with vegetation successional dynamics. Thus, the establishment of suitable parameters to monitor soil quality related to land use changes is particularly important to guarantee the regeneration of the mature community. In this study, we selected three land uses: forest, almond tree orchards, and orchards abandoned between 10 and 15 years prior to sampling. Sampling was carried out at four different locations in SE Spain. The main purpose was to evaluate whether changes in management have significantly influenced different sets of soil characteristics. For this purpose, we used discriminant analysis (DA). The different sets of soil characteristics tested in this study were 1: physical, chemical and biochemical properties; 2: soil near infrared (NIR) spectra; and 3: phospholipid fatty acids (PLFAs). After the DA performed with sets 1 and 2, the three land uses were clearly separated by the first two discriminant functions, and more than 85% of the samples were correctly classified (grouped). Using sets 3 and 4 for DA resulted in a slightly better separation of land uses, with more than 85% of the
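
    A hedged sketch of the discriminant-analysis step (scikit-learn assumed; the soil-property matrix and land-use labels below are placeholders, not the field data): classify land use from one set of soil characteristics and report how many samples are correctly grouped.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_predict
        from sklearn.metrics import confusion_matrix, accuracy_score

        rng = np.random.default_rng(6)
        X = rng.normal(size=(120, 15))     # e.g. physical/chemical/biochemical property set
        y = np.repeat([0, 1, 2], 40)       # 0 = forest, 1 = orchard, 2 = abandoned orchard

        # Cross-validated class predictions from the discriminant functions.
        pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=5)
        print(confusion_matrix(y, pred))
        print("proportion correctly classified:", accuracy_score(y, pred))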

  18. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    Science.gov (United States)

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637

  19. SVM classifier on chip for melanoma detection.

    Science.gov (United States)

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

    Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost, handheld medical device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented on a recent FPGA platform using the latest design methodology, to be embedded into the proposed device for realizing efficient online melanoma detection on a single system-on-chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 W power consumption. Consequently, the implemented system meets the crucial embedded-system constraints of high performance and low cost, resource utilization and power consumption, while achieving high classification accuracy.

  20. Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.

    Science.gov (United States)

    Sriraam, N; Raghu, S

    2017-09-02

    Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients with pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non-focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from the Bern-Barcelona database. From the dataset, five different classification tasks were formed. In total, 26 features were extracted from focal and non-focal EEG. Significant features were selected using the Wilcoxon rank sum test at the 95% significance level (p < 0.05, |z| > 1.96). The hypothesis was made that removing outliers improves the classification accuracy. Tukey's range test was adopted for pruning outliers from the feature set. Finally, 21 features were classified using an optimized support vector machine (SVM) classifier with 10-fold cross validation. A Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15%, respectively, were achieved and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensure the suitability of the proposed multi-features with the optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
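
    A simplified sketch of the described pipeline, rank-sum feature screening followed by an SVM with 10-fold cross-validation (SciPy and scikit-learn assumed; the data are synthetic, and the outlier-pruning and Bayesian-optimization steps are omitted):

        import numpy as np
        from scipy.stats import ranksums
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(7)
        X = rng.normal(size=(200, 26))          # 26 candidate EEG features per epoch
        y = np.array([0] * 100 + [1] * 100)     # focal vs non-focal epochs (placeholder)

        # Keep only features whose rank-sum test is significant at p < 0.05.
        keep = [j for j in range(X.shape[1])
                if ranksums(X[y == 0, j], X[y == 1, j]).pvalue < 0.05]
        X_sel = X[:, keep] if keep else X

        acc = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"), X_sel, y, cv=10)
        print(f"{len(keep)} features kept, 10-fold CV accuracy {acc.mean():.3f}")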

  1. Ensemble of gene signatures identifies novel biomarkers in colorectal cancer activated through PPARγ and TNFα signaling.

    Directory of Open Access Journals (Sweden)

    Stefano Maria Pagnotta

    Full Text Available We describe a novel bioinformatic and translational pathology approach, the gene Signature Finder Algorithm (gSFA), to identify biomarkers associated with colorectal cancer (CRC) survival. Here a robust set of CRC markers is selected by an ensemble method. By using a dataset of 232 gene expression profiles, gSFA discovers 16 highly significant small gene signatures. Analysis of the dichotomies generated by the signatures results in a set of 133 samples stably classified in the good prognosis group and 56 samples in the poor prognosis group, whereas 43 remain unreliably classified. AKAP12, DCBLD2, NT5E and SPON1 are particularly represented in the signatures and were selected for validation in vivo on two independent patient cohorts comprising 140 tumor tissues and 60 matched normal tissues. Their expression and regulatory programs were investigated in vitro. We show that the coupled expression of NT5E and DCBLD2 robustly stratifies our patients into two groups (one of which with 100% survival at five years). We show that NT5E is a target of TNF-α signaling in vitro; the tumor suppressor PPARγ acts as a novel NT5E antagonist that positively and concomitantly regulates DCBLD2 in a cancer cell context-dependent manner.

  2. Performance of classification confidence measures in dynamic classifier systems

    Czech Academy of Sciences Publication Activity Database

    Štefka, D.; Holeňa, Martin

    2013-01-01

    Roč. 23, č. 4 (2013), s. 299-319 ISSN 1210-0552 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords : classifier combining * dynamic classifier systems * classification confidence Subject RIV: IN - Informatics, Computer Science Impact factor: 0.412, year: 2013

  3. Design of a high-sensitivity classifier based on a genetic algorithm: application to computer-aided diagnosis

    International Nuclear Information System (INIS)

    Sahiner, Berkman; Chan, Heang-Ping; Petrick, Nicholas; Helvie, Mark A.; Goodsitt, Mitchell M.

    1998-01-01

    identified 61% and 34% of the benign masses respectively without missing any malignant masses. Our results show that the choice of the feature selection technique is important in computer-aided diagnosis, and that the GA may be a useful tool for designing classifiers for lesion characterization. (author)

  4. CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

    Science.gov (United States)

    Kong, Hualei; Tong, Pan; Zhao, Xiaodong; Sun, Jielin; Li, Hua

    2018-03-01

    In the past decade, molecular classification of cancer has gained high popularity owing to its high predictive power on clinical outcomes as compared with traditional methods commonly used in clinical practice. In particular, using gene expression profiles, recent studies have successfully identified a number of gene sets for the delineation of cancer subtypes that are associated with distinct prognosis. However, identification of such gene sets remains a laborious task due to the lack of tools with flexibility, integration and ease of use. To reduce the burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype further includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (from user-provided or package-integrated ones) with robust principal components representing significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space for better understanding of the distinction between samples and classifies samples into subgroups with prevalent clustering algorithms. Finally, CAsubtype performs survival analysis to compare the clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment to identify gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generating and testing in biological and

  5. SAR Target Recognition Based on Multi-feature Multiple Representation Classifier Fusion

    Directory of Open Access Journals (Sweden)

    Zhang Xinzheng

    2017-10-01

    Full Text Available In this paper, we present a Synthetic Aperture Radar (SAR) image target recognition algorithm based on multi-feature multiple representation learning classifier fusion. First, it extracts three features from the SAR images, namely principal component analysis, wavelet transform, and Two-Dimensional Slice Zernike Moments (2DSZM) features. Second, we harness the sparse representation classifier and the cooperative representation classifier with the above-mentioned features to get six predictive labels. Finally, we adopt classifier fusion to obtain the final recognition decision. We researched three different classifier fusion algorithms in our experiments, and the results demonstrate that using Bayesian decision fusion gives the best recognition performance. The method based on multi-feature multiple representation learning classifier fusion integrates the discrimination of multi-features and combines the sparse and cooperative representation classification performance to gain complementary advantages and to improve recognition accuracy. The experiments are based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) database, and they demonstrate the effectiveness of the proposed approach.
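
    A small, hedged sketch of decision-level fusion (NumPy only; the product-rule fusion below is a generic stand-in for the Bayesian decision fusion the authors evaluate, and the posteriors are toy data):

        import numpy as np

        def bayesian_style_fusion(posteriors):
            """posteriors: list of (n_samples, n_classes) arrays, one per classifier."""
            fused = np.ones_like(posteriors[0])
            for p in posteriors:
                fused *= p + 1e-12              # product rule; small constant avoids zeros
            fused /= fused.sum(axis=1, keepdims=True)
            return fused.argmax(axis=1)

        def majority_vote(label_sets):
            """label_sets: list of (n_samples,) integer label arrays, one per classifier."""
            stacked = np.stack(label_sets, axis=1)
            return np.array([np.bincount(row).argmax() for row in stacked])

        # Toy usage: three classifiers, 4 samples, 3 target classes.
        rng = np.random.default_rng(10)
        posts = [rng.dirichlet(np.ones(3), size=4) for _ in range(3)]
        print("fused labels:   ", bayesian_style_fusion(posts))
        print("majority vote:  ", majority_vote([p.argmax(axis=1) for p in posts]))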

  6. Addressing the Challenge of Defining Valid Proteomic Biomarkers and Classifiers

    LENUS (Irish Health Repository)

    Dakna, Mohammed

    2010-12-10

    Abstract Background The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example, we define the measurable proteomic differences between apparently healthy adult males and females. We chose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. Results We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test set is essential. Conclusions Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
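
    A small sketch of the screening step: a rank-based test per peptide followed by multiple-testing adjustment (SciPy and statsmodels assumed; Benjamini-Hochberg is used here as one common adjustment choice, and the intensity matrices are synthetic placeholders):

        import numpy as np
        from scipy.stats import mannwhitneyu          # two-sample Wilcoxon rank-sum test
        from statsmodels.stats.multitest import multipletests

        rng = np.random.default_rng(8)
        males = rng.lognormal(size=(40, 1000))        # 40 male samples x 1000 peptides
        females = rng.lognormal(size=(40, 1000))      # 40 female samples x 1000 peptides

        # One p-value per peptide, then adjust for multiple testing.
        pvals = np.array([mannwhitneyu(males[:, j], females[:, j]).pvalue
                          for j in range(males.shape[1])])
        reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
        print("candidate biomarkers after adjustment:", int(reject.sum()))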

  7. Detection of Driver Drowsiness Using Wavelet Analysis of Heart Rate Variability and a Support Vector Machine Classifier

    Directory of Open Access Journals (Sweden)

    Gang Li

    2013-12-01

    Full Text Available Driving while fatigued is just as dangerous as drunk driving and may result in car accidents. Heart rate variability (HRV) analysis has been studied recently for the detection of driver drowsiness. However, the detection reliability has been lower than anticipated, because the HRV signals of drivers were always regarded as stationary signals. The wavelet transform method is a method for analyzing non-stationary signals. The aim of this study is to classify alert and drowsy driving events using the wavelet transform of HRV signals over short time periods and to compare the classification performance of this method with the conventional method that uses fast Fourier transform (FFT)-based features. Based on the standard shortest duration for FFT-based short-term HRV evaluation, the wavelet decomposition is performed on 2-min HRV samples, as well as 1-min and 3-min samples for reference purposes. A receiver operating characteristic (ROC) analysis and a support vector machine (SVM) classifier are used for feature selection and classification, respectively. The ROC analysis results show that the wavelet-based method performs better than the FFT-based method regardless of the duration of the HRV sample that is used. Finally, based on the real-time requirements for driver drowsiness detection, the SVM classifier is trained using eighty FFT and wavelet-based features that are extracted from 1-min HRV signals from four subjects. The averaged leave-one-out (LOO) classification performance using wavelet-based features is 95% accuracy, 95% sensitivity, and 95% specificity. This is better than the FFT-based results of 68.8% accuracy, 62.5% sensitivity, and 75% specificity. In addition, the proposed hardware platform is inexpensive and easy to use.
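
    A compact sketch of the wavelet-feature idea, sub-band energies of short HRV segments fed to an SVM (PyWavelets and scikit-learn assumed; the signals and labels are synthetic placeholders, not driver data):

        import numpy as np
        import pywt
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        def wavelet_energies(rr_segment, wavelet="db4", level=4):
            # Energy of each sub-band of the discrete wavelet decomposition.
            coeffs = pywt.wavedec(rr_segment, wavelet, level=level)
            return np.array([np.sum(c ** 2) for c in coeffs])

        rng = np.random.default_rng(9)
        segments = rng.normal(0.8, 0.05, size=(80, 120))   # 80 short RR-interval segments
        labels = rng.integers(0, 2, size=80)               # 1 = drowsy, 0 = alert (placeholder)

        X = np.vstack([wavelet_energies(s) for s in segments])
        print("CV accuracy:", cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())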

  8. Case based reasoning applied to medical diagnosis using multi-class classifier: A preliminary study

    Directory of Open Access Journals (Sweden)

    D. Viveros-Melo

    2017-02-01

    Full Text Available Case-based reasoning (CBR) is a computational process that tries to mimic the behavior of a human expert in making decisions regarding a subject and to learn from the experience of past cases. CBR has been demonstrated to be appropriate for working with unstructured-domain data or difficult knowledge-acquisition situations, such as medical diagnosis, where it has been applied to tasks such as cancer diagnosis, epilepsy prediction and appendicitis diagnosis. Some of the trends that may be developed for CBR in the health sciences are oriented to reducing the number of features in highly dimensional data. An important contribution may be the estimation of probabilities of belonging to each class for new cases. In this paper, in order to adequately represent the database and to avoid the inconveniences caused by high dimensionality, noise and redundancy, a number of algorithms are used in the preprocessing stage for performing both variable selection and dimension reduction procedures. Also, a comparison of the performance of some representative multi-class classifiers is carried out to identify the most effective one to include within a CBR scheme. In particular, four classification techniques and two reduction techniques are employed to make a comparative study of multi-class classifiers on CBR.

  9. Classifying spaces with virtually cyclic stabilizers for linear groups

    DEFF Research Database (Denmark)

    Degrijse, Dieter Dries; Köhl, Ralf; Petrosyan, Nansen

    2015-01-01

    We show that every discrete subgroup of GL(n, ℝ) admits a finite-dimensional classifying space with virtually cyclic stabilizers. Applying our methods to SL(3, ℤ), we obtain a four-dimensional classifying space with virtually cyclic stabilizers and a decomposition of the algebraic K-theory of its...

  10. Intuitive Action Set Formation in Learning Classifier Systems with Memory Registers

    NARCIS (Netherlands)

    Simões, L.F.; Schut, M.C.; Haasdijk, E.W.

    2008-01-01

    An important design goal in Learning Classifier Systems (LCS) is to equally reinforce those classifiers which cause the level of reward supplied by the environment. In this paper, we propose a new method for action set formation in LCS. When applied to a Zeroth Level Classifier System with Memory

  11. Identify and Classify Critical Success Factor of Agile Software Development Methodology Using Mind Map

    OpenAIRE

    Tasneem Abd El Hameed; Mahmoud Abd EL Latif; Sherif Kholief

    2016-01-01

    Selecting the right method, right personnel and right practices, and applying them adequately, determine the success of software development. In this paper, a qualitative study is carried out among the critical factors of success from previous studies. The factors of success are matched with their related principles to illustrate the most valuable factor for agile approach success; this paper also proves that the twelve principles are poorly identified for a few factors resulting from qualitative and qua...

  12. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever-increasing generation of data confronts us with the problem of handling massive amounts of online information. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent in data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.

  13. Design of Robust Neural Network Classifiers

    DEFF Research Database (Denmark)

    Larsen, Jan; Andersen, Lars Nonboe; Hintz-Madsen, Mads

    1998-01-01

    This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of network weights as well as the outlier probability and regularization parameters. We suggest adapting the outlier probability and regularization parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential...

  14. Using Extreme Phenotype Sampling to Identify the Rare Causal Variants of Quantitative Traits in Association Studies

    OpenAIRE

    Li, Dalin; Lewinger, Juan Pablo; Gauderman, William J.; Murcray, Cassandra Elizabeth; Conti, David

    2011-01-01

    Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the hereditability of complex traits. Rare variants may, in part, explain some of the missing hereditability. Here, we explored the advantage of the extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach fo...

  15. Combining multiple classifiers for age classification

    CSIR Research Space (South Africa)

    Van Heerden, C

    2009-11-01

    Full Text Available The authors compare several different classifier combination methods on a single task, namely speaker age classification. This task is well suited to combination strategies, since significantly different feature classes are employed. Support vector...

  16. Maximum margin classifier working in a set of strings.

    Science.gov (United States)

    Koyano, Hitoshi; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-03-01

    Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.

  17. A proteomic study to identify soya allergens--the human response to transgenic versus non-transgenic soya samples.

    Science.gov (United States)

    Batista, Rita; Martins, Isabel; Jeno, Paul; Ricardo, Cândido Pinto; Oliveira, Maria Margarida

    2007-01-01

    In spite of being among the main foods responsible for allergic reactions worldwide, soybean (Glycine max)-derived products continue to be increasingly widespread in a variety of food products due to their well-documented health benefits. Soybean also continues to be one of the elected target crops for genetic modification. The aim of this study was to characterize the soya proteome and, specifically, IgE-reactive proteins as well as to compare the IgE response in soya-allergic individuals to genetically modified Roundup Ready soya versus its non-transgenic control. We performed two-dimensional gel electrophoresis of protein extracts from a 5% genetically modified Roundup Ready flour sample and its non-transgenic control followed by Western blotting with plasma from 5 soya-sensitive individuals. We used peptide tandem mass spectrometry to identify soya proteins (55 protein matches), specifically IgE-binding ones, and to evaluate differences between transgenic and non-transgenic samples. We identified 2 new potential soybean allergens--one is maturation associated and seems to be part of the late embryogenesis abundant proteins group and the other is a cysteine proteinase inhibitor. None of the individuals tested reacted differentially to the transgenic versus non-transgenic samples under study. Soybean endogenous allergen expression does not seem to be altered after genetic modification. Proteomics should be considered a powerful tool for functional characterization of plants and for food safety assessment. Copyright (c) 2007 S. Karger AG, Basel.

  18. A Survey of Blue-Noise Sampling and Its Applications

    KAUST Repository

    Yan, Dongming; Guo, Jian-Wei; Wang, Bin; Zhang, Xiao-Peng; Wonka, Peter

    2015-01-01

    In this paper, we survey recent approaches to blue-noise sampling and discuss their beneficial applications. We discuss the sampling algorithms that use points as sampling primitives and classify the sampling algorithms based on various aspects, e.g., the sampling domain and the type of algorithm. We demonstrate several well-known applications that can be improved by recent blue-noise sampling techniques, as well as some new applications such as dynamic sampling and blue-noise remeshing.

  20. Molecular analysis of faecal samples from birds to identify potential crop pests and useful biocontrol agents in natural areas.

    Science.gov (United States)

    King, R A; Symondson, W O C; Thomas, R J

    2015-06-01

    Wild habitats adjoining farmland are potentially valuable sources of natural enemies, but also of pests. Here we tested the utility of birds as 'sampling devices' to identify the diversity of prey available to predators and, particularly, to screen for pests and natural enemies using natural ecosystems as refugia. We used PCR to amplify prey DNA from three sympatric songbirds foraging on small invertebrates in Phragmites reedbed ecosystems, namely the Reed Warbler (Acrocephalus scirpaceus), Sedge Warbler (Acrocephalus schoenobaenus) and Cetti's Warbler (Cettia cetti). A recently described general invertebrate primer pair was used for the first time to analyse diets. Amplicons were cloned and sequenced, then identified by reference to the Barcoding of Life Database and to our own sequences obtained from fresh invertebrates. Forty-five distinct prey DNA sequences were obtained from 11 faecal samples, of which 39 could be identified to species or genus. Targeting three warbler species ensured that species-specific differences in prey choice broadened the range of prey taken. Amongst the prey found in reedbeds were major pests (including the tomato moth Lacanobia oleracea) as well as many potentially valuable natural enemies, including aphidophagous hoverflies and braconid wasps. Given the mobility of birds, this approach provides a practical way of sampling a whole habitat at once, providing growers with information on possible invasion by locally resident pests and the colonization potential of natural enemies from local natural habitats.

  1. Bias correction for selecting the minimal-error classifier from many machine learning models.

    Science.gov (United States)

    Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C

    2014-11-15

    Supervised machine learning is commonly applied in genomic research to construct a classifier from the training data that is generalizable to predict independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists in general. It has been a common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that could not be validated later in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored the statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and the Tibshirani-Tibshirani procedure. All methods were compared in simulation datasets, five moderate-size real datasets and two large breast cancer datasets. The results showed that IPL outperforms the other methods in bias correction with smaller variance, and it has the additional advantage of extrapolating error estimates for larger sample sizes, a practical feature for recommending whether more samples should be recruited to improve the classifier and accuracy. An R package 'MLbias' and all source files are publicly available at tsenglab.biostat.pitt.edu/software.htm.
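
    A hedged sketch of the inverse-power-law idea: fit error(n) ≈ a·n^(-b) + c to cross-validation errors observed at several training-set sizes and extrapolate to a larger n (SciPy assumed; the numbers are illustrative, not from the paper):

        import numpy as np
        from scipy.optimize import curve_fit

        def ipl(n, a, b, c):
            # Inverse power law learning curve: error decays toward an asymptote c.
            return a * n ** (-b) + c

        sizes = np.array([20, 30, 40, 50, 60])               # training-set sizes tried
        errors = np.array([0.38, 0.33, 0.30, 0.28, 0.27])    # observed CV error rates

        params, _ = curve_fit(ipl, sizes, errors, p0=[1.0, 0.5, 0.2], maxfev=10000)
        print("extrapolated error at n = 120:", ipl(120, *params))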

  2. A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

    Directory of Open Access Journals (Sweden)

    Meysam Bastani

    Full Text Available BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.

  3. Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder.

    Science.gov (United States)

    Chen, D T; Jiang, X; Akula, N; Shugart, Y Y; Wendland, J R; Steele, C J M; Kassem, L; Park, J-H; Chatterjee, N; Jamain, S; Cheng, A; Leboyer, M; Muglia, P; Schulze, T G; Cichon, S; Nöthen, M M; Rietschel, M; McMahon, F J; Farmer, A; McGuffin, P; Craig, I; Lewis, C; Hosang, G; Cohen-Woods, S; Vincent, J B; Kennedy, J L; Strauss, J

    2013-02-01

    Meta-analyses of bipolar disorder (BD) genome-wide association studies (GWAS) have identified several genome-wide significant signals in European-ancestry samples, but so far account for little of the inherited risk. We performed a meta-analysis of ∼750,000 high-quality genetic markers on a combined sample of ∼14,000 subjects of European and Asian-ancestry (phase I). The most significant findings were further tested in an extended sample of ∼17,700 cases and controls (phase II). The results suggest novel association findings near the genes TRANK1 (LBA1), LMAN2L and PTGFR. In phase I, the most significant single nucleotide polymorphism (SNP), rs9834970 near TRANK1, was significant at the P = 2.4 × 10⁻¹¹ level, with no heterogeneity. Supportive evidence for prior association findings near ANK3 and a locus on chromosome 3p21.1 was also observed. The phase II results were similar, although the heterogeneity test became significant for several SNPs. On the basis of these results and other established risk loci, we used the method developed by Park et al. to estimate the number, and the effect size distribution, of BD risk loci that could still be found by GWAS methods. We estimate that >63,000 case-control samples would be needed to identify the ∼105 BD risk loci discoverable by GWAS, and that these will together explain <6% of the inherited risk. These results support previous GWAS findings and identify three new candidate genes for BD. Further studies are needed to replicate these findings and may potentially lead to identification of functional variants. Sample size will remain a limiting factor in the discovery of common alleles associated with BD.

  4. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine such a collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard by which to choose the texts, based on demand and the abundance of materials. The performance of the classifier varies with the user's choice.

  5. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
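
    A minimal sketch of the interpretability point made above, assuming a scikit-learn environment and a stand-in dataset: a shallow decision tree reduces the classification to a short sequence of threshold tests that can be printed and inspected.

```python
# Minimal decision-tree example, echoing the survey's point that a DTC
# decomposes one complex decision into a collection of simple tests.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow, interpretable
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))  # the sequence of simple threshold decisions
```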

  6. Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

    Directory of Open Access Journals (Sweden)

    M. Al-Rousan

    2005-08-01

    Full Text Available Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of Arabic sign language (ArSL alphabet. Polynomial classifiers have several advantages over other classifiers in that they do not require iterative training, and that they are highly computationally scalable with the number of classes. Based on polynomial classifiers, we have built an ArSL system and measured its performance using real ArSL data collected from deaf people. We show that the proposed system provides superior recognition results when compared with previously published results using ANFIS-based classification on the same dataset and feature extraction methodology. The comparison is shown in terms of the number of misclassified test patterns. The reduction in the rate of misclassified patterns was very significant. In particular, we have achieved a 36% reduction of misclassifications on the training data and 57% on the test data.
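
    The general idea of a polynomial classifier can be sketched as below, assuming scikit-learn and a stand-in dataset (not the ArSL features used in the paper): expand the inputs to polynomial terms and fit linear per-class weights in closed form, so no iterative training is required.

```python
# Sketch of a polynomial classifier: expand features to polynomial terms and
# fit linear weights in closed form (ridge), so no iterative training is needed.
# This illustrates the general idea only, not the paper's exact ArSL pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)  # stand-in for ArSL feature vectors

poly_clf = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    RidgeClassifier(alpha=1.0),   # one linear discriminant per class
)
print("CV accuracy:", cross_val_score(poly_clf, X, y, cv=5).mean())
```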

  7. Comparison of Classifier Architectures for Online Neural Spike Sorting.

    Science.gov (United States)

    Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood

    2017-04-01

    High-density, intracranial recordings from micro-electrode arrays need to undergo Spike Sorting in order to associate the recorded neuronal spikes to particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike-sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and Cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirement. We establish that the neural networks based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier, requires only 7.83% of computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.

  8. 32 CFR 2004.21 - Protection of Classified Information [201(e)].

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Protection of Classified Information [201(e... PROGRAM DIRECTIVE NO. 1 Operations § 2004.21 Protection of Classified Information [201(e)]. Procedures for... coordination process. ...

  9. Tolerance to missing data using a likelihood ratio based classifier for computer-aided classification of breast cancer

    International Nuclear Information System (INIS)

    Bilska-Wolak, Anna O; Floyd, Carey E Jr

    2004-01-01

    While mammography is a highly sensitive method for detecting breast tumours, its ability to differentiate between malignant and benign lesions is low, which may result in as many as 70% of unnecessary biopsies. The purpose of this study was to develop a highly specific computer-aided diagnosis algorithm to improve classification of mammographic masses. A classifier based on the likelihood ratio was developed to accommodate cases with missing data. Data for development included 671 biopsy cases (245 malignant), with biopsy-proved outcome. Sixteen features based on the BI-RADS™ lexicon and patient history had been recorded for the cases, with 1.3 ± 1.1 missing feature values per case. Classifier evaluation methods included receiver operating characteristic and leave-one-out bootstrap sampling. The classifier achieved 32% specificity at 100% sensitivity on the 671 cases with 16 features that had missing values. Utilizing just the seven features present for all cases resulted in decreased performance at 100% sensitivity with average 19% specificity. No cases and no feature data were omitted during classifier development, showing that it is more beneficial to utilize cases with missing values than to discard incomplete cases that cannot be handled by many algorithms. Classification of mammographic masses was commendable at high sensitivity levels, indicating that benign cases could be potentially spared from biopsy.
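
    A toy sketch of a likelihood-ratio classifier that tolerates missing data, in the spirit of the approach above but with hypothetical categorical features: the ratio is accumulated only over features that are actually observed, so incomplete cases need not be discarded.

```python
# Sketch of a likelihood-ratio classifier that tolerates missing features:
# the ratio is accumulated only over features that are actually observed.
# Feature names and data here are hypothetical, not the BI-RADS dataset.
import pandas as pd

def fit_likelihoods(df, labels, smoothing=1.0):
    """Per-feature categorical likelihoods P(value | class) with Laplace smoothing."""
    tables = {}
    for col in df.columns:
        tab = {}
        for cls in (0, 1):  # 0 = benign, 1 = malignant
            vals = df.loc[labels == cls, col].dropna()
            counts = vals.value_counts()
            tab[cls] = (counts + smoothing) / (counts.sum() + smoothing * counts.size)
        tables[col] = tab
    return tables

def likelihood_ratio(case, tables, floor=1e-3):
    """Product of P(x|malignant)/P(x|benign) over observed features only."""
    lr = 1.0
    for col, val in case.items():
        if pd.isna(val):
            continue  # missing feature: simply skipped
        p1 = tables[col][1].get(val, floor)
        p0 = tables[col][0].get(val, floor)
        lr *= p1 / p0
    return lr

# Toy usage with hypothetical lexicon-style categorical features
df = pd.DataFrame({"margin": ["spiculated", "circumscribed", None, "spiculated"],
                   "shape":  ["irregular", "oval", "round", None]})
labels = pd.Series([1, 0, 0, 1])
tables = fit_likelihoods(df, labels)
print(likelihood_ratio({"margin": "spiculated", "shape": None}, tables))
```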

  10. Single-Pol Synthetic Aperture Radar Terrain Classification using Multiclass Confidence for One-Class Classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Koch, Mark William [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Steinbach, Ryan Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Moya, Mary M [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-10-01

    Except in the most extreme conditions, Synthetic aperture radar (SAR) is a remote sensing technology that can operate day or night. A SAR can provide surveillance over a long time period by making multiple passes over a wide area. For object-based intelligence it is convenient to segment and classify the SAR images into objects that identify various terrains and man-made structures that we call “static features.” In this paper we introduce a novel SAR image product that captures how different regions decorrelate at different rates. Using superpixels and their first two moments we develop a series of one-class classification algorithms using a goodness-of-fit metric. P-value fusion is used to combine the results from different classes. We also show how to combine multiple one-class classifiers to get a confidence about a classification. This can be used by downstream algorithms such as a conditional random field to enforce spatial constraints.

  11. A Study of Assimilation Bias in Name-Based Sampling of Migrants

    Directory of Open Access Journals (Sweden)

    Schnell Rainer

    2014-06-01

    Full Text Available The use of personal names for screening is an increasingly popular sampling technique for migrant populations. Although this is often an effective sampling procedure, very little is known about the properties of this method. Based on a large German survey, this article compares characteristics of respondents whose names have been correctly classified as belonging to a migrant population with respondents who are migrants and whose names have not been classified as belonging to a migrant population. Although significant differences were found for some variables even with some large effect sizes, the overall bias introduced by name-based sampling (NBS) is small as long as procedures with small false-negative rates are employed.

  12. Classifying and Visualising Roman Pottery using Computer-scanned Typologies

    Directory of Open Access Journals (Sweden)

    Jacqueline Christmas

    2018-05-01

    Full Text Available For many archaeological assemblages and type-series, accurate drawings of standardised pottery vessels have been recorded in consistent styles. This provides the opportunity to extract individual pot drawings and derive from them data that can be used for analysis and visualisation. Starting from PDF scans of the original pages of pot drawings, we have automated much of the process for locating, defining the boundaries, extracting and orientating each individual pot drawing. From these processed images, basic features such as width and height, the volume of the interior, the edges, and the shape of the cross-section outline are extracted and are then used to construct more complex features such as a measure of a pot's 'circularity'. Capturing these traits opens up new possibilities for (a) classifying vessel form in a way that is sensitive to the physical characteristics of pots relative to other vessels in an assemblage, and (b) visualising the results of quantifying assemblages using standard typologies. A frequently encountered problem when trying to compare pottery from different archaeological sites is that the pottery is classified into forms and labels using different standards. With a set of data from early Roman urban centres and related sites that has been labelled both with forms (e.g. 'platter' and 'bowl') and shape identifiers (based on the Camulodunum type-series), we use the extracted features from images to look both at how the pottery forms cluster for a given set of features, and at how the features may be used to compare finds from different sites.

  13. Evaluating hepatocellular carcinoma cell lines for tumour samples using within-sample relative expression orderings of genes.

    Science.gov (United States)

    Ao, Lu; Guo, You; Song, Xuekun; Guan, Qingzhou; Zheng, Weicheng; Zhang, Jiahui; Huang, Haiyan; Zou, Yi; Guo, Zheng; Wang, Xianlong

    2017-11-01

    Concerns are raised about the representativeness of cell lines for tumours due to the culture environment and misidentification. Liver is a major metastatic destination of many cancers, which might further confuse the origin of hepatocellular carcinoma cell lines. Therefore, it is of crucial importance to understand how well they can represent hepatocellular carcinoma. The HCC-specific gene pairs with highly stable relative expression orderings in more than 99% of hepatocellular carcinoma but with reversed relative expression orderings in at least 99% of one of the six types of cancer, colorectal carcinoma, breast carcinoma, non-small-cell lung cancer, gastric carcinoma, pancreatic carcinoma and ovarian carcinoma, were identified. With the simple majority rule, the HCC-specific relative expression orderings from comparisons with colorectal carcinoma and breast carcinoma could exactly discriminate primary hepatocellular carcinoma samples from both primary colorectal carcinoma and breast carcinoma samples. Especially, they correctly classified more than 90% of liver metastatic samples from colorectal carcinoma and breast carcinoma to their original tumours. Finally, using these HCC-specific relative expression orderings from comparisons with six cancer types, we identified eight of 24 hepatocellular carcinoma cell lines in the Cancer Cell Line Encyclopedia (Huh-7, Huh-1, HepG2, Hep3B, JHH-5, JHH-7, C3A and Alexander cells) that are highly representative of hepatocellular carcinoma. Evaluated with a REOs-based prognostic signature for hepatocellular carcinoma, all these eight cell lines showed the same metastatic properties of the high-risk metastatic hepatocellular carcinoma tissues. Caution should be taken for using hepatocellular carcinoma cell lines. Our results should be helpful to select proper hepatocellular carcinoma cell lines for biological experiments. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
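
    The within-sample relative expression ordering (REO) rule with a simple majority vote can be illustrated with a small sketch; the gene names and signature pairs below are placeholders, not the published signature.

```python
# Sketch of relative-expression-ordering (REO) classification with a majority
# rule: a sample is called HCC-like if most signature gene pairs (a, b) keep
# the ordering expr[a] > expr[b] that is stable in HCC. Gene names are made up.
def reo_vote(expr, pairs):
    """expr: dict gene -> expression value for one sample (any normalisation,
    since only within-sample orderings are used). pairs: list of (a, b) with
    the HCC-stable ordering a > b."""
    votes = sum(1 for a, b in pairs if expr[a] > expr[b])
    return votes / len(pairs)  # fraction of pairs agreeing with the HCC pattern

signature_pairs = [("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE5", "GENE6")]
sample = {"GENE1": 8.2, "GENE2": 5.1, "GENE3": 3.0,
          "GENE4": 6.7, "GENE5": 9.9, "GENE6": 2.4}

score = reo_vote(sample, signature_pairs)
print("HCC-like" if score > 0.5 else "other origin", round(score, 2))
```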

  14. A sample of galaxy pairs identified from the LAMOST spectral survey and the Sloan Digital Sky Survey

    International Nuclear Information System (INIS)

    Shen, Shi-Yin; Argudo-Fernández, Maria; Chen, Li; Feng, Shuai; Hou, Jin-Liang; Shao, Zheng-Yi; Chen, Xiao-Yan; Luo, A-Li; Wu, Hong; Yang, Hai-Feng; Yang, Ming; Hou, Yong-Hui; Wang, Yue-Fei; Jiang, Peng; Wang, Ting-Gui; Jing, Yi-Peng; Kong, Xu; Wang, Wen-Ting; Luo, Zhi-Jian; Wu, Xue-Bing

    2016-01-01

    A small fraction (< 10%) of the SDSS main galaxy (MG) sample has not been targeted with spectroscopy due to the effect of fiber collisions. These galaxies have been compiled into the input catalog of the LAMOST ExtraGAlactic Surveys and named the complementary galaxy sample. In this paper, we introduce this project and status of the spectroscopies associated with the complementary galaxies in the first two years of the LAMOST spectral survey (till Sep. of 2014). Moreover, we present a sample of 1102 galaxy pairs identified from the LAMOST complementary galaxies and SDSS MGs, which are defined as two members that have a projected distance smaller than 100 h₇₀⁻¹ kpc and a recessional velocity difference smaller than 500 km s⁻¹. Compared with galaxy pairs that are only selected from SDSS, the LAMOST-SDSS pairs have the advantages of not being biased toward large separations and therefore act as a useful supplement in statistical studies of galaxy interaction and galaxy merging. (paper)

  15. Ensemble of classifiers based network intrusion detection system performance bound

    CSIR Research Space (South Africa)

    Mkuzangwe, Nenekazi NP

    2017-11-01

    Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...

  16. Classifying Sources Influencing Indoor Air Quality (IAQ) Using Artificial Neural Network (ANN)

    Directory of Open Access Journals (Sweden)

    Shaharil Mad Saad

    2015-05-01

    Full Text Available Monitoring indoor air quality (IAQ) is deemed important nowadays. A sophisticated IAQ monitoring system that can classify the sources influencing the IAQ would be very helpful to users. Therefore, in this paper, an IAQ monitoring system is proposed with a newly added feature that enables the system to identify the sources influencing the level of IAQ. To achieve this, the collected data were used to train an artificial neural network (ANN)—a proven method for pattern recognition. The proposed system consists of a sensor module cloud (SMC), a base station and a service-oriented client. The SMC contains collections of sensor modules that measure air quality data and transmit the captured data to the base station through a wireless network. The IAQ monitoring system is also equipped with an IAQ index and a thermal comfort index that tell the users about the room's conditions. The results showed that the system is able to measure the level of air quality and successfully classify the sources influencing IAQ in various environments, such as ambient air, chemical presence, fragrance presence, foods and beverages, and human activity.

  17. 3 CFR - Implementation of the Executive Order, “Classified National Security Information”

    Science.gov (United States)

    2010-01-01

    ... 29, 2009 Implementation of the Executive Order, “Classified National Security Information” Memorandum..., “Classified National Security Information” (the “order”), which substantially advances my goals for reforming... or handles classified information shall provide the Director of the Information Security Oversight...

  18. Identifying the potential of changes to blood sample logistics using simulation

    DEFF Research Database (Denmark)

    Jørgensen, Pelle Morten Thomas; Jacobsen, Peter; Poulsen, Jørgen Hjelm

    2013-01-01

    of the simulation was to evaluate changes made to the transportation of blood samples between wards and the laboratory. The average (AWT) and maximum waiting time (MWT) from when a blood sample was drawn at the ward until it was received at the laboratory, and the distribution of arrivals of blood samples......, each of the scenarios was tested in terms of what amount of resources would give the optimal result. The simulations showed a large improvement potential in implementing a new technology/means for transporting the blood samples. The pneumatic tube system showed the biggest potential, lowering the AWT...

  19. Sex Bias in Classifying Borderline and Narcissistic Personality Disorder.

    Science.gov (United States)

    Braamhorst, Wouter; Lobbestael, Jill; Emons, Wilco H M; Arntz, Arnoud; Witteman, Cilia L M; Bekker, Marrie H J

    2015-10-01

    This study investigated sex bias in the classification of borderline and narcissistic personality disorders. A sample of psychologists in training for a post-master's degree (N = 180) read brief case histories (male or female version) and made a DSM classification. To differentiate sex bias due to sex stereotyping or to base rate variation, we used different case histories, respectively: (1) non-ambiguous case histories with enough criteria of either borderline or narcissistic personality disorder to meet the threshold for classification, and (2) an ambiguous case with subthreshold features of both borderline and narcissistic personality disorder. Results showed significant differences due to the sex of the patient in the ambiguous condition. Thus, when the diagnosis is not straightforward, as in the case of mixed subthreshold features, sex bias is present and is influenced by base-rate variation. These findings emphasize the need for caution in classifying personality disorders, especially borderline or narcissistic traits.

  20. 36 CFR 1256.70 - What controls access to national security-classified information?

    Science.gov (United States)

    2010-07-01

    ... national security-classified information? 1256.70 Section 1256.70 Parks, Forests, and Public Property... HISTORICAL MATERIALS Access to Materials Containing National Security-Classified Information § 1256.70 What controls access to national security-classified information? (a) The declassification of and public access...

  1. Recovery of thermophilic Campylobacter by three sampling methods from classified river sites in Northeast Georgia, USA

    Science.gov (United States)

    It is not clear how best to sample streams for the detection of Campylobacter which may be introduced from agricultural or community land use. Fifteen sites in the watershed of the South Fork of the Broad River (SFBR) in Northeastern Georgia, USA, were sampled in three seasons. Seven sites were cl...

  2. Machinery Bearing Fault Diagnosis Using Variational Mode Decomposition and Support Vector Machine as a Classifier

    Science.gov (United States)

    Rama Krishna, K.; Ramachandran, K. I.

    2018-02-01

    Crack propagation is a major cause of failure in rotating machines. It adversely affects the productivity, safety, and the machining quality. Hence, detecting the crack's severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established approach to identifying faults by observing the non-linear behaviour of the vibration signals at various operating conditions. In this work, we find the classification efficiencies for both the original and the reconstructed vibrational signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. All the statistical features of the original and reconstructed signals are computed individually in the feature extraction process. A few statistical parameters are selected in the feature selection process and are classified using the SVM classifier. The obtained results show the best parameters and the appropriate kernel for the SVM classifier for detecting the faults in bearings. Hence, we conclude that the combined VMD and SVM process gives better results than applying SVM to the raw signals alone, owing to the denoising and filtering of the raw vibrational signals.
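
    The feature-extraction and SVM steps can be sketched as follows, assuming scikit-learn and SciPy; the VMD decomposition itself is treated as an external preprocessing step, and the vibration segments are synthetic stand-ins rather than measured bearing data.

```python
# Sketch of the feature-extraction and SVM steps (the VMD decomposition itself
# is assumed to be done by an external routine and is not shown): compute a few
# statistical features per signal segment and classify them with an RBF SVM.
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def statistical_features(segment):
    return [segment.mean(), segment.std(), np.sqrt(np.mean(segment**2)),  # RMS
            kurtosis(segment), skew(segment), segment.max() - segment.min()]

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 100 healthy and 100 faulty vibration segments
healthy = [rng.normal(0, 1.0, 2048) for _ in range(100)]
faulty = [rng.normal(0, 1.0, 2048) + 0.8 * np.sin(np.linspace(0, 300, 2048))
          for _ in range(100)]

X = np.array([statistical_features(s) for s in healthy + faulty])
y = np.array([0] * 100 + [1] * 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```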

  3. E-Tour Guide Application with an Image Recognition Feature Using the Haar Classifier Method

    Directory of Open Access Journals (Sweden)

    Derwin Suhartono

    2013-12-01

    Full Text Available Smartphones have become important instruments in modern society, used for entertainment and information searching in addition to communication. Given this, there is a need to develop applications that improve smartphone functionality. The objective of this research is to create an application named E-Tour Guide as a tool for helping to plan and manage tourism activity, equipped with an image recognition feature. The image recognition method used is the Haar classifier. The feature is used to recognize historical objects. Testing on a sample of 20 images achieved 85% accuracy for the image recognition feature.
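
    For illustration, a minimal OpenCV Haar-cascade detection sketch is shown below; the cascade file name is hypothetical, since the application above was trained on its own landmark images rather than the generic cascades shipped with OpenCV.

```python
# Minimal OpenCV Haar-cascade detection sketch. The cascade XML here is
# hypothetical: the paper's classifier was trained on its own landmark images,
# whereas OpenCV only ships generic cascades (e.g. for faces).
import cv2

cascade = cv2.CascadeClassifier("landmark_cascade.xml")  # assumed pre-trained file
image = cv2.imread("query_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection; tune scaleFactor/minNeighbors for precision
detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(60, 60))
for (x, y, w, h) in detections:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

print(f"{len(detections)} candidate object(s) found")
cv2.imwrite("annotated.jpg", image)
```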

  4. Needles in the haystack: identifying individuals present in pooled genomic data.

    Directory of Open Access Journals (Sweden)

    Rosemary Braun

    2009-10-01

    Full Text Available Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present empirical tests of the method using publicly accessible genotypes, as well as analytical investigations of the method's strengths and limitations. The results reveal that the null distribution is sensitive to the underlying assumptions, making it difficult to accurately calibrate thresholds for classifying an individual as a member of the population samples. As a result, the false-positive rates obtained in practice are considerably higher than previously believed. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used in an appropriate manner. We also discuss the implications of our findings in forensics applications and in the protection of GWAS participant privacy.

  5. A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom [Department of Radiology, University of Ulsan College of Medicine, 388-1 Pungnap2-dong, Songpa-gu, Seoul 138-736 (Korea, Republic of); Lynch, David A. [Department of Radiology, National Jewish Medical and Research Center, Denver, Colorado 80206 (United States)

    2013-05-15

    Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs—normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For

  6. A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

    International Nuclear Information System (INIS)

    Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom; Lynch, David A.

    2013-01-01

    Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs—normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI
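
    The SVM-versus-Bayesian comparison under cross-validation, and the cross-scanner train/test split, can be sketched roughly as follows with scikit-learn; the feature matrices here are random stand-ins, not the ROI texture features of the study.

```python
# Rough sketch of the SVM-versus-Bayesian comparison under cross-validation
# (texture features and scanner labels are stand-ins, not the study's data).
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 22))            # 600 ROIs x 22 quantitative features
y = rng.integers(0, 6, size=600)          # six regional disease patterns

for name, model in [("SVM", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
                    ("Bayesian", GaussianNB())]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold CV accuracy = {acc:.2f}")

# Cross-scanner test: train on one scanner's ROIs, test on the other's
X_ge, y_ge, X_si, y_si = X[:300], y[:300], X[300:], y[300:]
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_ge, y_ge)
print("train GE / test Siemens accuracy:", svm.score(X_si, y_si))
```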

  7. Stepwise classification of cancer samples using clinical and molecular data

    Directory of Open Access Journals (Sweden)

    Obulkasim Askar

    2011-10-01

    Full Text Available Abstract Background Combining clinical and molecular data types may potentially improve prediction accuracy of a classifier. However, currently there is a shortage of effective and efficient statistical and bioinformatic tools for true integrative data analysis. Existing integrative classifiers have two main disadvantages: First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and (cost-)inefficient. Results We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to two data types independently, starting with the traditional clinical risk factors. We only turn to relatively expensive molecular data when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that needs to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples. Conclusions Our method renders a more cost-efficient classifier that is at least as good, and sometimes better, than one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may lead to reduced waiting times (for diagnosis) and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM and available at the Bioconductor website.
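
    The stepwise idea can be sketched as follows (this is an illustration, not the stepwiseCM package): a cheap clinical model classifies every sample, and only samples whose predicted class probability falls below a confidence threshold are re-classified with the molecular model.

```python
# Sketch of the stepwise idea: classify on clinical variables first and call
# the expensive molecular model only for samples whose clinical prediction is
# too uncertain. All data below are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def stepwise_predict(clin_model, mol_model, X_clin, X_mol, threshold=0.75):
    proba = clin_model.predict_proba(X_clin)
    confident = proba.max(axis=1) >= threshold
    preds = proba.argmax(axis=1)
    # Re-classify only the uncertain samples with molecular data
    if (~confident).any():
        preds[~confident] = mol_model.predict(X_mol[~confident])
    return preds, (~confident).mean()  # predictions and re-test fraction

rng = np.random.default_rng(0)
X_clin, X_mol = rng.normal(size=(200, 5)), rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)

clin = LogisticRegression(max_iter=1000).fit(X_clin, y)
mol = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_mol, y)
preds, retest_fraction = stepwise_predict(clin, mol, X_clin, X_mol)
print("fraction needing a molecular test:", retest_fraction)
```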

  8. Identifying content-based and relational techniques to change behaviour in motivational interviewing.

    Science.gov (United States)

    Hardcastle, Sarah J; Fortier, Michelle; Blake, Nicola; Hagger, Martin S

    2017-03-01

    Motivational interviewing (MI) is a complex intervention comprising multiple techniques aimed at changing health-related motivation and behaviour. However, MI techniques have not been systematically isolated and classified. This study aimed to identify the techniques unique to MI, classify them as content-related or relational, and evaluate the extent to which they overlap with techniques from the behaviour change technique taxonomy version 1 [BCTTv1; Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., … Wood, C. E. (2013). The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46, 81-95]. Behaviour change experts (n = 3) content-analysed MI techniques based on Miller and Rollnick's [(2013). Motivational interviewing: Preparing people for change (3rd ed.). New York: Guildford Press] conceptualisation. Each technique was then coded for independence and uniqueness by independent experts (n = 10). The experts also compared each MI technique to those from the BCTTv1. Experts identified 38 distinct MI techniques with high agreement on clarity, uniqueness, preciseness, and distinctiveness ratings. Of the identified techniques, 16 were classified as relational techniques. The remaining 22 techniques were classified as content based. Sixteen of the MI techniques were identified as having substantial overlap with techniques from the BCTTv1. The isolation and classification of MI techniques will provide researchers with the necessary tools to clearly specify MI interventions and test the main and interactive effects of the techniques on health behaviour. The distinction between relational and content-based techniques within MI is also an important advance, recognising that changes in motivation and behaviour in MI is a function of both intervention content and the interpersonal style

  9. Silicon nanowire arrays as learning chemical vapour classifiers

    International Nuclear Information System (INIS)

    Niskanen, A O; Colli, A; White, R; Li, H W; Spigone, E; Kivioja, J M

    2011-01-01

    Nanowire field-effect transistors are a promising class of devices for various sensing applications. Apart from detecting individual chemical or biological analytes, it is especially interesting to use multiple selective sensors to look at their collective response in order to perform classification into predetermined categories. We show that non-functionalised silicon nanowire arrays can be used to robustly classify different chemical vapours using simple statistical machine learning methods. We were able to distinguish between acetone, ethanol and water with 100% accuracy while methanol, ethanol and 2-propanol were classified with 96% accuracy in ambient conditions.

  10. 48 CFR 8.608 - Protection of classified and sensitive information.

    Science.gov (United States)

    2010-10-01

    ... Prison Industries, Inc. 8.608 Protection of classified and sensitive information. Agencies shall not enter into any contract with FPI that allows an inmate worker access to any— (a) Classified data; (b) Geographic data regarding the location of— (1) Surface and subsurface infrastructure providing communications...

  11. Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

    Energy Technology Data Exchange (ETDEWEB)

    Poling, Jeanne; Arnold, Pat [National Security Technologies, LLC (NSTec), P.O. Box 98521, Las Vegas, NV 89193-8521 (United States); Saad, Max [Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185 (United States); DiSanza, Frank [E. Frank DiSanza Consulting, 2250 Alanhurst Drive, Henderson, NV 89052 (United States); Cabble, Kevin [U.S. Department of Energy, National Nuclear Security Administration Nevada Site Office, P.O. Box 98518, Las Vegas, NV 89193-8518 (United States)

    2013-07-01

    The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)

  12. Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

    International Nuclear Information System (INIS)

    Poling, Jeanne; Arnold, Pat; Saad, Max; DiSanza, Frank; Cabble, Kevin

    2013-01-01

    The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)

  13. Using passive cavitation images to classify high-intensity focused ultrasound lesions.

    Science.gov (United States)

    Haworth, Kevin J; Salgaonkar, Vasant A; Corregan, Nicholas M; Holland, Christy K; Mast, T Douglas

    2015-09-01

    Passive cavitation imaging provides spatially resolved monitoring of cavitation emissions. However, the diffraction limit of a linear imaging array results in relatively poor range resolution. Poor range resolution has limited prior analyses of the spatial specificity and sensitivity of passive cavitation imaging in predicting thermal lesion formation. In this study, this limitation is overcome by orienting a linear array orthogonal to the high-intensity focused ultrasound propagation direction and performing passive imaging. Fourteen lesions were formed in ex vivo bovine liver samples as a result of 1.1-MHz continuous-wave ultrasound exposure. The lesions were classified as focal, "tadpole" or pre-focal based on their shape and location. Passive cavitation images were beamformed from emissions at the fundamental, harmonic, ultraharmonic and inharmonic frequencies with an established algorithm. Using the area under a receiver operating characteristic curve (AUROC), fundamental, harmonic and ultraharmonic emissions were found to be significant predictors of lesion formation for all lesion types. For both harmonic and ultraharmonic emissions, pre-focal lesions were classified most successfully (AUROC values of 0.87 and 0.88, respectively), followed by tadpole lesions (AUROC values of 0.77 and 0.64, respectively) and focal lesions (AUROC values of 0.65 and 0.60, respectively). Copyright © 2015 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.

  14. Generalization in the XCSF classifier system: analysis, improvement, and extension.

    Science.gov (United States)

    Lanzi, Pier Luca; Loiacono, Daniele; Wilson, Stewart W; Goldberg, David E

    2007-01-01

    We analyze generalization in XCSF and introduce three improvements. We begin by showing that the types of generalizations evolved by XCSF can be influenced by the input range. To explain these results we present a theoretical analysis of the convergence of classifier weights in XCSF which highlights a broader issue. In XCSF, because of the mathematical properties of the Widrow-Hoff update, the convergence of classifier weights in a given subspace can be slow when the spread of the eigenvalues of the autocorrelation matrix associated with each classifier is large. As a major consequence, the system's accuracy pressure may act before classifier weights are adequately updated, so that XCSF may evolve piecewise constant approximations, instead of the intended, and more efficient, piecewise linear ones. We propose three different ways to update classifier weights in XCSF so as to increase the generalization capabilities of XCSF: one based on a condition-based normalization of the inputs, one based on linear least squares, and one based on the recursive version of linear least squares. Through a series of experiments we show that while all three approaches significantly improve XCSF, least squares approaches appear to be best performing and most robust. Finally we show how XCSF can be extended to include polynomial approximations.
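
    The convergence issue described above can be reproduced numerically: with badly scaled inputs, a Widrow-Hoff (LMS) update of linear weights converges very slowly along the small-eigenvalue dimension, whereas a batch least-squares solve does not. The sketch below is a generic illustration under these assumptions, not the XCSF implementation.

```python
# Small numerical illustration: Widrow-Hoff (LMS) updates of a linear model
# converge slowly when input dimensions have very different scales (spread
# eigenvalues of the autocorrelation matrix), while least squares does not.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([1.0, 100.0])  # badly scaled inputs
true_w = np.array([2.0, -0.03])
y = X @ true_w + rng.normal(scale=0.01, size=500)

# Widrow-Hoff / LMS, as used by classical XCSF weight updates
w = np.zeros(2)
eta = 1e-5  # must be tiny or the update diverges on the large dimension
for epoch in range(50):
    for x_i, y_i in zip(X, y):
        w += eta * (y_i - x_i @ w) * x_i
print("LMS estimate:          ", w)   # first weight is still far from 2.0

# Batch linear least squares (what the proposed variants approximate recursively)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("least-squares estimate:", w_ls, " true:", true_w)
```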

  15. Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

    Energy Technology Data Exchange (ETDEWEB)

    Wurtz, R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Kaplan, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-10-28

    Pulse shape discrimination (PSD) is a variety of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing them. PSD advances rely on improvements to the implemented algorithm, which can come from conventional statistical-classifier or machine-learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier's receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.

  16. Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

    Energy Technology Data Exchange (ETDEWEB)

    Baraldi, Piero, E-mail: piero.baraldi@polimi.i [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Razavi-Far, Roozbeh [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Zio, Enrico [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Ecole Centrale Paris-Supelec, Paris (France)

    2011-04-15

    An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble where transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.

  17. Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

    International Nuclear Information System (INIS)

    Baraldi, Piero; Razavi-Far, Roozbeh; Zio, Enrico

    2011-01-01

    An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble where transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.

  18. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

    KAUST Repository

    Chen, Peng

    2014-12-03

    Background Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite the recent advances in computational prediction for protein-ligand binding sites, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, due to the highly imbalanced samples between the ligand-binding sites and non ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
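
    A rough sketch of the balanced-ensemble idea with scikit-learn, using synthetic stand-in features: the majority class is undersampled into several balanced subsets, one random forest is trained per subset, and their predicted probabilities are averaged.

```python
# Sketch of the balanced-ensemble idea: undersample the majority class into
# several balanced subsets, train one random forest per subset, and average
# the predicted probabilities. Data here are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_balanced_ensemble(X, y, n_members=5, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)          # minority: binding residues
    neg = np.flatnonzero(y == 0)          # majority: non-binding residues
    members = []
    for _ in range(n_members):
        neg_sample = rng.choice(neg, size=pos.size, replace=False)
        idx = np.concatenate([pos, neg_sample])
        rf = RandomForestClassifier(n_estimators=200,
                                    random_state=int(rng.integers(10**6)))
        members.append(rf.fit(X[idx], y[idx]))
    return members

def ensemble_predict(members, X, threshold=0.5):
    proba = np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)
    return (proba >= threshold).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 40))           # e.g. window-encoded sequence features
y = (rng.random(2000) < 0.05).astype(int) # ~5% positives: highly imbalanced

members = train_balanced_ensemble(X, y)
print("predicted positives:", ensemble_predict(members, X).sum())
```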

  19. High-resolution gel electrophoresis and sodium dodecyl sulphate-agarose gel electrophoresis on urine samples for qualitative analysis of proteinuria in dogs.

    Science.gov (United States)

    Giori, Luca; Tricomi, Flavia Marcella; Zatelli, Andrea; Roura, Xavier; Paltrinieri, Saverio

    2011-07-01

    The aims of the current study were to assess whether sodium dodecyl sulphate-agarose gel electrophoresis (SDS-AGE) and high-resolution electrophoresis (HRE) can identify dogs with a urinary protein-to-creatinine ratio (UPC ratio) >0.2 and whether HRE can provide preliminary information about the type of proteinuria, using SDS-AGE as a reference method. HRE and SDS-AGE were conducted on 87 urine samples classified according to the International Renal Interest Society as non-proteinuric (NP; UPC ratio: 0.51; 40/87). SDS-AGE and HRE were positive in 14 out of 32 and 3 out of 32 NP samples and in 52 out of 55 and 40 out of 55 samples with a UPC ratio >0.20, respectively. The concordance between HRE or SDS and UPC ratio was comparable (κ = 0.59; κ = 0.55). However, specificity (90%) and positive likelihood ratio (7.76) were higher for HRE than for SDS-AGE (56% and 2.16) while sensitivity was lower (73% vs. 94%). The analysis of HRE results revealed that a percentage of albumin >41.4% and an albumin/α(1)-globulin ratio (alb/α(1) ratio) >1.46 can identify samples classified by SDS-AGE as affected by glomerular proteinuria while a percentage of α(1)-globulin >40.8% and an alb/α(1) ratio HRE could misclassify samples with a UPC ratio higher or lower than 0.20. Therefore, UPC ratio must always be determined before conducting these tests. The percentage of albumin and α(1)-globulin or the alb/α(1) ratio determined by HRE can provide preliminary information about the origin of proteinuria.

  20. Ultrasonic Sensor Signals and Optimum Path Forest Classifier for the Microstructural Characterization of Thermally-Aged Inconel 625 Alloy

    Directory of Open Access Journals (Sweden)

    Victor Hugo C. de Albuquerque

    2015-05-01

    Full Text Available Secondary phases, such as laves and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited during the gas tungsten arc welding cold wire process. However, when aged at high temperatures, other phases can precipitate in the microstructure, like the γ'' and δ phases. This work presents an evaluation of the powerful optimum path forest (OPF) classifier configured with six distance functions to classify background echo and backscattered ultrasonic signals from samples of the inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The potentiality of ultrasonic sensor signals combined with the OPF to characterize the microstructures of an inconel 625 thermally aged and in the as-welded condition were confirmed by the results. The experimental results revealed that the OPF classifier is sufficiently fast (classification total time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the application proposed.

  1. Ultrasonic sensor signals and optimum path forest classifier for the microstructural characterization of thermally-aged inconel 625 alloy.

    Science.gov (United States)

    de Albuquerque, Victor Hugo C; Barbosa, Cleisson V; Silva, Cleiton C; Moura, Elineudo P; Filho, Pedro P Rebouças; Papa, João P; Tavares, João Manuel R S

    2015-05-27

    Secondary phases, such as laves and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited during the gas tungsten arc welding cold wire process. However, when aged at high temperatures, other phases can precipitate in the microstructure, like the γ'' and δ phases. This work presents an evaluation of the powerful optimum path forest (OPF) classifier configured with six distance functions to classify background echo and backscattered ultrasonic signals from samples of the inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The potentiality of ultrasonic sensor signals combined with the OPF to characterize the microstructures of an inconel 625 thermally aged and in the as-welded condition were confirmed by the results. The experimental results revealed that the OPF classifier is sufficiently fast (classification total time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the application proposed.

  2. Iceberg Semantics For Count Nouns And Mass Nouns: Classifiers, measures and portions

    Directory of Open Access Journals (Sweden)

    Fred Landman

    2016-12-01

    It is the analysis of complex NPs and their mass-count properties that is the focus of the second part of this paper. There I develop an analysis of English and Dutch pseudo-partitives, in particular, measure phrases like three liters of wine and classifier phrases like three glasses of wine. We will study measure interpretations and classifier interpretations of measures and classifiers, and different types of classifier interpretations: container interpretations, contents interpretations, and - indeed - portion interpretations. Rothstein 2011 argues that classifier interpretations (including portion interpretations) of pseudo-partitives pattern with count nouns, but that measure interpretations pattern with mass nouns. I will show that this distinction follows from the very basic architecture of Iceberg semantics.

  3. 18 CFR 3a.12 - Authority to classify official information.

    Science.gov (United States)

    2010-04-01

    ... efficient administration. (b) The authority to classify information or material originally as Top Secret is... classify information or material originally as Secret is exercised only by: (1) Officials who have Top... information or material originally as Confidential is exercised by officials who have Top Secret or Secret...

  4. Multivariate analysis of quantitative traits can effectively classify rapeseed germplasm

    Directory of Open Access Journals (Sweden)

    Jankulovska Mirjana

    2014-01-01

    Full Text Available In this study, the use of different multivariate approaches to classify rapeseed genotypes based on quantitative traits has been presented. Tree regression analysis, PCA analysis and two-way cluster analysis were applied in order to describe and understand the extent of genetic variability in spring rapeseed genotype by trait data. The traits which highly influenced seed and oil yield in rapeseed were successfully identified by the tree regression analysis. Principal predictor for both response variables was number of pods per plant (NP). NP and 1000 seed weight could help in the selection of high yielding genotypes. High values for both traits and oil content could lead to high oil yielding genotypes. These traits may serve as indirect selection criteria and can lead to improvement of seed and oil yield in rapeseed. Quantitative traits that explained most of the variability in the studied germplasm were classified using principal component analysis. In this data set, five PCs were identified, out of which the first three PCs explained 63% of the total variance. It helped in facilitating the choice of variables based on which the genotypes' clustering could be performed. The two-way cluster analysis simultaneously clustered genotypes and quantitative traits. The final number of clusters was determined using bootstrapping technique. This approach provided clear overview on the variability of the analyzed genotypes. The genotypes that have similar performance regarding the traits included in this study can be easily detected on the heatmap. Genotypes grouped in the clusters 1 and 8 had high values for seed and oil yield, and relatively short vegetative growth duration period and those in cluster 9, combined moderate to low values for vegetative growth duration and moderate to high seed and oil yield. These genotypes should be further exploited and implemented in the rapeseed breeding program. The combined application of these multivariate methods
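
    The PCA-plus-clustering part of this workflow can be sketched as follows with scikit-learn and SciPy; the trait matrix is a random stand-in for the rapeseed data, and only the genotype-wise half of the two-way clustering is shown.

```python
# Sketch of the multivariate workflow described above: PCA on standardised
# quantitative traits, then hierarchical clustering of genotypes. Trait values
# are randomly generated stand-ins for the rapeseed data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
traits = rng.normal(size=(40, 8))   # 40 genotypes x 8 quantitative traits

Z = StandardScaler().fit_transform(traits)
pca = PCA(n_components=3).fit(Z)
scores = pca.transform(Z)
print("variance explained by first 3 PCs:", pca.explained_variance_ratio_.sum())

# A two-way analysis would also cluster the traits; here genotypes only
tree = linkage(scores, method="ward")
clusters = fcluster(tree, t=5, criterion="maxclust")
print("genotypes per cluster:", np.bincount(clusters)[1:])
```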

  5. Human Activity Recognition by Combining a Small Number of Classifiers.

    Science.gov (United States)

    Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin

    2016-09-01

    We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the output of the sensors. The models are based on a soft outputs combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifiers combination models against a single classifier that employs all the signals from the sensors. Our models consistently exhibit a reduction of the error rate and an increase of robustness against sensor failures. Our models also outperform other classifiers combination models that do not consider soft outputs and a Markovian structure of the human activities.
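
    The soft-output fusion step, which is the core difference from hard majority voting, can be sketched as below; the full Bayesian/Markov model of the paper is richer than this illustration, and the probabilities are made up.

```python
# Sketch of soft-output combination: average the per-class probabilities from
# each sensor's classifier instead of voting on hard labels.
import numpy as np

def soft_combine(probas, weights=None):
    """probas: list of (n_samples, n_classes) probability arrays, one per sensor."""
    stacked = np.stack(probas)                      # (n_sensors, n_samples, n_classes)
    w = np.ones(len(probas)) if weights is None else np.asarray(weights, float)
    fused = np.tensordot(w / w.sum(), stacked, axes=1)
    return fused.argmax(axis=1), fused

# Two sensors, three activities, one sample: sensor 1 is unsure, sensor 2 is not
p_sensor1 = np.array([[0.40, 0.35, 0.25]])
p_sensor2 = np.array([[0.10, 0.80, 0.10]])
labels, fused = soft_combine([p_sensor1, p_sensor2])
print(labels, fused.round(2))   # activity 1 wins once the soft outputs are fused
```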

  6. Identifying Cassini's Magnetospheric Location Using Magnetospheric Imaging Instrument (MIMI) Data and Machine Learning

    Science.gov (United States)

    Vandegriff, J. D.; Smith, G. L.; Edenbaum, H.; Peachey, J. M.; Mitchell, D. G.

    2017-12-01

    We analyzed data from Cassini's Magnetospheric Imaging Instrument (MIMI) and Magnetometer (MAG) and attempted to identify the region of Saturn's magnetosphere that Cassini was in at a given time using machine learning. MIMI data are from the Charge-Energy-Mass Spectrometer (CHEMS) instrument and the Low-Energy Magnetospheric Measurement System (LEMMS). We trained on data where the region is known based on a previous analysis of Cassini Plasma Spectrometer (CAPS) plasma data. Three magnetospheric regions are considered: Magnetosphere, Magnetosheath, and Solar Wind. MIMI particle intensities, magnetic field values, and spacecraft position are used as input attributes, and the output is the CAPS-based region, which is available from 2004 to 2012. We then use the trained classifier to identify Cassini's magnetospheric regions for times after 2012, when CAPS data is no longer available. Training accuracy is evaluated by testing the classifier performance on a time range of known regions that the classifier has never seen. Preliminary results indicate a 68% accuracy on such test data. Other techniques are being tested that may increase this performance. We present the data and algorithms used, and will describe the latest results, including the magnetospheric regions post-2012 identified by the algorithm.

  7. Classifying objects in LWIR imagery via CNNs

    Science.gov (United States)

    Rodger, Iain; Connor, Barry; Robertson, Neil M.

    2016-10-01

    The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid- to long-range detection system, utilising a Long Wave Infrared (LWIR) sensor. By exploiting high quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. Although the CNN struggles to recognise long-range target classes due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.

  8. New neural network classifier of fall-risk based on the Mahalanobis distance and kinematic parameters assessed by a wearable device

    International Nuclear Information System (INIS)

    Giansanti, Daniele; Macellari, Velio; Maccioni, Giovanni

    2008-01-01

    Fall prevention lacks easy, quantitative and wearable methods for the classification of fall-risk (FR). Efforts must be thus devoted to the choice of an ad hoc classifier both to reduce the size of the sample used to train the classifier and to improve performances. A new methodology that uses a neural network (NN) and a wearable device are hereby proposed for this purpose. The NN uses kinematic parameters assessed by a wearable device with accelerometers and rate gyroscopes during a posturography protocol. The training of the NN was based on the Mahalanobis distance and was carried out on two groups of 30 elderly subjects with varying fall-risk Tinetti scores. The validation was done on two groups of 100 subjects with different fall-risk Tinetti scores and showed that, both in terms of specificity and sensitivity, the NN performed better than other classifiers (naive Bayes, Bayes net, multilayer perceptron, support vector machines, statistical classifiers). In particular, (i) the proposed NN methodology improved the specificity and sensitivity by a mean of 3% when compared to the statistical classifier based on the Mahalanobis distance (SCMD) described in Giansanti (2006 Physiol. Meas. 27 1081–90); (ii) the assessed specificity was 97%, the assessed sensitivity was 98% and the area under receiver operator characteristics was 0.965. (note)

  9. Deep-Learning Convolutional Neural Networks Accurately Classify Genetic Mutations in Gliomas.

    Science.gov (United States)

    Chang, P; Grinband, J; Weinberg, B D; Bardis, M; Khy, M; Cadena, G; Su, M-Y; Cha, S; Filippi, C G; Bota, D; Baldi, P; Poisson, L M; Jain, R; Chow, D

    2018-05-10

    The World Health Organization has recently placed new emphasis on the integration of genetic information for gliomas. While tissue sampling remains the criterion standard, noninvasive imaging techniques may provide complementary insight into clinically relevant genetic mutations. Our aim was to train a convolutional neural network to independently predict underlying molecular genetic mutation status in gliomas with high accuracy and identify the most predictive imaging features for each mutation. MR imaging data and molecular information were retrospectively obtained from The Cancer Imaging Archives for 259 patients with either low- or high-grade gliomas. A convolutional neural network was trained to classify isocitrate dehydrogenase 1 (IDH1) mutation status, 1p/19q codeletion, and O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. Principal component analysis of the final convolutional neural network layer was used to extract the key imaging features critical for successful classification. Classification had high accuracy: IDH1 mutation status, 94%; 1p/19q codeletion, 92%; and MGMT promoter methylation status, 83%. Each genetic category was also associated with distinctive imaging features such as definition of tumor margins, T1 and FLAIR suppression, extent of edema, extent of necrosis, and textural features. Our results indicate that for The Cancer Imaging Archives dataset, machine-learning approaches allow classification of individual genetic mutations of both low- and high-grade gliomas. We show that relevant MR imaging features acquired from an added dimensionality-reduction technique demonstrate that neural networks are capable of learning key imaging components without prior feature selection or human-directed training. © 2018 by American Journal of Neuroradiology.

  10. Sample size determination for disease prevalence studies with partially validated data.

    Science.gov (United States)

    Qiu, Shi-Fang; Poon, Wai-Yin; Tang, Man-Lai

    2016-02-01

    Disease prevalence is an important topic in medical research, and its study is based on data that are obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold standard tests or low-cost screening tests, but the latter are subject to the misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example. © The Author(s) 2012.

  11. Classifying web pages with visual features

    NARCIS (Netherlands)

    de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

    2010-01-01

    To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual

  12. Dynamic integration of classifiers in the space of principal components

    NARCIS (Netherlands)

    Tsymbal, A.; Pechenizkiy, M.; Puuronen, S.; Patterson, D.W.; Kalinichenko, L.A.; Manthey, R.; Thalheim, B.; Wloka, U.

    2003-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the

  13. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    Science.gov (United States)

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such a feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether heterogeneous classifier ensemble approaches provide less biased rankings and whether they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.
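
    A minimal sketch of the SVC ranking idea follows: each feature is scored by the cross-validated accuracy of a classifier trained on that single feature, and features are ranked by that score. The choice of a shallow decision tree as the per-feature classifier and of 5-fold cross-validation are illustrative assumptions, not the study's protocol.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def svc_ranking(X, y, base_clf=None, cv=5):
        """Single Variable Classifier ranking: score feature j by the
        cross-validated accuracy of a classifier trained on X[:, j] alone."""
        if base_clf is None:
            base_clf = DecisionTreeClassifier(max_depth=3)
        scores = np.array([
            cross_val_score(base_clf, X[:, [j]], y, cv=cv).mean()
            for j in range(X.shape[1])
        ])
        order = np.argsort(scores)[::-1]   # best single-feature classifiers first
        return order, scores

    # example: rank the four iris features by their single-feature accuracy
    from sklearn.datasets import load_iris
    print(svc_ranking(*load_iris(return_X_y=True))[0])
    ```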

  14. An expert computer program for classifying stars on the MK spectral classification system

    International Nuclear Information System (INIS)

    Gray, R. O.; Corbally, C. J.

    2014-01-01

    This paper describes an expert computer program (MKCLASS) designed to classify stellar spectra on the MK Spectral Classification system in a way similar to humans—by direct comparison with the MK classification standards. Like an expert human classifier, the program first comes up with a rough spectral type, and then refines that spectral type by direct comparison with MK standards drawn from a standards library. A number of spectral peculiarities, including barium stars, Ap and Am stars, λ Bootis stars, carbon-rich giants, etc., can be detected and classified by the program. The program also evaluates the quality of the delivered spectral type. The program currently is capable of classifying spectra in the violet-green region in either the rectified or flux-calibrated format, although the accuracy of the flux calibration is not important. We report on tests of MKCLASS on spectra classified by human classifiers; those tests suggest that over the entire HR diagram, MKCLASS will classify in the temperature dimension with a precision of 0.6 spectral subclass, and in the luminosity dimension with a precision of about one half of a luminosity class. These results compare well with human classifiers.

  15. An expert computer program for classifying stars on the MK spectral classification system

    Energy Technology Data Exchange (ETDEWEB)

    Gray, R. O. [Department of Physics and Astronomy, Appalachian State University, Boone, NC 26808 (United States); Corbally, C. J. [Vatican Observatory Research Group, Tucson, AZ 85721-0065 (United States)

    2014-04-01

    This paper describes an expert computer program (MKCLASS) designed to classify stellar spectra on the MK Spectral Classification system in a way similar to humans—by direct comparison with the MK classification standards. Like an expert human classifier, the program first comes up with a rough spectral type, and then refines that spectral type by direct comparison with MK standards drawn from a standards library. A number of spectral peculiarities, including barium stars, Ap and Am stars, λ Bootis stars, carbon-rich giants, etc., can be detected and classified by the program. The program also evaluates the quality of the delivered spectral type. The program currently is capable of classifying spectra in the violet-green region in either the rectified or flux-calibrated format, although the accuracy of the flux calibration is not important. We report on tests of MKCLASS on spectra classified by human classifiers; those tests suggest that over the entire HR diagram, MKCLASS will classify in the temperature dimension with a precision of 0.6 spectral subclass, and in the luminosity dimension with a precision of about one half of a luminosity class. These results compare well with human classifiers.

  16. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  17. Influence of Sampling Practices on the Appearance of DNA Image Histograms of Prostate Cells in FNAB Samples

    Directory of Open Access Journals (Sweden)

    Abdelbaset Buhmeida

    1999-01-01

    Full Text Available Twenty-one fine needle aspiration biopsies (FNAB) of the prostate, diagnostically classified as definitely malignant, were studied. The Papanicolaou or H&E stained samples were destained and then stained for DNA with the Feulgen reaction. DNA cytometry was applied after different sampling rules. The histograms varied according to the sampling rule applied. Because free cells between cell groups were easier to measure than cells in the cell groups, two sampling rules were tested in all samples: (i) cells in the cell groups were measured, and (ii) free cells between cell groups were measured. Abnormal histograms were more common after the sampling rule based on free cells, suggesting that abnormal patterns are best revealed through the free cells in these samples. The conclusions were independent of the applied histogram interpretation method.

  18. Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

    Directory of Open Access Journals (Sweden)

    Daniel Ramos

    2018-03-01

    Full Text Available In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way; and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.
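
    As a worked illustration of how prior knowledge and the likelihood ratio each contribute, the sketch below computes the empirical cross-entropy of a binary classifier whose outputs are log-likelihood ratios: the posterior log-odds are the prior log-odds plus the log-likelihood ratio, and the cross-entropy averages the negative log2-posteriors of the true classes, weighted by the prior. This is a minimal reading of the ECE measure, not the authors' plotting tool or its discrimination/calibration decomposition.

    ```python
    import numpy as np

    def empirical_cross_entropy(llrs, labels, prior=0.5):
        """Empirical cross-entropy (bits) of a binary classifier that outputs
        log-likelihood ratios, evaluated at a chosen prior probability.
        Posterior log-odds = prior log-odds + log-likelihood ratio."""
        llrs, labels = np.asarray(llrs, float), np.asarray(labels)
        prior_logodds = np.log(prior / (1.0 - prior))
        post = 1.0 / (1.0 + np.exp(-(llrs + prior_logodds)))   # P(class 1 | x)
        eps = 1e-12
        ce_pos = -np.mean(np.log2(post[labels == 1] + eps))
        ce_neg = -np.mean(np.log2(1.0 - post[labels == 0] + eps))
        return prior * ce_pos + (1.0 - prior) * ce_neg

    # a useful, well-calibrated classifier drives this value well below the
    # cross-entropy of answering with the prior alone (1 bit at prior 0.5)
    print(empirical_cross_entropy([2.2, 1.5, -0.3, -2.0, -1.1], [1, 1, 0, 0, 0]))
    ```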

  19. Beyond "implementation strategies": classifying the full range of strategies used in implementation science and practice.

    Science.gov (United States)

    Leeman, Jennifer; Birken, Sarah A; Powell, Byron J; Rohweder, Catherine; Shea, Christopher M

    2017-11-03

    Strategies are central to the National Institutes of Health's definition of implementation research as "the study of strategies to integrate evidence-based interventions into specific settings." Multiple scholars have proposed lists of the strategies used in implementation research and practice, which they increasingly are classifying under the single term "implementation strategies." We contend that classifying all strategies under a single term leads to confusion, impedes synthesis across studies, and limits advancement of the full range of strategies of importance to implementation. To address this concern, we offer a system for classifying implementation strategies that builds on Proctor and colleagues' (2013) reporting guidelines, which recommend that authors not only name and define their implementation strategies but also specify who enacted the strategy (i.e., the actor) and the level and determinants that were targeted (i.e., the action targets). We build on Wandersman and colleagues' Interactive Systems Framework to distinguish strategies based on whether they are enacted by actors functioning as part of a Delivery, Support, or Synthesis and Translation System. We build on Damschroder and colleagues' Consolidated Framework for Implementation Research to distinguish the levels that strategies target (intervention, inner setting, outer setting, individual, and process). We then draw on numerous resources to identify determinants, which are conceptualized as modifiable factors that prevent or enable the adoption and implementation of evidence-based interventions. Identifying actors and targets resulted in five conceptually distinct classes of implementation strategies: dissemination, implementation process, integration, capacity-building, and scale-up. In our descriptions of each class, we identify the level of the Interactive System Framework at which the strategy is enacted (actors), level and determinants targeted (action targets), and outcomes used to

  20. Invention and validation of an automated camera system that uses optical character recognition to identify patient name mislabeled samples.

    Science.gov (United States)

    Hawker, Charles D; McCarthy, William; Cleveland, David; Messinger, Bonnie L

    2014-03-01

    Mislabeled samples are a serious problem in most clinical laboratories. Published error rates range from 0.39/1000 to as high as 1.12%. Standardization of bar codes and label formats has not yet achieved the needed improvement. The mislabel rate in our laboratory, although low compared with published rates, prompted us to seek a solution to achieve zero errors. To reduce or eliminate our mislabeled samples, we invented an automated device using 4 cameras to photograph the outside of a sample tube. The system uses optical character recognition (OCR) to look for discrepancies between the patient name in our laboratory information system (LIS) vs the patient name on the customer label. All discrepancies detected by the system's software then require human inspection. The system was installed on our automated track and validated with production samples. We obtained 1 009 830 images during the validation period, and every image was reviewed. OCR passed approximately 75% of the samples, and no mislabeled samples were passed. The 25% failed by the system included 121 samples actually mislabeled by patient name and 148 samples with spelling discrepancies between the patient name on the customer label and the patient name in our LIS. Only 71 of the 121 mislabeled samples detected by OCR were found through our normal quality assurance process. We have invented an automated camera system that uses OCR technology to identify potential mislabeled samples. We have validated this system using samples transported on our automated track. Full implementation of this technology offers the possibility of zero mislabeled samples in the preanalytic stage.
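
    The core check described above, comparing the patient name in the LIS against the OCR'd name on the customer label and routing any discrepancy to human review, can be sketched as follows. The normalisation rules and function names are assumptions for illustration only; the validated system's actual matching and spelling-discrepancy handling are not reproduced here.

    ```python
    import re

    def normalise(name: str) -> str:
        """Collapse case, punctuation and token order so that
        'SMITH, John' and 'john smith' compare equal."""
        tokens = re.findall(r"[a-z]+", name.lower())
        return " ".join(sorted(tokens))

    def needs_review(lis_name: str, ocr_name: str) -> bool:
        """Flag a tube for human inspection when the OCR'd label name does
        not match the name recorded in the laboratory information system."""
        return normalise(lis_name) != normalise(ocr_name)

    print(needs_review("Smith, John", "JOHN SMITH"))   # False: names agree
    print(needs_review("Smith, John", "JON SMITH"))    # True: route to a human
    ```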

  1. Towards Identify Selective Antibacterial Peptides Based on Abstracts Meaning

    Directory of Open Access Journals (Sweden)

    Liliana I. Barbosa-Santillán

    2016-01-01

    Full Text Available We present an Identify Selective Antibacterial Peptides (ISAP) approach based on the meaning of abstracts. Laboratories and researchers have significantly increased the reporting of discoveries related to antibacterial peptides in primary publications. It is important to find antibacterial peptides that have been reported in primary publications because they can yield antibiotics of different generations that attack and destroy bacteria. Unfortunately, researchers use heterogeneous forms of natural language to describe their discoveries (sometimes without the sequence of the peptides). Thus, we propose that by learning the meaning of words, rather than the antibacterial peptide sequences, it is possible to identify and predict antibacterial peptides reported in the PubMed engine. The ISAP approach consists of two stages: training and discovery. ISAP found that 35% of the abstracts in the sample reported antibacterial peptides, which we checked against the updated Antimicrobial Peptide Database 2 (APD2). ISAP predicted that 45% of the abstracts contained antibacterial peptides. That is, ISAP found 810 antibacterial peptides that were not classified as such and are therefore not reported in APD2. As a result, this new search tool would complement APD2 with a set of peptides that are candidates to be antibacterial. Finally, 20% of the abstracts were not semantically related to APD2.

  2. Identifying intimate partner violence (IPV) during the postpartum period in a Greek sample.

    Science.gov (United States)

    Vivilaki, Victoria G; Dafermos, Vassilis; Daglas, Maria; Antoniou, Evagelia; Tsopelas, Nicholas D; Theodorakis, Pavlos N; Brown, Judith B; Lionis, Christos

    2010-12-01

    Research has highlighted the wide impact of intimate partner violence (IPV) and the public health role of community health professionals in detection of victimized women. The purpose of this study was to identify postpartum emotional and physical abuse and to validate the Greek version of the Women Abuse Screening Tool (WAST) along with its sensitivity and specificity. Five hundred seventy-nine mothers within 12 weeks postpartum were recruited from the perinatal care registers of the Maternity Departments of two public hospitals in Athens, Greece. Participants were randomly selected by clinic or shift. The WAST and the Partner Violence Screen (PVS) surveys were administered in random order to the mothers from September 2007 to January 2008. The WAST was compared with the PVS as a criterion standard. Agreement between the screening instruments was examined. The psychometric measurements that were performed included: two independent sample t tests, reliability coefficients, explanatory factor analysis using a Varimax rotation, and Principal Components Method. Confirmatory analysis-also called structural equation modeling-of principal components was conducted by Linear Structural Relations. A receiver operating characteristic (ROC) analysis was carried out to evaluate the global functioning of the scale. Two hundred four (35.6%) of the mothers screened were identified as experiencing IPV. Scores on the WAST correlated well with those on the PVS; the internal consistency of the WAST Greek version-tested using Cronbach's alpha coefficient-was found to be 0.926 and that of Guttman's split-half coefficient was 0.924. Our findings confirm the multidimensionality of the WAST, demonstrating a two-factor structure. The area under ROC curve (AUC) was found to be 0.824, and the logistic estimate for the threshold score of 0/1 fitted the model sensitivity at 99.7% and model specificity at 64.4%. Our data confirm the validity of the Greek version of the WAST in identifying IPV

  3. LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

    OpenAIRE

    M. Favorskaya; A. Nosov; A. Popov

    2015-01-01

    Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin dete...

  4. A comparison of rule-based and machine learning approaches for classifying patient portal messages.

    Science.gov (United States)

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell

    2017-09-01

    Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard
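
    Because a single portal message can carry several communication types, the study scores whole-message predictions with the Jaccard Index; a minimal sketch of that metric on label sets is shown below (the example labels are hypothetical).

    ```python
    def jaccard_index(true_labels: set, predicted_labels: set) -> float:
        """Jaccard Index for one message: |intersection| / |union| of the
        true and predicted sets of communication types."""
        if not true_labels and not predicted_labels:
            return 1.0
        return len(true_labels & predicted_labels) / len(true_labels | predicted_labels)

    # a message annotated as medical + logistical but predicted as medical only
    print(jaccard_index({"medical", "logistical"}, {"medical"}))   # 0.5
    ```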

  5. COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

    Directory of Open Access Journals (Sweden)

    M. J. Baheti

    2012-01-01

    Full Text Available With the advent of the technological era, the conversion of scanned documents (handwritten or printed) into machine-editable format has attracted many researchers. This paper deals with the problem of recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires performing some specific steps as a part of preprocessing. For preprocessing, digitization, segmentation, normalization and thinning are performed, assuming that the image has almost no noise. Further, an affine-invariant-moments-based model is used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. The comparison of the SVM and Fuzzy classifiers shows that SVM produced better results than the Fuzzy classifier.

  6. Evaluating a k-nearest neighbours-based classifier for locating faulty areas in power systems

    Directory of Open Access Journals (Sweden)

    Juan José Mora Flórez

    2008-09-01

    Full Text Available This paper reports a strategy for identifying and locating faults in a power distribution system. The strategy was based on the K-nearest neighbours technique. This technique simply helps to estimate a distance from the features used for describing a particular fault being classified to the faults presented during the training stage. If new data is presented to the proposed fault locator, it is classified according to the nearest example recovered. A characterisation of the voltage and current measurements obtained at one single line end is also presented in this document for assigning the area in the case of a fault in a power system. The proposed strategy was tested in a real power distribution system, with average confidence indexes of 93% being obtained, which is a good indicator of the proposal's high performance. The results showed how a fault could be located by using features obtained from voltage and current, improving utility response and thereby improving system continuity indexes in power distribution systems.
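
    The nearest-neighbour idea described above can be sketched in a few lines: features derived from one-end voltage and current measurements are standardised, and a new fault is assigned the zone of its nearest training examples. The feature matrix, zone labels and k = 3 below are placeholders, not the reported system.

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.random((200, 6))        # placeholder: features from one-end V/I measurements
    y = rng.integers(0, 4, 200)     # placeholder: faulted zone label for each training fault

    locator = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
    locator.fit(X, y)

    # a new fault, described by the same features, takes the zone of its neighbours
    print(locator.predict(rng.random((1, 6))))
    ```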

  7. Learning Bayesian network classifiers for credit scoring using Markov Chain Monte Carlo search

    NARCIS (Netherlands)

    Baesens, B.; Egmont-Petersen, M.; Castelo, R.; Vanthienen, J.

    2001-01-01

    In this paper, we will evaluate the power and usefulness of Bayesian network classifiers for credit scoring. Various types of Bayesian network classifiers will be evaluated and contrasted including unrestricted Bayesian network classifiers learnt using Markov Chain Monte Carlo (MCMC) search.

  8. Snoring classified: The Munich-Passau Snore Sound Corpus.

    Science.gov (United States)

    Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn

    2018-03-01

    Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds is developed, labelled with the location of sound excitation. Video and audio recordings taken during drug induced sleep endoscopy (DISE) examinations from three medical centres have been semi-automatically screened for snore events, which subsequently have been classified by ENT experts into four classes based on the VOTE classification. The resulting dataset containing 828 snore events from 219 subjects has been split into Train, Development, and Test sets. An SVM classifier has been trained using low level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% could be achieved using the full set of LLDs including formants. Best performing subset is the MFCC-related set of LLDs. A strong difference in performance could be observed between the permutations of train, development, and test partition, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented which are classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation location of snoring sounds in the upper airway based on acoustic parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

    Science.gov (United States)

    Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

    2017-10-01

    Crop maps are essential inputs for the agricultural planning done at various governmental and agribusiness agencies. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergistic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) was collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83%, 82%, and 83%, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82%, 81%, and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
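
    One common way to realise a random forest kernel, assumed here for illustration, is to define the similarity of two samples as the fraction of trees in which they land in the same leaf and to feed that matrix to an SVM with a precomputed kernel. The data below are placeholders; the paper's exact kernel construction and WorldView-2 features are not reproduced.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC

    def rf_kernel(leaves_a, leaves_b):
        """Similarity = fraction of trees in which two samples share a leaf."""
        return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)

    rng = np.random.default_rng(0)
    X_train, y_train = rng.random((100, 8)), rng.integers(0, 5, 100)  # placeholder bands/labels
    X_test = rng.random((20, 8))

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    train_leaves, test_leaves = rf.apply(X_train), rf.apply(X_test)   # leaf index per tree

    svm_rfk = SVC(kernel="precomputed").fit(rf_kernel(train_leaves, train_leaves), y_train)
    pred = svm_rfk.predict(rf_kernel(test_leaves, train_leaves))
    ```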

  10. Entropy based classifier for cross-domain opinion mining

    Directory of Open Access Journals (Sweden)

    Jyoti S. Deshmukh

    2018-01-01

    Full Text Available In recent years, the growth of social networks has increased people's interest in analyzing reviews and opinions of products before they buy them. Consequently, this has given rise to domain adaptation as a prominent area of research in sentiment analysis. A classifier trained on one domain often gives poor results on data from another domain. Expression of sentiment is different in every domain. Labeling each domain separately is very costly as well as time consuming. Therefore, this study has proposed an approach that extracts and classifies opinion words from one domain, called the source domain, and predicts opinion words of another domain, called the target domain, using a semi-supervised approach that combines modified maximum entropy and bipartite graph clustering. A comparison of opinion classification on reviews from four different product domains is presented. The results demonstrate that the proposed method performs relatively well in comparison to the other methods. A SentiWordNet comparison of domain-specific and domain-independent words reveals that, on average, 72.6% and 88.4% of words, respectively, are correctly classified.

  11. Acute Toxicity of a Recently Identified Phenol-based Synthetic ...

    African Journals Online (AJOL)

    This paper reports on the acute toxicity of a new phenol based synthetic tsetse fly repellent recently identified at the International Centre for Insect Physiology and Ecology (patent No. ... The repellent can be classified as being highly toxic with central nervous system (CNS) involvement and a mild skin and eye irritant.

  12. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    Science.gov (United States)

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.
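
    A minimal sketch of the evaluation described above, a Random Forest on the 16 force values plus the backrest angle, scored so that each test subject is unseen during training, might look as follows. The data are placeholders, and leave-one-subject-out grouping is an assumption about how the Leave-One-Out scheme was applied.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((1148, 17))               # placeholder: 16 force sensors + backrest angle
    y = rng.integers(0, 7, 1148)             # placeholder: 7 prescribed sitting positions
    subjects = np.repeat(np.arange(41), 28)  # 41 subjects x 28 samples each

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    scores = cross_val_score(clf, X, y, groups=subjects, cv=LeaveOneGroupOut())
    print(scores.mean())                     # accuracy on subjects unseen during training
    ```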

  13. Building an automated SOAP classifier for emergency department reports.

    Science.gov (United States)

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F(1) scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.

  14. Localizing genes to cerebellar layers by classifying ISH images.

    Directory of Open Access Journals (Sweden)

    Lior Kirsch

    Full Text Available Gene expression controls how the brain develops and functions. Understanding control processes in the brain is particularly hard since they involve numerous types of neurons and glia, and very little is known about which genes are expressed in which cells and brain layers. Here we describe an approach to detect genes whose expression is primarily localized to a specific brain layer and apply it to the mouse cerebellum. We learn typical spatial patterns of expression from a few markers that are known to be localized to specific layers, and use these patterns to predict localization for new genes. We analyze images of in-situ hybridization (ISH) experiments, which we represent using histograms of local binary patterns (LBP), and train image classifiers and gene classifiers for four layers of the cerebellum: the Purkinje, granular, molecular and white matter layer. On held-out data, the layer classifiers achieve accuracy above 94% (AUC) by representing each image at multiple scales and by combining multiple image scores into a single gene-level decision. When applied to the full mouse genome, the classifiers predict specific layer localization for hundreds of new genes in the Purkinje and granular layers. Many genes localized to the Purkinje layer are likely to be expressed in astrocytes, and many others are involved in lipid metabolism, possibly due to the unusual size of Purkinje cells.
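
    The image representation used above, histograms of local binary patterns, can be sketched with scikit-image; the radius, number of sampling points and the 'uniform' variant are illustrative choices, and the random array stands in for an ISH image tile at one scale.

    ```python
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(image, n_points=24, radius=3):
        """Represent a grey-level image as a normalised histogram of
        uniform local binary patterns."""
        lbp = local_binary_pattern(image, n_points, radius, method="uniform")
        n_bins = n_points + 2                  # code values for the 'uniform' method
        hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
        return hist

    rng = np.random.default_rng(0)
    demo = (rng.random((256, 256)) * 255).astype(np.uint8)   # placeholder ISH tile
    features = lbp_histogram(demo)                           # feature vector for a classifier
    ```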

  15. Arabic Handwriting Recognition Using Neural Network Classifier

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... an OCR using Neural Network classifier preceded by a set of preprocessing .... Artificial Neural Networks (ANNs), which we adopt in this research, consist of ... advantage and disadvantages of each technique. In [9], Khemiri ...

  16. Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier.

    Science.gov (United States)

    Mao, Keming; Deng, Zhuofu

    2016-01-01

    This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, the single-center classifier is trained based on LDP. Due to the diversity of feature distribution for different classes, the training images are further clustered into multiple cores and the multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.

  17. Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier

    Directory of Open Access Journals (Sweden)

    Keming Mao

    2016-01-01

    Full Text Available This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, the single-center classifier is trained based on LDP. Due to the diversity of feature distribution for different classes, the training images are further clustered into multiple cores and the multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.

  18. Detection of microaneurysms in retinal images using an ensemble classifier

    Directory of Open Access Journals (Sweden)

    M.M. Habib

    2017-01-01

    Full Text Available This paper introduces, and reports on the performance of, a novel combination of algorithms for automated microaneurysm (MA) detection in retinal images. The presence of MAs in retinal images is a pathognomonic sign of Diabetic Retinopathy (DR), which is one of the leading causes of blindness amongst the working age population. An extensive survey of the literature is presented and current techniques in the field are summarised. The proposed technique first detects an initial set of candidates using a Gaussian Matched Filter and then classifies this set to reduce the number of false positives. A Tree Ensemble classifier is used with a set of 70 features (the most common features in the literature). A new set of 32 MA groundtruth images (with a total of 256 labelled MAs) based on images from the MESSIDOR dataset is introduced as a public dataset for benchmarking MA detection algorithms. We evaluate our algorithm on this dataset as well as another public dataset (DIARETDB1 v2.1) and compare it against the best available alternative. Results show that the proposed classifier is superior in terms of eliminating false positive MA detection from the initial set of candidates. The proposed method achieves an ROC score of 0.415 compared to 0.2636 achieved by the best available technique. Furthermore, results show that the classifier model maintains consistent performance across datasets, illustrating the generalisability of the classifier and that overfitting does not occur.

  19. Dynamic cluster generation for a fuzzy classifier with ellipsoidal regions.

    Science.gov (United States)

    Abe, S

    1998-01-01

    In this paper, we discuss a fuzzy classifier with ellipsoidal regions that dynamically generates clusters. First, for the data belonging to a class we define a fuzzy rule with an ellipsoidal region. Namely, using the training data for each class, we calculate the center and the covariance matrix of the ellipsoidal region for the class. Then we tune the fuzzy rules, i.e., the slopes of the membership functions, successively until there is no improvement in the recognition rate of the training data. Then if the number of the data belonging to a class that are misclassified into another class exceeds a prescribed number, we define a new cluster to which those data belong and the associated fuzzy rule. Then we tune the newly defined fuzzy rules in the similar way as stated above, fixing the already obtained fuzzy rules. We iterate generation of clusters and tuning of the newly generated fuzzy rules until the number of the data belonging to a class that are misclassified into another class does not exceed the prescribed number. We evaluate our method using thyroid data, Japanese Hiragana data of vehicle license plates, and blood cell data. By dynamic cluster generation, the generalization ability of the classifier is improved and the recognition rate of the fuzzy classifier for the test data is the best among the neural network classifiers and other fuzzy classifiers if there are no discrete input variables.
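
    A minimal sketch of the ellipsoidal fuzzy rules described above: each rule stores the class center and covariance, and membership decays with the squared Mahalanobis distance. The specific membership function and its slope, as well as the dynamic cluster generation and tuning loop, are assumptions made only to illustrate the rule structure, not the paper's algorithm.

    ```python
    import numpy as np

    class EllipsoidalRule:
        """One fuzzy rule: an ellipsoidal region defined by the class mean and
        covariance; membership decays with the squared Mahalanobis distance."""
        def __init__(self, X_class, slope=1.0):
            self.center = X_class.mean(axis=0)
            self.cov_inv = np.linalg.pinv(np.cov(X_class, rowvar=False))
            self.slope = slope                   # tuned on training data in the paper

        def membership(self, x):
            d = np.asarray(x) - self.center
            d2 = float(d @ self.cov_inv @ d)     # squared Mahalanobis distance
            return 1.0 / (1.0 + self.slope * d2)

    def classify(x, rules):
        """Assign x to the class whose rule gives the highest membership."""
        return max(rules, key=lambda c: rules[c].membership(x))

    rng = np.random.default_rng(0)
    rules = {0: EllipsoidalRule(rng.normal(0, 1, (50, 3))),
             1: EllipsoidalRule(rng.normal(3, 1, (50, 3)))}
    print(classify([2.8, 3.1, 2.9], rules))      # expected: class 1
    ```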

  20. A Bayesian Classifier for X-Ray Pulsars Recognition

    Directory of Open Access Journals (Sweden)

    Hao Liang

    2016-01-01

    Full Text Available Recognition of X-ray pulsars is important for spacecraft attitude determination by X-ray Pulsar Navigation (XPNAV). By using the nonhomogeneous Poisson model of the received photons and the minimum recognition error criterion, a classifier based on the Bayesian theorem is proposed. For X-ray pulsar recognition with unknown Doppler frequency and initial phase, the features of every X-ray pulsar are extracted and the unknown parameters are estimated using the Maximum Likelihood (ML) method. Besides that, a method to recognize unknown X-ray pulsars or X-ray disturbances is proposed. Simulation results confirm the validity of the proposed Bayesian classifier.
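
    The decision rule implied by the nonhomogeneous Poisson model and the minimum-error criterion can be sketched as follows: given binned photon counts (assumed already folded and Doppler/phase-aligned, which the paper handles by maximum-likelihood estimation), choose the candidate pulsar whose rate template maximises the Poisson log-likelihood plus log-prior. The templates and priors below are hypothetical.

    ```python
    import numpy as np

    def poisson_loglik(counts, expected):
        """Log-likelihood of binned photon counts under a nonhomogeneous Poisson
        model with `expected` counts per phase bin (additive constants dropped)."""
        expected = np.asarray(expected, float)
        return float(np.sum(np.asarray(counts) * np.log(expected) - expected))

    def recognise(counts, templates, priors=None):
        """Minimum-error (MAP) decision over candidate pulsar rate templates."""
        if priors is None:
            priors = {name: 1.0 / len(templates) for name in templates}
        scores = {name: poisson_loglik(counts, tpl) + np.log(priors[name])
                  for name, tpl in templates.items()}
        return max(scores, key=scores.get)

    # hypothetical two-candidate example with 4 phase bins
    templates = {"PSR A": [5.0, 20.0, 5.0, 5.0], "PSR B": [5.0, 5.0, 20.0, 5.0]}
    print(recognise([6, 18, 4, 7], templates))   # expected: "PSR A"
    ```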

  1. Adaptation in P300 brain-computer interfaces: A two-classifier cotraining approach

    DEFF Research Database (Denmark)

    Panicker, Rajesh C.; Sun, Ying; Puthusserypady, Sadasivan

    2010-01-01

    A cotraining-based approach is introduced for constructing high-performance classifiers for P300-based brain-computer interfaces (BCIs), which were trained from very little data. It uses two classifiers: Fisher's linear discriminant analysis and Bayesian linear discriminant analysis progressively...

  2. New complete sample of identified radio sources. Part 2. Statistical study

    International Nuclear Information System (INIS)

    Soltan, A.

    1978-01-01

    The complete sample of radio sources with known redshifts selected in Paper I is studied. Source counts in the sample and the luminosity-volume test show that both quasars and galaxies are subject to evolution. Luminosity functions for different ranges of redshifts are obtained. Due to many uncertainties, only simplified models of the evolution are tested. An exponential decline of the luminosity with time for all the bright sources is in good agreement both with the luminosity-volume test and the N(S) relation in the entire range of observed flux densities. It is shown that sources in the sample are randomly distributed on scales greater than about 17 Mpc. (author)

  3. Binary naive Bayesian classifiers for correlated Gaussian features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2008-11-01

    Full Text Available classifier with Gaussian features while using any quadratic decision boundary. Therefore, the analysis is not restricted to Naive Bayesian classifiers alone and can, for instance, be used to calculate the Bayes error performance. We compare the analytical...

  4. Automating the construction of scene classifiers for content-based video retrieval

    NARCIS (Netherlands)

    Khan, L.; Israël, Menno; Petrushin, V.A.; van den Broek, Egon; van der Putten, Peter

    2004-01-01

    This paper introduces a real time automatic scene classifier within content-based video retrieval. In our envisioned approach end users like documentalists, not image processing experts, build classifiers interactively, by simply indicating positive examples of a scene. Classification consists of a

  5. A unified classifier for robust face recognition based on combining multiple subspace algorithms

    Science.gov (United States)

    Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad

    2012-10-01

    Face recognition, the fastest growing biometric technology, has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed. However, none of the proposed or developed algorithms is a complete solution, because it may work very well on one set of images with, say, illumination changes but may not work properly on another set of image variations, such as expression variations. This study is motivated by the fact that any single classifier cannot claim to show generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively, also incorporating the question of the suitability of any classifier for this task. The study is based on the outcome of a comprehensive comparative analysis conducted on a combination of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers, which perform better on one task or the other. These classifiers are then combined into an ensemble classifier by two different strategies: weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that can successfully handle varying facial image conditions of illumination, aging and facial expressions.

  6. A convolutional neural network neutrino event classifier

    International Nuclear Information System (INIS)

    Aurisano, A.; Sousa, A.; Radovic, A.; Vahle, P.; Rocco, D.; Pawloski, G.; Himmel, A.; Niner, E.; Messier, M.D.; Psihas, F.

    2016-01-01

    Convolutional neural networks (CNNs) have been widely applied in the computer vision community to solve complex problems in image recognition and analysis. We describe an application of the CNN technology to the problem of identifying particle interactions in sampling calorimeters used commonly in high energy physics and high energy neutrino physics in particular. Following a discussion of the core concepts of CNNs and recent innovations in CNN architectures related to the field of deep learning, we outline a specific application to the NOvA neutrino detector. This algorithm, CVN (Convolutional Visual Network) identifies neutrino interactions based on their topology without the need for detailed reconstruction and outperforms algorithms currently in use by the NOvA collaboration.

  7. Optimal threshold estimation for binary classifiers using game theory.

    Science.gov (United States)

    Sanchez, Ignacio Enrique

    2016-01-01

    Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
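
    The proposed operating point, the intersection of the ROC curve with the descending diagonal, is simply the threshold at which sensitivity is closest to specificity; a brief sketch using scikit-learn follows (the score and label arrays are whatever the classifier under study produces).

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve

    def minimax_threshold(y_true, scores):
        """Threshold where the ROC curve crosses the descending diagonal,
        i.e. where sensitivity (TPR) is closest to specificity (1 - FPR)."""
        fpr, tpr, thresholds = roc_curve(y_true, scores)
        idx = np.argmin(np.abs(tpr - (1.0 - fpr)))
        return thresholds[idx]

    # toy usage with synthetic scores: positives tend to score higher
    rng = np.random.default_rng(0)
    y = np.r_[np.ones(50), np.zeros(50)]
    s = np.r_[rng.normal(1, 1, 50), rng.normal(0, 1, 50)]
    print(minimax_threshold(y, s))
    ```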

  8. Comparison of artificial intelligence classifiers for SIP attack data

    Science.gov (United States)

    Safarik, Jakub; Slachta, Jiri

    2016-05-01

    A honeypot application is a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of a neural network classifier, which lower the classification error. The article contains a comparison of two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second neural network uses further optimizations such as input normalization or a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.

  9. Least Square Support Vector Machine Classifier vs a Logistic Regression Classifier on the Recognition of Numeric Digits

    Directory of Open Access Journals (Sweden)

    Danilo A. López-Sarmiento

    2013-11-01

    Full Text Available In this paper, the performance of a multi-class least squares support vector machine (LS-SVM) is compared with that of a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a data set consisting of 5000 images of handwritten numeric digits (500 images for each digit from 0-9), each image of 20 x 20 pixels. The inputs to each of the systems were vectors of 400 dimensions corresponding to each image (no feature extraction was performed). Both classifiers used a one-vs-all strategy to enable multi-class classification and random cross-validation when minimizing the cost function. The comparison metrics were precision and training time under the same computational conditions. Both techniques showed a precision above 95%, with LS-SVM slightly more accurate. In computational cost, however, there was a marked difference: LS-SVM training required 16.42% less time than the logistic regression model under the same computational conditions.

  10. Gene-expression Classifier in Papillary Thyroid Carcinoma

    DEFF Research Database (Denmark)

    Londero, Stefano Christian; Jespersen, Marie Louise; Krogdahl, Annelise

    2016-01-01

    BACKGROUND: No reliable biomarker for metastatic potential in the risk stratification of papillary thyroid carcinoma exists. We aimed to develop a gene-expression classifier for metastatic potential. MATERIALS AND METHODS: Genome-wide expression analyses were used. Development cohort: freshly...

  11. Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers.

    Science.gov (United States)

    Salimi-Khorshidi, Gholamreza; Douaud, Gwenaëlle; Beckmann, Christian F; Glasser, Matthew F; Griffanti, Ludovica; Smith, Stephen M

    2014-04-15

    Many sources of fluctuation contribute to the fMRI signal, and this makes identifying the effects that are truly related to the underlying neuronal activity difficult. Independent component analysis (ICA) - one of the most widely used techniques for the exploratory analysis of fMRI data - has been shown to be a powerful technique in identifying various sources of neuronally-related and artefactual fluctuation in fMRI data (both with the application of external stimuli and with the subject "at rest"). ICA decomposes fMRI data into patterns of activity (a set of spatial maps and their corresponding time series) that are statistically independent and add linearly to explain voxel-wise time series. Given the set of ICA components, if the components representing "signal" (brain activity) can be distinguished from the "noise" components (effects of motion, non-neuronal physiology, scanner artefacts and other nuisance sources), the latter can then be removed from the data, providing an effective cleanup of structured noise. Manual classification of components is labour intensive and requires expertise; hence, a fully automatic noise detection algorithm that can reliably detect various types of noise sources (in both task and resting fMRI) is desirable. In this paper, we introduce FIX ("FMRIB's ICA-based X-noiseifier"), which provides an automatic solution for denoising fMRI data via accurate classification of ICA components. For each ICA component FIX generates a large number of distinct spatial and temporal features, each describing a different aspect of the data (e.g., what proportion of temporal fluctuations are at high frequencies). The set of features is then fed into a multi-level classifier (built around several different classifiers). Once trained through the hand-classification of a sufficient number of training datasets, the classifier can then automatically classify new datasets. The noise components can then be subtracted from (or regressed out of) the original

  12. A History of Classified Activities at Oak Ridge National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    2001-01-30

    The facilities that became Oak Ridge National Laboratory (ORNL) were created in 1943 during the United States' super-secret World War II project to construct an atomic bomb (the Manhattan Project). During World War II and for several years thereafter, essentially all ORNL activities were classified. Now, in 2000, essentially all ORNL activities are unclassified. The major purpose of this report is to provide a brief history of ORNL's major classified activities from 1943 until the present (September 2000). This report is expected to be useful to the ORNL Classification Officer and to ORNL's Authorized Derivative Classifiers and Authorized Derivative Declassifiers in their classification review of ORNL documents, especially those documents that date from the 1940s and 1950s.

  13. Bridging the gap between sample collection and laboratory analysis: using dried blood spots to identify human exposure to chemical agents

    Science.gov (United States)

    Hamelin, Elizabeth I.; Blake, Thomas A.; Perez, Jonas W.; Crow, Brian S.; Shaner, Rebecca L.; Coleman, Rebecca M.; Johnson, Rudolph C.

    2016-05-01

    Public health response to large scale chemical emergencies presents logistical challenges for sample collection, transport, and analysis. Diagnostic methods used to identify and determine exposure to chemical warfare agents, toxins, and poisons traditionally involve blood collection by phlebotomists, cold transport of biomedical samples, and costly sample preparation techniques. Use of dried blood spots, which consist of dried blood on an FDA-approved substrate, can increase analyte stability, decrease infection hazard for those handling samples, greatly reduce the cost of shipping/storing samples by removing the need for refrigeration and cold chain transportation, and be self-prepared by potentially exposed individuals using a simple finger prick and blood spot compatible paper. Our laboratory has developed clinical assays to detect human exposures to nerve agents through the analysis of specific protein adducts and metabolites, for which a simple extraction from a dried blood spot is sufficient for removing matrix interferents and attaining sensitivities on par with traditional sampling methods. The use of dried blood spots can bridge the gap between the laboratory and the field allowing for large scale sample collection with minimal impact on hospital resources while maintaining sensitivity, specificity, traceability, and quality requirements for both clinical and forensic applications.

  14. Oblique decision trees using embedded support vector machines in classifier ensembles

    NARCIS (Netherlands)

    Menkovski, V.; Christou, I.; Efremidis, S.

    2008-01-01

    Classifier ensembles have emerged in recent years as a promising research area for boosting pattern recognition systems' performance. We present a new base classifier that utilizes oblique decision tree technology based on support vector machines for the construction of oblique (non-axis parallel)

  15. Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.

    Science.gov (United States)

    Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang

    2015-01-01

    Most epileptic EEG classification algorithms are supervised and require large training datasets, which hinders their use in real-time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm in efficiency. In addition, three classifiers, namely K-means, MSK-means and a support vector machine (SVM), are used to identify seizures and localize the epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizures with the MSK-means algorithm and delay permutation entropy achieves 4.7% higher accuracy than the K-means and 0.7% higher accuracy than the SVM.
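
    The chapter's exact feature definition is not given here, so the following is a minimal sketch of one common formulation of delay permutation entropy (ordinal patterns of delay-embedded windows, with Shannon entropy normalized by log2(order!)); `eeg_epochs` is a hypothetical list of 1-D signal segments.

```python
import numpy as np
from math import factorial

def delay_permutation_entropy(signal, order=3, delay=2):
    """Normalized permutation entropy of a 1-D signal with an embedding delay."""
    x = np.asarray(signal, dtype=float)
    n = len(x) - (order - 1) * delay
    counts = {}
    for i in range(n):
        window = x[i : i + order * delay : delay]   # delay-embedded window
        pattern = tuple(np.argsort(window))         # its ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-np.sum(p * np.log2(p)) / np.log2(factorial(order)))

# entropies = [delay_permutation_entropy(epoch) for epoch in eeg_epochs]
# the resulting features can then be clustered (e.g. k-means) or fed to an SVM
```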

  16. Classifying adolescent attention-deficit/hyperactivity disorder (ADHD) based on functional and structural imaging.

    Science.gov (United States)

    Iannaccone, Reto; Hauser, Tobias U; Ball, Juliane; Brandeis, Daniel; Walitza, Susanne; Brem, Silvia

    2015-10-01

    Attention-deficit/hyperactivity disorder (ADHD) is a common disabling psychiatric disorder associated with consistent deficits in error processing, inhibition and regionally decreased grey matter volumes. The diagnosis is based on clinical presentation, interviews and questionnaires, which are to some degree subjective and would benefit from verification through biomarkers. Here, pattern recognition of multiple discriminative functional and structural brain patterns was applied to classify adolescents with ADHD and controls. Functional activation features in a Flanker/NoGo task probing error processing and inhibition along with structural magnetic resonance imaging data served to predict group membership using support vector machines (SVMs). The SVM pattern recognition algorithm correctly classified 77.78% of the subjects with a sensitivity and specificity of 77.78% based on error processing. Predictive regions for controls were mainly detected in core areas for error processing and attention such as the medial and dorsolateral frontal areas reflecting deficient processing in ADHD (Hart et al., in Hum Brain Mapp 35:3083-3094, 2014), and overlapped with decreased activations in patients in conventional group comparisons. Regions more predictive for ADHD patients were identified in the posterior cingulate, temporal and occipital cortex. Interestingly despite pronounced univariate group differences in inhibition-related activation and grey matter volumes the corresponding classifiers failed or only yielded a poor discrimination. The present study corroborates the potential of task-related brain activation for classification shown in previous studies. It remains to be clarified whether error processing, which performed best here, also contributes to the discrimination of useful dimensions and subtypes, different psychiatric disorders, and prediction of treatment success across studies and sites.

  17. 29 CFR 1926.407 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ...) locations, unless modified by provisions of this section. (b) Electrical installations. Equipment, wiring..., DEPARTMENT OF LABOR (CONTINUED) SAFETY AND HEALTH REGULATIONS FOR CONSTRUCTION Electrical Installation Safety... electric equipment and wiring in locations which are classified depending on the properties of the...

  18. Microarray Meta-Analysis Identifies Acute Lung Injury Biomarkers in Donor Lungs That Predict Development of Primary Graft Failure in Recipients

    Science.gov (United States)

    Haitsma, Jack J.; Furmli, Suleiman; Masoom, Hussain; Liu, Mingyao; Imai, Yumiko; Slutsky, Arthur S.; Beyene, Joseph; Greenwood, Celia M. T.; dos Santos, Claudia

    2012-01-01

    injury" gene predictors that can classify lung injury samples and identify patients at risk for clinically relevant lung injury complications. PMID:23071521

  19. Using the MMPI-2-RF to discriminate psychometrically identified schizotypic college students from a matched comparison sample.

    Science.gov (United States)

    Hunter, Helen K; Bolinskey, P Kevin; Novi, Jonathan H; Hudak, Daniel V; James, Alison V; Myers, Kevin R; Schuder, Kelly M

    2014-01-01

    This study investigates the extent to which the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF) profiles of 52 individuals making up a psychometrically identified schizotypes (SZT) sample could be successfully discriminated from the protocols of 52 individuals in a matched comparison (MC) sample. Replication analyses were performed with an additional 53 pairs of SZT and MC participants. Results showed significant differences in mean T-score values between these 2 groups across a variety of MMPI-2-RF scales. Results from discriminant function analyses indicate that schizotypy can be predicted effectively using 4 MMPI-2-RF scales and that this method of classification held up on replication. Additional results demonstrated that these MMPI-2-RF scales nominally outperformed MMPI-2 scales suggested by previous research as being indicative of schizophrenia liability. Directions for future research with the MMPI-2-RF are suggested.

  20. Classifier for gravitational-wave inspiral signals in nonideal single-detector data

    Science.gov (United States)

    Kapadia, S. J.; Dent, T.; Dal Canton, T.

    2017-11-01

    We describe a multivariate classifier for candidate events in a templated search for gravitational-wave (GW) inspiral signals from neutron-star-black-hole (NS-BH) binaries, in data from ground-based detectors where sensitivity is limited by non-Gaussian noise transients. The standard signal-to-noise ratio (SNR) and chi-squared test for inspiral searches use only properties of a single matched filter at the time of an event; instead, we propose a classifier using features derived from a bank of inspiral templates around the time of each event, and also from a search using approximate sine-Gaussian templates. The classifier thus extracts additional information from strain data to discriminate inspiral signals from noise transients. We evaluate a random forest classifier on a set of single-detector events obtained from realistic simulated advanced LIGO data, using simulated NS-BH signals added to the data. The new classifier detects a factor of 1.5-2 more signals at low false positive rates as compared to the standard "reweighted SNR" statistic, and does not require the chi-squared test to be computed. Conversely, if only the SNR and chi-squared values of single-detector events are available, random forest classification performs nearly identically to the reweighted SNR.

  1. Scoring and Classifying Examinees Using Measurement Decision Theory

    Directory of Open Access Journals (Sweden)

    Lawrence M. Rudner

    2009-04-01

    Full Text Available This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the classification accuracy of tests scored using decision theory; (2) the effectiveness of different sequential testing procedures; and (3) the number of items needed to make a classification. A large percentage of examinees can be classified accurately with very few items using decision theory. A Java Applet for self instruction and software for generating, calibrating and scoring MDT data are provided.
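
    A minimal sketch of the MDT classification step, with hypothetical item probabilities for two mastery states: the posterior over states is computed from the response pattern and the most probable state is chosen. A sequential procedure would update this posterior item by item and stop once it passes a confidence threshold.

```python
import numpy as np

# Hypothetical conditional probabilities of answering each of 4 items
# correctly, one row per mastery state (master / non-master).
p_correct = np.array([
    [0.90, 0.80, 0.85, 0.70],   # master
    [0.40, 0.30, 0.35, 0.50],   # non-master
])
priors = np.array([0.5, 0.5])

def classify_examinee(responses, p_correct, priors):
    """Return the posterior over mastery states given a 0/1 response pattern."""
    r = np.asarray(responses)
    likelihood = np.prod(np.where(r == 1, p_correct, 1.0 - p_correct), axis=1)
    posterior = likelihood * priors
    return posterior / posterior.sum()

posterior = classify_examinee([1, 1, 0, 1], p_correct, priors)
print("classified as state", int(posterior.argmax()), "with posterior", posterior)
```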

  2. Credit scoring using ensemble of various classifiers on reduced feature set

    Directory of Open Access Journals (Sweden)

    Dahiya Shashi

    2015-01-01

    Full Text Available Credit scoring methods are widely used for evaluating loan applications in financial and banking institutions. A credit score identifies whether an applicant belongs to the good-risk or the bad-risk group. These decisions are based on the demographic data of the customers, the customer's overall business with the bank, and the loan payment history of the applicants. The advantages of using credit scoring models include reducing the cost of credit analysis, enabling faster credit decisions and diminishing possible risk. Many statistical and machine learning techniques such as logistic regression, support vector machines, neural networks and decision tree algorithms have been used independently and as hybrid credit scoring models. This paper proposes an ensemble-based technique combining seven individual models to increase classification accuracy. Feature selection has also been used for selecting important attributes for classification. Cross classification was conducted using three data partitions. The German credit dataset, with 1000 instances and 21 attributes, is used in the present study. The results of the experiments revealed that the ensemble model yielded very good accuracy when compared to individual models. In all three partitions, the ensemble model was able to classify more than 80% of the loan customers as good creditors correctly. Also, for the 70:30 partition there was a good impact of feature selection on the accuracy of the classifiers. The results were improved for almost all individual models including the ensemble model.
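
    A hedged sketch of the general idea (feature selection followed by a voting ensemble) using scikit-learn; it combines four heterogeneous models rather than the paper's seven, and assumes the German credit attributes have already been encoded numerically as hypothetical `X_train`/`y_train` arrays.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# soft-voting ensemble over a handful of heterogeneous base models
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("forest", RandomForestClassifier(n_estimators=200)),
    ],
    voting="soft",
)

# feature selection before the ensemble, as in the abstract
model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), ensemble)
# model.fit(X_train, y_train); model.score(X_test, y_test)
```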

  3. Feature selection for Bayesian network classifiers using the MDL-FS score

    NARCIS (Netherlands)

    Drugan, Madalina M.; Wiering, Marco A.

    When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features

  4. Identifying and Evaluating External Validity Evidence for Passing Scores

    Science.gov (United States)

    Davis-Becker, Susan L.; Buckendahl, Chad W.

    2013-01-01

    A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence of the results of the standard setting from (1)…

  5. Comparing cosmic web classifiers using information theory

    Energy Technology Data Exchange (ETDEWEB)

    Leclercq, Florent [Institute of Cosmology and Gravitation (ICG), University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth PO1 3FX (United Kingdom); Lavaux, Guilhem; Wandelt, Benjamin [Institut d' Astrophysique de Paris (IAP), UMR 7095, CNRS – UPMC Université Paris 6, Sorbonne Universités, 98bis boulevard Arago, F-75014 Paris (France); Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: lavaux@iap.fr, E-mail: j.jasche@tum.de, E-mail: wandelt@iap.fr [Excellence Cluster Universe, Technische Universität München, Boltzmannstrasse 2, D-85748 Garching (Germany)

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  6. Comparing cosmic web classifiers using information theory

    International Nuclear Information System (INIS)

    Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin; Jasche, Jens

    2016-01-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  7. Detection of Fundus Lesions Using Classifier Selection

    Science.gov (United States)

    Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

    A system for detecting fundus lesions caused by diabetic retinopathy from fundus images is being developed. The system can screen the images in advance in order to reduce the inspection workload on doctors. One of the difficulties that must be addressed in completing this system is how to remove false positives (which tend to arise near blood vessels) without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images with some lesions and 223 images without lesions. The results of the experiments confirm the effectiveness of the proposed system, namely, degrees of sensitivity and specificity of 98% and 81%, respectively.

  8. An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

    Science.gov (United States)

    Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.

    2017-01-01

    Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
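
    A hedged sketch of the sampling idea: class breaks are estimated on a random sample and then applied to the full attribute vector. One-dimensional k-means is used here as a stand-in for the Fisher-Jenks classifier, since both minimize within-class variance, although Lloyd's algorithm is not guaranteed to find the globally optimal breaks; the data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
attribute = rng.lognormal(size=200_000)                  # full attribute vector (synthetic)
sample = rng.choice(attribute, size=2_000, replace=False)

# 1-D k-means minimizes within-class variance, the same criterion that
# Fisher-Jenks optimizes, so it serves here as a stand-in classifier.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(sample.reshape(-1, 1))
centers = np.sort(km.cluster_centers_.ravel())
breaks = (centers[:-1] + centers[1:]) / 2                # class boundaries from the sample

classes = np.searchsorted(breaks, attribute)             # classify every observation
```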

  9. Beyond “implementation strategies”: classifying the full range of strategies used in implementation science and practice

    Directory of Open Access Journals (Sweden)

    Jennifer Leeman

    2017-11-01

    Full Text Available Abstract Background Strategies are central to the National Institutes of Health’s definition of implementation research as “the study of strategies to integrate evidence-based interventions into specific settings.” Multiple scholars have proposed lists of the strategies used in implementation research and practice, which they increasingly are classifying under the single term “implementation strategies.” We contend that classifying all strategies under a single term leads to confusion, impedes synthesis across studies, and limits advancement of the full range of strategies of importance to implementation. To address this concern, we offer a system for classifying implementation strategies that builds on Proctor and colleagues’ (2013) reporting guidelines, which recommend that authors not only name and define their implementation strategies but also specify who enacted the strategy (i.e., the actor) and the level and determinants that were targeted (i.e., the action targets). Main body We build on Wandersman and colleagues’ Interactive Systems Framework to distinguish strategies based on whether they are enacted by actors functioning as part of a Delivery, Support, or Synthesis and Translation System. We build on Damschroder and colleagues’ Consolidated Framework for Implementation Research to distinguish the levels that strategies target (intervention, inner setting, outer setting, individual, and process). We then draw on numerous resources to identify determinants, which are conceptualized as modifiable factors that prevent or enable the adoption and implementation of evidence-based interventions. Identifying actors and targets resulted in five conceptually distinct classes of implementation strategies: dissemination, implementation process, integration, capacity-building, and scale-up. In our descriptions of each class, we identify the level of the Interactive System Framework at which the strategy is enacted (actors, level and

  10. Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers.

    Directory of Open Access Journals (Sweden)

    Mansour Alsaleh

    Full Text Available Web spammers aim to obtain higher ranks for their web pages by including spam contents that deceive search engines in order to include their pages in search results even when they are not related to the search terms. Search engines continue to develop new web spam detection mechanisms, but spammers also aim to improve their tools to evade detection. In this study, we first explore the effect of the page language on spam detection features and we demonstrate how the best set of detection features varies according to the page language. We also study the performance of Google Penguin, a newly developed anti-web spamming technique for their search engine. Using spam pages in Arabic as a case study, we show that unlike similar English pages, Google anti-spamming techniques are ineffective against a high proportion of Arabic spam pages. We then explore multiple detection features for spam pages to identify an appropriate set of features that yields a high detection accuracy compared with the integrated Google Penguin technique. In order to build and evaluate our classifier, as well as to help researchers to conduct consistent measurement studies, we collected and manually labeled a corpus of Arabic web pages, including both benign and spam pages. Furthermore, we developed a browser plug-in that utilizes our classifier to warn users about spam pages after clicking on a URL and by filtering out search engine results. Using Google Penguin as a benchmark, we provide an illustrative example to show that language-based web spam classifiers are more effective for capturing spam contents.

  11. Empirical study of classification process for two-stage turbo air classifier in series

    Science.gov (United States)

    Yu, Yuan; Liu, Jiaxiang; Li, Gang

    2013-05-01

    The suitable process parameters for a two-stage turbo air classifier are important for obtaining the ultrafine powder that has a narrow particle-size distribution, however little has been published internationally on the classification process for the two-stage turbo air classifier in series. The influence of the process parameters of a two-stage turbo air classifier in series on classification performance is empirically studied by using aluminum oxide powders as the experimental material. The experimental results show the following: 1) When the rotor cage rotary speed of the first-stage classifier is increased from 2 300 r/min to 2 500 r/min with a constant rotor cage rotary speed of the second-stage classifier, classification precision is increased from 0.64 to 0.67. However, in this case, the final ultrafine powder yield is decreased from 79% to 74%, which means the classification precision and the final ultrafine powder yield can be regulated through adjusting the rotor cage rotary speed of the first-stage classifier. 2) When the rotor cage rotary speed of the second-stage classifier is increased from 2 500 r/min to 3 100 r/min with a constant rotor cage rotary speed of the first-stage classifier, the cut size is decreased from 13.16 μm to 8.76 μm, which means the cut size of the ultrafine powder can be regulated through adjusting the rotor cage rotary speed of the second-stage classifier. 3) When the feeding speed is increased from 35 kg/h to 50 kg/h, the "fish-hook" effect is strengthened, which makes the ultrafine powder yield decrease. 4) To weaken the "fish-hook" effect, the equalization of the two-stage wind speeds or the combination of a high first-stage wind speed with a low second-stage wind speed should be selected. This empirical study provides a criterion of process parameter configurations for a two-stage or multi-stage classifier in series, which offers a theoretical basis for practical production.

  12. Comparative transcriptional profiling of the axolotl limb identifies a tripartite regeneration-specific gene program.

    Directory of Open Access Journals (Sweden)

    Dunja Knapp

    Full Text Available Understanding how the limb blastema is established after the initial wound healing response is an important aspect of regeneration research. Here we performed parallel expression profile time courses of healing lateral wounds versus amputated limbs in axolotl. This comparison between wound healing and regeneration allowed us to identify amputation-specific genes. By clustering the expression profiles of these samples, we could detect three distinguishable phases of gene expression - early wound healing followed by a transition-phase leading to establishment of the limb development program, which correspond to the three phases of limb regeneration that had been defined by morphological criteria. By focusing on the transition-phase, we identified 93 strictly amputation-associated genes many of which are implicated in oxidative-stress response, chromatin modification, epithelial development or limb development. We further classified the genes based on whether they were or were not significantly expressed in the developing limb bud. The specific localization of 53 selected candidates within the blastema was investigated by in situ hybridization. In summary, we identified a set of genes that are expressed specifically during regeneration and are therefore, likely candidates for the regulation of blastema formation.

  13. Case base classification on digital mammograms: improving the performance of case base classifier

    Science.gov (United States)

    Raman, Valliappan; Then, H. H.; Sumari, Putra; Venkatesa Mohan, N.

    2011-10-01

    Breast cancer continues to be a significant public health problem in the world. Early detection is the key to improving breast cancer prognosis. The aim of the research presented here is twofold. The first stage involves machine learning techniques that segment and extract features from the masses in digital mammograms. The second stage is a problem-solving approach that classifies the masses with a performance-based case-base classifier. In this paper we build a case-based classifier in order to diagnose mammographic images. We explain the different methods and behaviours that have been added to the classifier to improve its performance. The initial performance-based classifier with bagging proposed in this paper has been implemented and shows an improvement in specificity and sensitivity.
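
    The paper's case-base classifier is not reproduced here; as a hedged stand-in, the sketch below bags a k-nearest-neighbour model (retrieval of similar past cases) over resampled versions of the case base, assuming hypothetical mass features and labels are already available. Note that scikit-learn versions before 1.2 call the first argument `base_estimator`.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

# k-NN retrieval over past cases stands in for the case-base classifier;
# bagging resamples the case base to form an ensemble of such classifiers.
bagged_cbr = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),   # 'base_estimator' in older scikit-learn
    n_estimators=25,
    max_samples=0.8,
    random_state=0,
)
# bagged_cbr.fit(mass_features, labels); bagged_cbr.predict(new_mass_features)
```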

  14. A fast learning method for large scale and multi-class samples of SVM

    Science.gov (United States)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class support vector machine (SVM) classification based on a binary tree is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. A bottom-up method is adopted to build the binary tree hierarchy, and, according to the resulting hierarchy, the sub-classifier at each node learns from the corresponding samples. During learning, several class clusters are generated after a first clustering of the training samples. Central points are first extracted from the clusters that contain only one type of sample. For clusters containing two types of samples, the numbers of clusters for their positive and negative samples are set according to their degree of mixture, and a secondary clustering is performed, after which central points are extracted from the resulting sub-clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, guarantees higher classification accuracy, greatly reduces the number of samples and effectively improves learning efficiency.
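
    Only the sample-reduction step is sketched below (the binary-tree hierarchy is omitted): each class is replaced by the centroids of a within-class k-means, and the SVM is trained on the much smaller centroid set. Names such as `X_train`/`y_train` are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def reduce_by_centroids(X, y, clusters_per_class=50):
    """Replace each class by the centroids of a within-class k-means."""
    X_reduced, y_reduced = [], []
    for label in np.unique(y):
        X_class = X[y == label]
        k = min(clusters_per_class, len(X_class))
        km = KMeans(n_clusters=k, n_init=5, random_state=0).fit(X_class)
        X_reduced.append(km.cluster_centers_)
        y_reduced.append(np.full(k, label))
    return np.vstack(X_reduced), np.concatenate(y_reduced)

# X_small, y_small = reduce_by_centroids(X_train, y_train)
# SVC(kernel="rbf").fit(X_small, y_small)   # trains on far fewer points
```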

  15. Extending cluster Lot Quality Assurance Sampling designs for surveillance programs

    OpenAIRE

    Hund, Lauren; Pagano, Marcello

    2014-01-01

    Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than ...

  16. Variants of the Borda count method for combining ranked classifier hypotheses

    NARCIS (Netherlands)

    van Erp, Merijn; Schomaker, Lambert; Schomaker, Lambert; Vuurpijl, Louis

    2000-01-01

    The Borda count is a simple yet effective method of combining rankings. In pattern recognition, classifiers are often able to return a ranked set of results. Several experiments have been conducted to test the ability of the Borda count and two variant methods to combine these ranked classifier
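
    A minimal, generic Borda-count combiner is sketched below; each classifier contributes a best-first ranking of class labels and the label with the highest total score wins. The example labels are hypothetical.

```python
from collections import defaultdict

def borda_combine(rankings):
    """Combine ranked class hypotheses from several classifiers.

    Each ranking is a list of class labels ordered best-first; a label in
    position p of a ranking of length n earns n - 1 - p points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return sorted(scores, key=scores.get, reverse=True)

# three classifiers rank their hypotheses for the same handwritten character
print(borda_combine([["a", "d", "o"], ["d", "a", "o"], ["a", "o", "d"]]))  # ['a', 'd', 'o']
```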

  17. Should OCD be classified as an anxiety disorder in DSM-V?

    NARCIS (Netherlands)

    Stein, Dan J.; Fineberg, Naomi A.; Bienvenu, O. Joseph; Denys, Damiaan; Lochner, Christine; Nestadt, Gerald; Leckman, James F.; Rauch, Scott L.; Phillips, Katharine A.

    2010-01-01

    In DSM-III, DSM-III-R, and DSM-IV, obsessive-compulsive disorder (OCD) was classified as an anxiety disorder. In ICD-10, OCD is classified separately from the anxiety disorders, although within the same larger category as anxiety disorders (as one of the "neurotic, stress-related, and somatoform

  18. 29 CFR 1910.307 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ... equipment at the location. (c) Electrical installations. Equipment, wiring methods, and installations of... covers the requirements for electric equipment and wiring in locations that are classified depending on... provisions of this section. (4) Division and zone classification. In Class I locations, an installation must...

  19. Parameterization of a fuzzy classifier for the diagnosis of an industrial process

    International Nuclear Information System (INIS)

    Toscano, R.; Lyonnet, P.

    2002-01-01

    The aim of this paper is to present a classifier based on a fuzzy inference system. For this classifier, we propose a parameterization method that is not necessarily based on iterative training. This approach can be seen as a pre-parameterization, which allows the determination of the rule base and the parameters of the membership functions. We also present a continuous and differentiable version of the previous classifier and suggest an iterative learning algorithm based on a gradient method. An example using the IRIS dataset, a benchmark for classification problems, is presented to show the performance of this classifier. Finally, this classifier is applied to the diagnosis of a DC motor, showing the utility of the method. However, in many cases the total knowledge necessary for the synthesis of the fuzzy diagnosis system (FDS) is not, in general, directly available. It must be extracted from an often considerable mass of information. For this reason, a general methodology for the design of an FDS is presented and illustrated on a non-linear plant

  20. An SVM classifier to separate false signals from microcalcifications in digital mammograms

    Energy Technology Data Exchange (ETDEWEB)

    Bazzani, Armando; Bollini, Dante; Brancaccio, Rosa; Campanini, Renato; Riccardi, Alessandro; Romani, Davide [Department of Physics, University of Bologna (Italy); INFN, Bologna (Italy); Lanconelli, Nico [Department of Physics, University of Bologna, and INFN, Bologna (Italy). E-mail: nico.lanconelli@bo.infn.it; Bevilacqua, Alessandro [Department of Electronics, Computer Science and Systems, University of Bologna, and INFN, Bologna (Italy)

    2001-06-01

    In this paper we investigate the feasibility of using an SVM (support vector machine) classifier in our automatic system for the detection of clustered microcalcifications in digital mammograms. SVM is a technique for pattern recognition which relies on the statistical learning theory. It minimizes a function of two terms: the number of misclassified vectors of the training set and a term regarding the generalization classifier capability. We compare the SVM classifier with an MLP (multi-layer perceptron) in the false-positive reduction phase of our detection scheme: a detected signal is considered either microcalcification or false signal, according to the value of a set of its features. The SVM classifier gets slightly better results than the MLP one (Az value of 0.963 against 0.958) in the presence of a high number of training data; the improvement becomes much more evident (Az value of 0.952 against 0.918) in training sets of reduced size. Finally, the setting of the SVM classifier is much easier than the MLP one. (author)

  1. A novel approach for fire recognition using hybrid features and manifold learning-based classifier

    Science.gov (United States)

    Zhu, Rong; Hu, Xueying; Tang, Jiajun; Hu, Sheng

    2018-03-01

    Although image/video-based fire recognition has received growing attention, efficient and robust fire detection strategies are rarely explored. In this paper, we propose a novel approach to automatically identify flame or smoke regions in an image. It is composed of three stages: (1) block processing divides an image into several non-overlapping image blocks, and these blocks are identified as suspicious fire regions or not by using two color models and a color histogram-based similarity matching method in the HSV color space; (2) considering that flame and smoke regions have significant visual characteristics compared with other content, two kinds of image features are extracted for fire recognition, where local features are obtained with the Scale Invariant Feature Transform (SIFT) descriptor and the Bag of Keypoints (BOK) technique, and texture features are extracted with the Gray Level Co-occurrence Matrix (GLCM) and Wavelet-based Analysis (WA) methods; and (3) a manifold learning-based classifier is constructed on two image manifolds, designed via an improved Globular Neighborhood Locally Linear Embedding (GNLLE) algorithm, and the extracted hybrid features are used as input feature vectors to train the classifier, which decides whether an image contains fire or not. Experiments and comparative analyses with four approaches are conducted on the collected image sets. The results show that the proposed approach is superior to the other ones, detecting fire with high recognition accuracy and a low error rate.
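
    As a hedged illustration of the texture branch only, the sketch below computes GLCM descriptors for a candidate block with scikit-image (the functions are spelled `greycomatrix`/`greycoprops` in releases before 0.19); the SIFT/BOK, wavelet and GNLLE stages are not reproduced, and `candidate_blocks` is a hypothetical list of grayscale blocks.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in older scikit-image

def glcm_texture_features(gray_block):
    """Texture descriptors for one candidate fire/smoke block (2-D uint8 array)."""
    glcm = graycomatrix(gray_block, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# features = np.vstack([glcm_texture_features(block) for block in candidate_blocks])
```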

  2. Learning for VMM + WTA Embedded Classifiers

    Science.gov (United States)

    2016-03-31

    Learning for VMM + WTA Embedded Classifiers Jennifer Hasler and Sahil Shah Electrical and Computer Engineering Georgia Institute of Technology...enabling correct classification of each novel acoustic signal (generator, idle car, and idle truck ). The classification structure requires, after...measured on our SoC FPAA IC. The test input is composed of signals from urban environment for 3 objects (generator, idle car, and idle truck

  3. Reduction of Staphylococcus Spp. in jerked beef samples after irradiation with Co-60

    Energy Technology Data Exchange (ETDEWEB)

    Silva, Marcio de Albuquerque, E-mail: marcioalbuquerquesilva@gmail.com [Instituto Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ (Brazil). Laboratorio de Genomica Funcional e Bioinformatica; Costa, Maria Claudia V.Vicalvi; Junior, Carlos Eduardo de O.C.; Solidonio, Evelyne G.; Sena, Kesia Xisto F.R. de; Colaco, Waldeciro, E-mail: claudiavicalvi@hotmail.com, E-mail: oliveiracosta@msn.com, E-mail: evelyne_solidonio@yahoo.com.br, E-mail: wcolaco@ufpe.com.br, E-mail: k.xisto@gmail.com [Universidade Federal de Pernambuco (UFPE), Recife (Brazil)

    2015-07-01

    This work aimed to isolate and identify microorganisms of the genus Staphylococcus in jerked beef before and after irradiation at doses of 2, 4 and 6 kGy. Jerked beef samples were obtained from a supermarket chain in Recife-PE and divided into three lots. Under sterile conditions, the meat was cut and weighed. Sub-samples were assigned to the control group or exposed to a cobalt-60 irradiation source at doses of 2, 4 and 6 kGy. Part of each sub-sample was added to an Erlenmeyer flask with 225 ml of sterile water and stirred for 15 minutes to produce wash water, and another part was added to an Erlenmeyer flask with 225 ml of sterile distilled water and left at rest at room temperature for 14 hours to produce desalting water. Aliquots of 1 μL of this water were removed, streaked onto sheep blood agar medium and incubated at 35 °C for 24 hours for analysis of bacterial growth. After Gram staining, colonies classified as Gram-positive and arranged in clusters were subjected to biochemical tests for identification. A total of 94 strains of the genus Staphylococcus were isolated and identified, 72 (76%) from the control group and 22 (24%) after irradiation. Of the 22 isolates obtained after irradiation, 7 species were identified at 2 kGy: Staphylococcus succinus, Staphylococcus carnosus sub. carnosus, Staphylococcus fleurettii, Staphylococcus saprophyticus sub. saprophyticus, Staphylococcus simulans and Staphylococcus auricularis, all coagulase negative, plus the coagulase positive Staphylococcus aureus sub. anaerobius. At a dose of 4 kGy, six species were identified: Staphylococcus epidermidis, Staphylococcus xylosus, Staphylococcus intermedius, Staphylococcus warneri, Staphylococcus fleurettii and Staphylococcus aureus sub. anaerobius. Staphylococcus simulans, Staphylococcus saprophyticus sub. saprophyticus and Staphylococcus lugdunensis were isolated and identified after a dose of 6 kGy. It was observed that irradiation significantly reduced the microbial load, and increasing the dose decreased the number of

  4. Reduction of Staphylococcus Spp. in jerked beef samples after irradiation with Co-60

    International Nuclear Information System (INIS)

    Silva, Marcio de Albuquerque

    2015-01-01

    This work aimed to isolate and identify microorganisms of the genus Staphylococcus in jerked beef before and after irradiation at doses of 2, 4 and 6 kGy. Jerked beef samples were obtained from a supermarket chain in Recife-PE and divided into three lots. Under sterile conditions, the meat was cut and weighed. Sub-samples were assigned to the control group or exposed to a cobalt-60 irradiation source at doses of 2, 4 and 6 kGy. Part of each sub-sample was added to an Erlenmeyer flask with 225 ml of sterile water and stirred for 15 minutes to produce wash water, and another part was added to an Erlenmeyer flask with 225 ml of sterile distilled water and left at rest at room temperature for 14 hours to produce desalting water. Aliquots of 1 μL of this water were removed, streaked onto sheep blood agar medium and incubated at 35 °C for 24 hours for analysis of bacterial growth. After Gram staining, colonies classified as Gram-positive and arranged in clusters were subjected to biochemical tests for identification. A total of 94 strains of the genus Staphylococcus were isolated and identified, 72 (76%) from the control group and 22 (24%) after irradiation. Of the 22 isolates obtained after irradiation, 7 species were identified at 2 kGy: Staphylococcus succinus, Staphylococcus carnosus sub. carnosus, Staphylococcus fleurettii, Staphylococcus saprophyticus sub. saprophyticus, Staphylococcus simulans and Staphylococcus auricularis, all coagulase negative, plus the coagulase positive Staphylococcus aureus sub. anaerobius. At a dose of 4 kGy, six species were identified: Staphylococcus epidermidis, Staphylococcus xylosus, Staphylococcus intermedius, Staphylococcus warneri, Staphylococcus fleurettii and Staphylococcus aureus sub. anaerobius. Staphylococcus simulans, Staphylococcus saprophyticus sub. saprophyticus and Staphylococcus lugdunensis were isolated and identified after a dose of 6 kGy. It was observed that irradiation significantly reduced the microbial load, and increasing the dose decreased the number of

  5. Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition

    International Nuclear Information System (INIS)

    Shen Hongbin; Chou Kuochen

    2005-01-01

    The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For in-depth understanding of the biochemical process of the nucleus, the knowledge of localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desired to develop an automated method for fast annotating the subnuclear locations for numerous newly found nuclear protein sequences so as to be able to timely utilize them for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen

  6. 75 FR 707 - Classified National Security Information

    Science.gov (United States)

    2010-01-05

    ... classified at one of the following three levels: (1) ``Top Secret'' shall be applied to information, the... exercise this authority. (2) ``Top Secret'' original classification authority may be delegated only by the... official has been delegated ``Top Secret'' original classification authority by the agency head. (4) Each...

  7. A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

    Science.gov (United States)

    Foo, Brian; van der Schaar, Mihaela

    2010-11-01

    In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.

  8. The three-dimensional origin of the classifying algebra

    International Nuclear Information System (INIS)

    Fuchs, Juergen; Schweigert, Christoph; Stigner, Carl

    2010-01-01

    It is known that reflection coefficients for bulk fields of a rational conformal field theory in the presence of an elementary boundary condition can be obtained as representation matrices of irreducible representations of the classifying algebra, a semisimple commutative associative complex algebra. We show how this algebra arises naturally from the three-dimensional geometry of factorization of correlators of bulk fields on the disk. This allows us to derive explicit expressions for the structure constants of the classifying algebra as invariants of ribbon graphs in the three-manifold S^2 x S^1. Our result unravels a precise relation between intertwiners of the action of the mapping class group on spaces of conformal blocks and boundary conditions in rational conformal field theories.

  9. Prediction of small molecule binding property of protein domains with Bayesian classifiers based on Markov chains.

    Science.gov (United States)

    Bulashevska, Alla; Stein, Martin; Jackson, David; Eils, Roland

    2009-12-01

    Accurate computational methods that can help to predict the biological function of a protein from its sequence are of great interest to research biologists and pharmaceutical companies. One approach to inferring the function of proteins is to predict the interactions between proteins and other molecules. In this work, we propose a machine learning method that uses the primary sequence of a domain to predict its propensity for interaction with small molecules. By curating the Pfam database with respect to the small molecule binding ability of its component domains, we have constructed a dataset of small molecule binding and non-binding domains. This dataset was then used as a training set to learn a Bayesian classifier, which should distinguish members of each class. The domain sequences of both classes are modelled with Markov chains. In a jackknife test, our classification procedure achieved predictive accuracies of 77.2% and 66.7% for the binding and non-binding classes respectively. We demonstrate the applicability of our classifier by using it to identify previously unknown small molecule binding domains. Our predictions are available as supplementary material and can provide very useful information to drug discovery specialists. Given the ubiquitous and essential role small molecules play in biological processes, our method is important for identifying pharmaceutically relevant components of complete proteomes. The software is available from the author upon request.
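
    A minimal sketch of the sequence-modelling idea, assuming first-order Markov chains (the abstract does not fix the order) and hypothetical lists of binding and non-binding domain sequences: each class gets its own transition matrix, and a new domain is assigned to the class with the higher log-likelihood, optionally shifted by a log prior ratio.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def train_markov_chain(sequences, pseudocount=1.0):
    """Log transition probabilities of a first-order Markov chain for one class."""
    counts = np.full((20, 20), pseudocount)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in INDEX and b in INDEX:
                counts[INDEX[a], INDEX[b]] += 1
    return np.log(counts / counts.sum(axis=1, keepdims=True))

def log_likelihood(seq, log_trans):
    return sum(log_trans[INDEX[a], INDEX[b]]
               for a, b in zip(seq, seq[1:]) if a in INDEX and b in INDEX)

def classify_domain(seq, model_binding, model_nonbinding, log_prior_ratio=0.0):
    """Bayes decision between the two class-conditional sequence models."""
    score = log_likelihood(seq, model_binding) - log_likelihood(seq, model_nonbinding)
    return "binding" if score + log_prior_ratio > 0 else "non-binding"

# model_binding = train_markov_chain(binding_domain_sequences)        # hypothetical lists
# model_nonbinding = train_markov_chain(nonbinding_domain_sequences)
# classify_domain(new_domain_sequence, model_binding, model_nonbinding)
```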

  10. Can the dissociative PTSD subtype be identified across two distinct trauma samples meeting caseness for PTSD?

    Science.gov (United States)

    Hansen, Maj; Műllerová, Jana; Elklit, Ask; Armour, Cherie

    2016-08-01

    For over a century, the occurrence of dissociative symptoms in connection to traumatic exposure has been acknowledged in the scientific literature. Recently, the importance of dissociation has also been recognized in the long-term traumatic response within the DSM-5 nomenclature. Several studies have confirmed the existence of the dissociative posttraumatic stress disorder (PTSD) subtype. However, there is a lack of studies investigating latent profiles of PTSD solely in victims with PTSD. This study investigates the possible presence of PTSD subtypes using latent class analysis (LCA) across two distinct trauma samples meeting caseness for DSM-5 PTSD based on self-reports (N = 787). Moreover, we assessed if a number of risk factors resulted in an increased probability of membership in a dissociative compared with a non-dissociative PTSD class. The results of LCA revealed a two-class solution with two highly symptomatic classes: a dissociative class and a non-dissociative class across both samples. Increased emotion-focused coping increased the probability of individuals being grouped into the dissociative class across both samples. Social support reduced the probability of individuals being grouped into the dissociative class but only in the victims of motor vehicle accidents (MVAs) suffering from whiplash. The results are discussed in light of their clinical implications and suggest that the dissociative subtype can be identified in victims of incest and victims of MVA suffering from whiplash meeting caseness for DSM-5 PTSD.

  11. Hyperspectral image classifier based on beach spectral feature

    International Nuclear Information System (INIS)

    Liang, Zhang; Lianru, Gao; Bing, Zhang

    2014-01-01

    The seashore, especially coral banks, is sensitive to human activities and environmental changes. A multispectral image, with its coarse spectral resolution, is inadequate for identifying subtle spectral distinctions between various beaches. In contrast, a hyperspectral image with narrow and consecutive channels increases our capability to retrieve minor spectral features, which suits the identification and classification of surface materials on the shore. In this paper we used airborne hyperspectral data, in addition to ground spectral data, to study the beaches in Qingdao. The image data first went through pre-processing to deal with noise, radiometric inconsistency and distortion. Subsequently, the reflection spectrum, the derivative spectrum and the spectral absorption features of the beach surface were inspected in search of diagnostic features. Spectral indices specific to the unique environment of the seashore were then developed. According to expert decisions based on image spectra, the beaches are ultimately classified into sand beach, rock beach, vegetation beach, mud beach, bare land and water. In situ reflection spectra surveyed with a GER1500 field spectrometer validated the classification product. In conclusion, the classification approach under expert decision based on feature spectra is proved to be feasible for beaches

  12. A deep learning method for classifying mammographic breast density categories.

    Science.gov (United States)

    Mohamed, Aly A; Berg, Wendie A; Peng, Hong; Luo, Yahong; Jankowitz, Rachel C; Wu, Shandong

    2018-01-01

    Mammographic breast density is an established risk marker for breast cancer and is visually assessed by radiologists in routine mammogram image reading, using four qualitative Breast Imaging and Reporting Data System (BI-RADS) breast density categories. It is particularly difficult for radiologists to consistently distinguish the two most common and most variably assigned BI-RADS categories, i.e., "scattered density" and "heterogeneously dense". The aim of this work was to investigate a deep learning-based breast density classifier to consistently distinguish these two categories, aiming at providing a potential computerized tool to assist radiologists in assigning a BI-RADS category in current clinical workflow. In this study, we constructed a convolutional neural network (CNN)-based model coupled with a large (i.e., 22,000 images) digital mammogram imaging dataset to evaluate the classification performance between the two aforementioned breast density categories. All images were collected from a cohort of 1,427 women who underwent standard digital mammography screening from 2005 to 2016 at our institution. The truths of the density categories were based on standard clinical assessment made by board-certified breast imaging radiologists. Effects of direct training from scratch solely using digital mammogram images and transfer learning of a pretrained model on a large nonmedical imaging dataset were evaluated for the specific task of breast density classification. In order to measure the classification performance, the CNN classifier was also tested on a refined version of the mammogram image dataset by removing some potentially inaccurately labeled images. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to measure the accuracy of the classifier. The AUC was 0.9421 when the CNN-model was trained from scratch on our own mammogram images, and the accuracy increased gradually along with an increased size of training samples
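
    The authors trained their own CNN on mammogram patches; the sketch below is only a generic Keras transfer-learning setup for the same binary task (an ImageNet-pretrained ResNet50 backbone with frozen features and a sigmoid head), not the paper's model, with `train_ds`/`val_ds` as hypothetical tf.data pipelines.

```python
import tensorflow as tf

# Mammogram patches are assumed to be resized to 224x224 and replicated to 3
# channels; labels are binary (scattered vs. heterogeneously dense).
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # transfer learning: freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical tf.data pipelines
```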

  13. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

    Science.gov (United States)

    Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

    2014-12-08

    Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.

  14. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks

    Directory of Open Access Journals (Sweden)

    Cuicui Zhang

    2014-12-01

    Full Text Available Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks, which aims to determine if a given individual has already appeared over the camera network. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill due to the limitation of the camera hardware system and the unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS) problem arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in the combination of multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma occurring during ensemble. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state-of-the-art system.

  15. A support vector machine and a random forest classifier indicates a 15-miRNA set related to osteosarcoma recurrence

    Directory of Open Access Journals (Sweden)

    He Y

    2018-01-01

    Full Text Available Yunfei He,1,2,* Jun Ma,1,* An Wang,1,3,* Weiheng Wang,1 Shengchang Luo,1 Yaoming Liu,2 Xiaojian Ye1 1Department of Orthopaedics, Changzheng Hospital Affiliated with Second Military Medical University, Shanghai, 2Department of Orthopaedics, Lanzhou General Hospital of Lanzhou Military Command Region, Lanzhou, 3Department of Orthopaedics, Shanghai Armed Police Force Hospital, Shanghai, People’s Republic of China *These authors contributed equally to this work Background: Osteosarcoma, which originates in the mesenchymal tissue, is the prevalent primary solid malignancy of the bone. It is of great importance to explore the mechanisms of metastasis and recurrence, which are two primary reasons accounting for the high death rate in osteosarcoma. Data and methods: Three miRNA expression profiles related to osteosarcoma were downloaded from GEO DataSets. Differentially expressed miRNAs (DEmiRs) were screened using MetaDE.ES of the MetaDE package. A support vector machine (SVM) classifier was constructed using optimal miRNAs, and its prediction efficiency for recurrence was detected in independent datasets. Finally, a co-expression network was constructed based on the DEmiRs and their target genes. Results: In total, 78 significantly DEmiRs were screened. The SVM classifier constructed by 15 miRNAs could accurately classify 58 samples in 65 samples (89.2%) in the GSE39040 database, which was validated in another two databases, GSE39052 (84.62%, 22/26) and GSE79181 (91.3%, 21/23). Cox regression showed that four miRNAs, including hsa-miR-10b, hsa-miR-1227, hsa-miR-146b-3p, and hsa-miR-873, significantly correlated with tumor recurrence time. There were 137, 147, 145, and 77 target genes of the above four miRNAs, respectively, which were assigned to 17 gene ontology functionally annotated terms and 14 Kyoto Encyclopedia of Genes and Genomes pathways. Among them, the “Osteoclast differentiation” pathway contained a total of seven target genes and was
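
    An illustrative sketch (not the authors' code) of the core classification step: an SVM trained on a small panel of selected miRNA features, with accuracy estimated by cross-validation. The expression matrix and recurrence labels below are synthetic placeholders.

```python
# Sketch: SVM on a 15-feature miRNA panel, accuracy by 5-fold cross-validation.
# X and y are synthetic stand-ins for a samples-by-miRNA matrix and labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(65, 15))       # 65 samples x 15 selected miRNAs (synthetic)
y = rng.integers(0, 2, size=65)     # 1 = recurrence, 0 = no recurrence (synthetic)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```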

  16. On-line detection of apnea/hypopnea events using SpO2 signal: a rule-based approach employing binary classifier models.

    Science.gov (United States)

    Koley, Bijoy Laxmi; Dey, Debangshu

    2014-01-01

    This paper presents an online method for automatic detection of apnea/hypopnea events, with the help of oxygen saturation (SpO2) signal, measured at fingertip by Bluetooth nocturnal pulse oximeter. Event detection is performed by identifying abnormal data segments from the recorded SpO2 signal, employing a binary classifier model based on a support vector machine (SVM). Thereafter the abnormal segment is further analyzed to detect different states within the segment, i.e., steady, desaturation, and resaturation, with the help of another SVM-based binary ensemble classifier model. Finally, a heuristically obtained rule-based system is used to identify the apnea/hypopnea events from the time-sequenced decisions of these classifier models. In the developmental phase, a set of 34 time domain-based features was extracted from the segmented SpO2 signal using an overlapped windowing technique. Later, an optimal set of features was selected on the basis of recursive feature elimination technique. A total of 34 subjects were included in the study. The results show average event detection accuracies of 96.7% and 93.8% for the offline and the online tests, respectively. The proposed system provides direct estimation of the apnea/hypopnea index with the help of a relatively inexpensive and widely available pulse oximeter. Moreover, the system can be monitored and accessed by physicians through LAN/WAN/Internet and can be extended to deploy in Bluetooth-enabled mobile phones.
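
    A sketch of the two ingredients described above, under assumptions: a linear SVM as the binary segment classifier and recursive feature elimination (RFE) to prune the 34 time-domain features. The SpO2 segment features here are synthetic.

```python
# Sketch: linear SVM + recursive feature elimination on SpO2 segment features.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 34))        # 500 SpO2 segments x 34 features (synthetic)
y = rng.integers(0, 2, size=500)      # 1 = abnormal (apnea/hypopnea), 0 = normal

selector = RFE(SVC(kernel="linear"), n_features_to_select=10, step=1)
selector.fit(X, y)
X_sel = selector.transform(X)

print("kept features:", np.flatnonzero(selector.support_))
print("CV accuracy:", cross_val_score(SVC(kernel="linear"), X_sel, y, cv=5).mean())
```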

  17. 75 FR 37253 - Classified National Security Information

    Science.gov (United States)

    2010-06-28

    ... ``Secret.'' (3) Each interior page of a classified document shall be marked at the top and bottom either... ``(TS)'' for Top Secret, ``(S)'' for Secret, and ``(C)'' for Confidential will be used. (2) Portions... from the informational text. (1) Conspicuously place the overall classification at the top and bottom...

  18. Classifying cognitive profiles using machine learning with privileged information in Mild Cognitive Impairment

    Directory of Open Access Journals (Sweden)

    Hanin Hamdan Alahmadi

    2016-11-01

    Full Text Available Early diagnosis of dementia is critical for assessing disease progression and potential treatment. State-of-the-art machine learning techniques have been increasingly employed to take on this diagnostic task. In this study, we employed Generalised Matrix Learning Vector Quantization (GMLVQ) classifiers to discriminate patients with Mild Cognitive Impairment (MCI) from healthy controls based on their cognitive skills. Further, we adopted a ``Learning with privileged information'' approach to combine cognitive and fMRI data for the classification task. The resulting classifier operates solely on the cognitive data while it incorporates the fMRI data as privileged information (PI) during training. This novel classifier is of practical use as the collection of brain imaging data is not always possible with patients and older participants. MCI patients and healthy age-matched controls were trained to extract structure from temporal sequences. We ask whether machine learning classifiers can be used to discriminate patients from controls based on the learning performance and whether differences between these groups relate to individual cognitive profiles. To this end, we tested participants in four cognitive tasks: working memory, cognitive inhibition, divided attention, and selective attention. We also collected fMRI data before and after training on the learning task and extracted fMRI responses and connectivity as features for machine learning classifiers. Our results show that the PI guided GMLVQ classifiers outperform the baseline classifier that only used the cognitive data. In addition, we found that for the baseline classifier, divided attention is the only relevant cognitive feature. When PI was incorporated, divided attention remained the most relevant feature while cognitive inhibition became also relevant for the task. Interestingly, this analysis for the fMRI GMLVQ classifier suggests that (1) when overall fMRI signal for structured stimuli is

  19. Description of children identified as suffering from MAM in Bangladesh: Varying results based on case definitions

    International Nuclear Information System (INIS)

    Waid, Jillian

    2014-01-01

    Full text: Background: There is a wide discrepancy between the proportion of children classified as acutely malnourished when MUAC criteria are used compared to weight for height. This has greatly complicated setting targets for the coverage of SAM and MAM programs in Bangladesh. This difference is much larger for children identified with MAM than for those with SAM, largely because identification as MAM can overlap both with SAM and with children not identified as acutely malnourished. Objective: To review existing data sets in order to determine the relationship between MUAC and other anthropometric measures, helping to provide a better understanding of the implications of different admission criteria to therapeutic and supplementary feeding programs. Methodology: This study uses data collected through national nutritional surveillance projects over multiple seasons in Bangladesh. For the years 1990 to 2006, sub-samples of data from the Nutritional Surveillance Project were pulled from areas of the country that remained constant over a set period. Data from 2010 to 2012 were pulled from the Food Security and Nutrition Surveillance Project. Case definition: Cases of moderate acute malnutrition were identified using MUAC-for-age z-scores (-3 ≤ z-score < -2), MUAC cut-offs (115 mm ≤ MUAC < 125 mm), and weight-for-height z-scores (-3 ≤ z-score < -2). Results: In all years more than 50% of all children identified as moderately malnourished were classified as such by only one measure (1990 selected sub-districts: 52%, 2012 national sample: 69%). In 1990 a higher proportion of children were categorized as moderately malnourished based on MUAC-for-age z-scores than by weight for height z-scores, but since 2000 the opposite has been true. This change is closely tied to the increasing height of children sampled, due to the declining rates of stunting in the country. After controlling for age and weight-for-height z-scores, an increase in height of one cm was associated with an increase

  20. A Novel Cascade Classifier for Automatic Microcalcification Detection.

    Directory of Open Access Journals (Sweden)

    Seung Yeon Shin

    Full Text Available In this paper, we present a novel cascaded classification framework for automatic detection of individual and clusters of microcalcifications (μCs). Our framework comprises three classification stages: (i) a random forest (RF) classifier for simple features capturing the second order local structure of individual μCs, where non-μC pixels in the target mammogram are efficiently eliminated; (ii) a more complex discriminative restricted Boltzmann machine (DRBM) classifier for μC candidates determined in the RF stage, which automatically learns the detailed morphology of μC appearances for improved discriminative power; and (iii) a detector to detect clusters of μCs from the individual μC detection results, using two different criteria. From the two-stage RF-DRBM classifier, we are able to distinguish μCs using explicitly computed features, as well as learn implicit features that are able to further discriminate between confusing cases. Experimental evaluation is conducted on the original Mammographic Image Analysis Society (MIAS) and mini-MIAS databases, as well as our own Seoul National University Bundang Hospital digital mammographic database. It is shown that the proposed method outperforms comparable methods in terms of receiver operating characteristic (ROC) and precision-recall curves for detection of individual μCs and free-response receiver operating characteristic (FROC) curve for detection of clustered μCs.
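
    A minimal cascade sketch of the first two stages described above: a random forest prunes easy negatives and a second, more expressive classifier re-scores the surviving candidates. An MLP is used here as a stand-in for the paper's discriminative RBM, which scikit-learn does not provide; all candidate features are synthetic.

```python
# Sketch: two-stage cascade (RF gate, then a second classifier on survivors).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 20))                                   # candidate features (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stage1 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
keep = stage1.predict_proba(X_te)[:, 1] > 0.1     # low threshold: discard clear negatives only

stage2 = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(X_tr, y_tr)
final_pred = np.zeros(len(X_te), dtype=int)
final_pred[keep] = stage2.predict(X_te[keep])     # only surviving candidates reach stage 2
print("candidates passed to stage 2:", int(keep.sum()), "of", len(X_te))
```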

  1. Learning to classify organic and conventional wheat - a machine-learning driven approach using the MeltDB 2.0 metabolomics analysis platform

    Directory of Open Access Journals (Sweden)

    Nikolas eKessler

    2015-03-01

    Full Text Available We present results of our machine learning approach to the problem of classifying GC-MS data originating from wheat grains of different farming systems. The aim is to investigate the potential of learning algorithms to classify GC-MS data as being either from conventionally grown or from organically grown samples, also considering different cultivars. The motivation of our work is rather obvious against the background of today's increased demand for organic food in post-industrialized societies and the necessity to prove organic food authenticity. The background of our data set is given by up to eleven wheat cultivars that have been cultivated in both farming systems, organic and conventional, throughout three years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (RF, SVM) methods can be applied for sample visualization and classification. Our results clearly show that the year has the most influence and the wheat cultivar the second-most influence on the metabolic composition of a sample. We can also show that, for a given year and cultivar, organic and conventional cultivation can be distinguished by machine-learning algorithms.
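
    A sketch of the unsupervised/supervised split described above: PCA for sample visualization and RF/SVM for organic-versus-conventional classification. The GC-MS feature matrix and labels here are synthetic; in practice the MeltDB-processed intensities would be used, and t-SNE could replace PCA for visualization.

```python
# Sketch: PCA for visualization, RF and SVM for classification (synthetic data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 120))       # 300 measurements x 120 metabolite features
y = rng.integers(0, 2, size=300)      # 1 = organic, 0 = conventional (synthetic labels)

coords = PCA(n_components=2).fit_transform(X)       # 2-D coordinates for plotting
for name, clf in [("RF", RandomForestClassifier(random_state=0)), ("SVM", SVC())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.3f}")
```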

  2. Feature weighting using particle swarm optimization for learning vector quantization classifier

    Science.gov (United States)

    Dongoran, A.; Rahmadani, S.; Zarlis, M.; Zakarias

    2018-03-01

    This paper proposes a feature weighting method for classification tasks with the competitive-learning artificial neural network LVQ. The method searches for the weight of each attribute using particle swarm optimization (PSO) so that each attribute's influence on the output is adjusted accordingly. The method is applied to the LVQ classifier and tested on three datasets obtained from the UCI Machine Learning repository. Accuracy is then analyzed for two approaches: the first uses LVQ1, referred to as LVQ-Classifier, and the second, referred to as PSOFW-LVQ, is the proposed model. The results show that the PSO algorithm is capable of finding attribute weights that increase LVQ-classifier accuracy.
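
    A compact PSO feature-weighting sketch in the spirit of the abstract above. A 1-nearest-neighbour classifier stands in for LVQ1 (scikit-learn ships no LVQ implementation), the fitness of a weight vector is its cross-validated accuracy on the rescaled features, and the iris dataset substitutes for the UCI sets used in the paper.

```python
# Sketch: particle swarm optimization over per-feature weights in [0, 1].
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_particles, n_dim, n_iter = 10, X.shape[1], 20

def fitness(w):
    # Accuracy of a 1-NN classifier (LVQ1 stand-in) on weighted features.
    return cross_val_score(KNeighborsClassifier(1), X * w, y, cv=5).mean()

pos = rng.random((n_particles, n_dim))              # particle positions = feature weights
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best weights:", np.round(gbest, 2), "accuracy:", round(pbest_fit.max(), 3))
```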

  3. Accuracy Evaluation of C4.5 and Naive Bayes Classifiers Using Attribute Ranking Method

    Directory of Open Access Journals (Sweden)

    S. Sivakumari

    2009-03-01

    Full Text Available This paper intends to classify the Ljubljana Breast Cancer dataset using C4.5 Decision Tree and Naïve Bayes classifiers. In this work, classification is carried out using two methods. In the first method, the dataset is analysed using all the attributes in the dataset. In the second method, attributes are ranked using the information gain ranking technique and only the highly ranked attributes are used to build the classification model. We evaluate the results of the C4.5 Decision Tree and Naïve Bayes classifiers in terms of classifier accuracy for various folds of cross-validation. Our results show that both classifiers achieve good accuracy on the dataset.
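
    A sketch of the two evaluation settings described above: all attributes versus only the top-ranked attributes, each scored by cross-validation with a decision tree (as a C4.5 stand-in) and Gaussian naive Bayes. Mutual information is used here as an information-gain-style ranking score, and scikit-learn's breast cancer dataset substitutes for the Ljubljana set.

```python
# Sketch: attribute ranking + decision tree / naive Bayes under cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_top = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)   # top-ranked attributes

for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("naive Bayes", GaussianNB())]:
    for label, data in [("all attributes", X), ("top-10 attributes", X_top)]:
        acc = cross_val_score(clf, data, y, cv=10).mean()
        print(f"{name}, {label}: {acc:.3f}")
```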

  4. A Constrained Multi-Objective Learning Algorithm for Feed-Forward Neural Network Classifiers

    Directory of Open Access Journals (Sweden)

    M. Njah

    2017-06-01

    Full Text Available This paper proposes a new approach to address the optimal design of a Feed-forward Neural Network (FNN)-based classifier. The originality of the proposed methodology, called CMOA, lies in the use of a new constraint-handling technique based on a self-adaptive penalty procedure in order to direct the entire search effort towards finding only Pareto optimal solutions that are acceptable. Neurons and connections of the FNN Classifier are dynamically built during the learning process. The approach includes differential evolution to create new individuals and then keeps only the non-dominated ones as the basis for the next generation. The designed FNN Classifier is applied to six binary classification benchmark problems, obtained from the UCI repository, and results indicated the advantages of the proposed approach over other existing multi-objective evolutionary neural network classifiers reported recently in the literature.

  5. 40 CFR 260.32 - Variances to be classified as a boiler.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 25 2010-07-01 2010-07-01 false Variances to be classified as a boiler... be classified as a boiler. In accordance with the standards and criteria in § 260.10 (definition of “boiler”), and the procedures in § 260.33, the Administrator may determine on a case-by-case basis that...

  6. Local curvature analysis for classifying breast tumors: Preliminary analysis in dedicated breast CT

    International Nuclear Information System (INIS)

    Lee, Juhun; Nishikawa, Robert M.; Reiser, Ingrid; Boone, John M.; Lindfors, Karen K.

    2015-01-01

    Purpose: The purpose of this study is to measure the effectiveness of local curvature measures as novel image features for classifying breast tumors. Methods: A total of 119 breast lesions from 104 noncontrast dedicated breast computed tomography images of women were used in this study. Volumetric segmentation was done using a seed-based segmentation algorithm and then a triangulated surface was extracted from the resulting segmentation. Total, mean, and Gaussian curvatures were then computed. Normalized curvatures were used as classification features. In addition, traditional image features were also extracted and a forward feature selection scheme was used to select the optimal feature set. Logistic regression was used as a classifier and leave-one-out cross-validation was utilized to evaluate the classification performances of the features. The area under the receiver operating characteristic curve (AUC, area under curve) was used as a figure of merit. Results: Among curvature measures, the normalized total curvature (C_T) showed the best classification performance (AUC of 0.74), while the others showed no classification power individually. Five traditional image features (two shape, two margin, and one texture descriptors) were selected via the feature selection scheme and its resulting classifier achieved an AUC of 0.83. Among those five features, the radial gradient index (RGI), which is a margin descriptor, showed the best classification performance (AUC of 0.73). A classifier combining RGI and C_T yielded an AUC of 0.81, which showed similar performance (i.e., no statistically significant difference) to the classifier with the above five traditional image features. Additional comparisons in AUC values between classifiers using different combinations of traditional image features and C_T were conducted. The results showed that C_T was able to replace the other four image features for the classification task. Conclusions: The normalized curvature measure
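
    A sketch of the evaluation loop described above: logistic regression scored by AUC under leave-one-out cross-validation. The lesion features are synthetic placeholders for the normalized total curvature and traditional descriptors.

```python
# Sketch: logistic regression + leave-one-out cross-validation, scored by AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(119, 6))          # 119 lesions x 6 features (synthetic)
y = rng.integers(0, 2, size=119)       # 1 = malignant, 0 = benign (synthetic)

scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("LOOCV AUC:", round(roc_auc_score(y, scores), 3))
```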

  7. A CONCISE PANEL OF BIOMARKERS IDENTIFIES NEUROCOGNITIVE FUNCTIONING CHANGES IN HIV-INFECTED INDIVIDUALS

    Science.gov (United States)

    Marcotte, Thomas D.; Deutsch, Reena; Michael, Benedict Daniel; Franklin, Donald; Cookson, Debra Rosario; Bharti, Ajay R.; Grant, Igor; Letendre, Scott L.

    2013-01-01

    Background Neurocognitive (NC) impairment (NCI) occurs commonly in people living with HIV. Despite substantial effort, no biomarkers have been sufficiently validated for diagnosis and prognosis of NCI in the clinic. The goal of this project was to identify diagnostic or prognostic biomarkers for NCI in a comprehensively characterized HIV cohort. Methods Multidisciplinary case review selected 98 HIV-infected individuals and categorized them into four NC groups using normative data: stably normal (SN), stably impaired (SI), worsening (Wo), or improving (Im). All subjects underwent comprehensive NC testing, phlebotomy, and lumbar puncture at two timepoints separated by a median of 6.2 months. Eight biomarkers were measured in CSF and blood by immunoassay. Results were analyzed using mixed model linear regression and staged recursive partitioning. Results At the first visit, subjects were mostly middle-aged (median 45) white (58%) men (84%) who had AIDS (70%). Of the 73% who took antiretroviral therapy (ART), 54% had HIV RNA levels below 50 c/mL in plasma. Mixed model linear regression identified that only MCP-1 in CSF was associated with neurocognitive change group. Recursive partitioning models aimed at diagnosis (i.e., correctly classifying neurocognitive status at the first visit) were complex and required most biomarkers to achieve misclassification limits. In contrast, prognostic models were more efficient. A combination of three biomarkers (sCD14, MCP-1, SDF-1α) correctly classified 82% of Wo and SN subjects, including 88% of SN subjects. A combination of two biomarkers (MCP-1, TNF-α) correctly classified 81% of Im and SI subjects, including 100% of SI subjects. Conclusions This analysis of well-characterized individuals identified concise panels of biomarkers associated with NC change. Across all analyses, the two most frequently identified biomarkers were sCD14 and MCP-1, indicators of monocyte/macrophage activation. While the panels differed depending on

  8. Molecular Characteristics in MRI-classified Group 1 Glioblastoma Multiforme

    Directory of Open Access Journals (Sweden)

    William E Haskins

    2013-07-01

    Full Text Available Glioblastoma multiforme (GBM) is a clinically and pathologically heterogeneous brain tumor. A previous study of MRI-classified GBM has revealed a spatial relationship between Group 1 GBM (GBM1) and the subventricular zone (SVZ). The SVZ is an adult neural stem cell niche and is also suspected to be the origin of a subtype of brain tumor. The intimate contact between GBM1 and the SVZ raises the possibility that tumor cells in GBM1 may be most related to SVZ cells. In support of this notion, we found that neural stem cell and neuroblast markers are highly expressed in GBM1. Additionally, we identified molecular characteristics in this type of GBM that include up-regulation of metabolic enzymes, ribosomal proteins, heat shock proteins, and c-Myc oncoprotein. As GBM1 often recurs at great distances from the initial lesion, the rewiring of metabolism and ribosomal biogenesis may facilitate cancer cells’ growth and survival during tumor migration. Taken together, our findings combined with the MRI-based classification of GBM1 would offer better prediction and treatment for this multifocal GBM.

  9. 75 FR 733 - Implementation of the Executive Order, ``Classified National Security Information''

    Science.gov (United States)

    2010-01-05

    ... of the Executive Order, ``Classified National Security Information'' Memorandum for the Heads of... Security Information'' (the ``order''), which substantially advances my goals for reforming the security... classified information shall provide the Director of the Information Security Oversight Office (ISOO) a copy...

  10. Psychometrics of the Overall Anxiety Severity and Impairment Scale (OASIS) in a sample of women with and without trauma histories.

    Science.gov (United States)

    Norman, Sonya B; Allard, Carolyn B; Trim, Ryan S; Thorp, Steven R; Behrooznia, Michelle; Masino, Tonya T; Stein, Murray B

    2013-04-01

    Many women have unidentified anxiety or trauma histories that can impact their health and medical treatment-seeking behavior. This study examined the sensitivity, specificity, efficiency, and sensitivity to change of the Overall Anxiety Severity and Impairment Scale (OASIS) for identifying an anxiety disorder in a female sample with and without trauma history related to intimate partner violence (IPV). Forty-three women with full or partial PTSD from IPV and 41 women without PTSD completed the OASIS. All participants with trauma history completed the Clinician Administered PTSD Scale. This report is a secondary analysis of a study on the neurobiology of psychological trauma in survivors of IPV recruited from the community. A cut-score of 5 best discriminated those with PTSD from those without, successfully classifying 91% of the sample with 93% sensitivity and 90% specificity. The measure showed strong sensitivity to change in a subsample of 20 participants who completed PTSD treatment and strong convergent and divergent validity in the full sample. This study suggests that the OASIS can identify the presence of an anxiety disorder among a female sample of IPV survivors when PTSD is present.
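
    An arithmetic sketch of the cut-score evaluation reported above: given OASIS totals and PTSD status, compute sensitivity, specificity, and the overall classification rate at a cut-score of 5. The scores below are synthetic stand-ins, not the study data.

```python
# Sketch: sensitivity/specificity of an OASIS cut-score of 5 (synthetic scores).
import numpy as np

rng = np.random.default_rng(5)
ptsd = np.r_[np.ones(43, dtype=int), np.zeros(41, dtype=int)]             # 43 cases, 41 controls
oasis = np.r_[rng.integers(4, 18, size=43), rng.integers(0, 8, size=41)]  # synthetic totals

predicted = (oasis >= 5).astype(int)               # cut-score of 5 flags an anxiety disorder
sensitivity = (predicted[ptsd == 1] == 1).mean()
specificity = (predicted[ptsd == 0] == 0).mean()
accuracy = (predicted == ptsd).mean()
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} correct={accuracy:.2f}")
```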

  11. Microbiological Monitoring and Proteolytic Study of Clinical Samples From Burned and Burned Wounded Patients

    International Nuclear Information System (INIS)

    Toema, M.A.; El-Bazza, Z.E.; El-Hifnawi, H.N.; Abd-El-Hakim, E.E.

    2013-01-01

    In this study, clinical samples were collected from 100 patients admitted to the Burn and Plastic Surgery Department, Faculty of Medicine, Ain Shams University, Egypt, over a period of 12 months. The proteolytic activity of 110 clinical samples, taken as surface swabs from burned and burn-wounded patients of different ages and genders, was examined. Screening of the proteolytic activity produced by the pathogenic bacteria isolated from these patients showed that gram-positive and gram-negative bacilli exhibited high proteolytic activity (46.4%), while 17.9% showed no activity. The isolates with proteolytic activity were classified as high, moderate or weak. The pathogenic bacteria isolated from burned and burn-wounded patients and showing proteolytic activity were identified as Pseudomonas aeruginosa, Proteus mirabilis, Proteus vulgaris, Bacillus megaterium, Bacillus cereus, Staphylococcus aureus, Escherichia coli, Klebsiella ozaenae, Klebsiella oxytoca, Klebsiella pneumoniae and Pseudomonas fluorescens.

  12. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors

    Directory of Open Access Journals (Sweden)

    Roland Zemp

    2016-01-01

    Full Text Available Occupational musculoskeletal disorders, particularly chronic low back pain (LBP, are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user’s sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest. Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples. Sixteen force sensor values and the backrest angle were used as the explanatory variables (features for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.
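
    A sketch of subject-independent evaluation as described above: a random forest on force/backrest features, evaluated with leave-one-subject-out cross-validation (one reading of the paper's leave-one-out scheme for subjects the model has not seen). The sensor data are synthetic.

```python
# Sketch: random forest + leave-one-subject-out CV for 7 sitting positions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(6)
n_subjects, reps, n_positions = 41, 4, 7
groups = np.repeat(np.arange(n_subjects), reps * n_positions)   # subject id per sample (1148 total)
y = np.tile(np.repeat(np.arange(n_positions), reps), n_subjects)
X = rng.normal(size=(len(y), 17)) + y[:, None]                  # 16 force sensors + backrest angle

acc = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                      X, y, cv=LeaveOneGroupOut(), groups=groups).mean()
print("leave-one-subject-out accuracy:", round(acc, 3))
```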

  13. Identifying bioaccumulative halogenated organic compounds using a nontargeted analytical approach: seabirds as sentinels.

    Directory of Open Access Journals (Sweden)

    Christopher J Millow

    Full Text Available Persistent organic pollutants (POPs) are typically monitored via targeted mass spectrometry, which potentially identifies only a fraction of the contaminants actually present in environmental samples. With new anthropogenic compounds continuously introduced to the environment, novel and proactive approaches that provide a comprehensive alternative to targeted methods are needed in order to more completely characterize the diversity of known and unknown compounds likely to cause adverse effects. Nontargeted mass spectrometry attempts to extensively screen for compounds, providing a feasible approach for identifying contaminants that warrant future monitoring. We employed a nontargeted analytical method using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOF-MS) to characterize halogenated organic compounds (HOCs) in California Black skimmer (Rynchops niger) eggs. Our study identified 111 HOCs; 84 of these compounds were regularly detected via targeted approaches, while 27 were classified as typically unmonitored or unknown. Typically unmonitored compounds of note in bird eggs included tris(4-chlorophenyl)methane (TCPM), tris(4-chlorophenyl)methanol (TCPMOH), triclosan, permethrin, heptachloro-1'-methyl-1,2'-bipyrrole (MBP), as well as four halogenated unknown compounds that could not be identified through database searching or the literature. The presence of these compounds in Black skimmer eggs suggests they are persistent, bioaccumulative, potentially biomagnifying, and maternally transferring. Our results highlight the utility and importance of employing nontargeted analytical tools to assess true contaminant burdens in organisms, as well as to demonstrate the value in using environmental sentinels to proactively identify novel contaminants.

  14. Proposed hybrid-classifier ensemble algorithm to map snow cover area

    Science.gov (United States)

    Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

    2018-01-01

    The metaclassification ensemble approach is known to improve the prediction performance of snow-covered area. The methodology adopted in this case is based on a neural network along with four state-of-the-art machine learning algorithms: support vector machines, artificial neural networks, spectral angle mapper, and K-means clustering, plus a snow index, the normalized difference snow index. An AdaBoost ensemble algorithm based on decision trees for snow-cover mapping is also proposed. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing the above techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (Enhanced Thematic Mapper Plus) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and the accuracies of individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 combinations of parameters for individual classifiers were trained and tested on the dataset and the results were compared with the proposed approach. It was observed that the proposed methodology produced the highest classification accuracy (95.21%), close to that (94.01%) produced by the proposed AdaBoost ensemble algorithm. From the sets of observations, it was concluded that the ensemble of classifiers produced better results compared to individual classifiers.
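
    A sketch of the proposed AdaBoost-of-decision-trees idea for snow / no-snow pixel labelling. The band values and the NDSI-like feature are synthetic; a real run would use the Landsat 7 ETM+ reflectances described above. The default base learner in scikit-learn's AdaBoost is a depth-1 decision tree (a decision stump).

```python
# Sketch: AdaBoost over decision stumps for binary snow-cover labelling.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.uniform(size=(5000, 7))                                   # 6 spectral bands + NDSI-like index
y = (X[:, 6] + 0.2 * rng.normal(size=5000) > 0.5).astype(int)     # 1 = snow, 0 = no snow (synthetic)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)        # default base: depth-1 decision tree
print("cross-validated accuracy:", round(cross_val_score(ada, X, y, cv=5).mean(), 3))
```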

  15. Classifier models and architectures for EEG-based neonatal seizure detection

    International Nuclear Information System (INIS)

    Greene, B R; Marnane, W P; Lightbody, G; Reilly, R B; Boylan, G B

    2008-01-01

    Neonatal seizures are the most common neurological emergency in the neonatal period and are associated with a poor long-term outcome. Early detection and treatment may improve prognosis. This paper aims to develop an optimal set of parameters and a comprehensive scheme for patient-independent multi-channel EEG-based neonatal seizure detection. We employed a dataset containing 411 neonatal seizures. The dataset consists of multi-channel EEG recordings with a mean duration of 14.8 h from 17 neonatal patients. Early-integration and late-integration classifier architectures were considered for the combination of information across EEG channels. Three classifier models based on linear discriminants, quadratic discriminants and regularized discriminants were employed. Furthermore, the effect of electrode montage was considered. The best performing seizure detection system was found to be an early integration configuration employing a regularized discriminant classifier model. A referential EEG montage was found to outperform the more standard bipolar electrode montage for automated neonatal seizure detection. A cross-fold validation estimate of the classifier performance for the best performing system yielded 81.03% of seizures correctly detected with a false detection rate of 3.82%. With post-processing, the false detection rate was reduced to 1.30% with 59.49% of seizures correctly detected. These results represent a comprehensive illustration that robust reliable patient-independent neonatal seizure detection is possible using multi-channel EEG

  16. Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS

    OpenAIRE

    Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita

    2016-01-01

    This file is the dataset for the following paper "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja1, Dasapta Erwin Irawan1, Yuano Rezky2, Prana Ugiana Gio3, Anggita Agustin1

  17. Identifying the potential of changes to blood sample logistics using simulation.

    Science.gov (United States)

    Jørgensen, Pelle; Jacobsen, Peter; Poulsen, Jørgen Hjelm

    2013-01-01

    Using simulation as an approach to display and improve internal logistics at hospitals has great potential. This study shows how a simulation model displaying the morning blood-taking round at a Danish public hospital can be developed and utilized with the aim of improving the logistics. The focus of the simulation was to evaluate changes made to the transportation of blood samples between wards and the laboratory. The average (AWT) and maximum (MWT) waiting times from when a blood sample was drawn at the ward until it was received at the laboratory, together with the distribution of blood-sample arrivals at the laboratory, were used as the evaluation criteria. Four different scenarios were tested and compared with the current approach: (1) using AGVs (mobile robots), (2) using a pneumatic tube system, (3) using porters that are called upon, or (4) using porters that come to the wards every 45 minutes. Furthermore, each scenario was tested to determine what amount of resources would give the optimal result. The simulations showed a large improvement potential in implementing a new technology or means of transporting the blood samples. The pneumatic tube system showed the biggest potential, lowering the AWT and MWT by approx. 36% and 18%, respectively. Additionally, all of the scenarios had a more even distribution of arrivals except for porters coming to the wards every 45 min. As a consequence of the results obtained in the study, the hospital decided to implement a pneumatic tube system.
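
    A toy Monte Carlo sketch of the kind of comparison described above: samples are drawn at random times during the morning round and either sent immediately (a pneumatic-tube-like scenario) or wait for a porter who departs every 45 minutes. All arrival rates and transport times are assumed for illustration, not hospital data.

```python
# Toy sketch: compare AWT/MWT for immediate transport vs. a 45-minute porter cycle.
import numpy as np

rng = np.random.default_rng(8)
draw_times = np.sort(rng.uniform(0, 180, size=400))    # minutes after round start (assumed)

tube_wait = np.full_like(draw_times, 3.0)              # ~3 min transport, no batching (assumed)
porter_departures = np.arange(45, 181 + 45, 45)        # porter leaves every 45 min
porter_wait = np.array([porter_departures[porter_departures >= t].min() - t + 10
                        for t in draw_times])          # wait for departure + 10 min walk (assumed)

for name, w in [("pneumatic tube", tube_wait), ("porter every 45 min", porter_wait)]:
    print(f"{name}: AWT={w.mean():.1f} min, MWT={w.max():.1f} min")
```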

  18. Measurement properties of clinical assessment methods for classifying generalized joint hypermobility

    DEFF Research Database (Denmark)

    Juul-Kristensen, Birgit; Schmedling, Karoline; Rombaut, Lies

    2017-01-01

    methods. For BS-self, the validity showed unknown evidence compared with test assessment methods. In conclusion, following recommended uniformity of testing procedures, the recommendation for clinical use in adults is BS with cut-point of 5 of 9 including historical information, while in children it is BS...... with cut-point of at least 6 of 9. However, more studies are needed to conclude on the validity properties of these assessment methods, and before evidence-based recommendations can be made for clinical use on the "best" assessment method for classifying GJH. © 2017 Wiley Periodicals, Inc....... for evaluating the methodological quality of the identified studies, all included studies were rated "fair" or "poor." Most studies were using BS, and for BS the reliability most of the studies showed limited positive to conflicting evidence, with some shortcomings on studies for the validity. The three other...

  19. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations.

    Directory of Open Access Journals (Sweden)

    Emma Lightfoot

    Full Text Available Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or due to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately, however it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific
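
    A sketch of the two outlier rules recommended above for a set of δ18O values: the interquartile-range (IQR) fence and the median absolute deviation (MAD) from the median. The isotope values are synthetic, with two planted "migrants".

```python
# Sketch: IQR-fence and MAD-based outlier flags for δ18O values.
import numpy as np

rng = np.random.default_rng(9)
d18o = np.r_[rng.normal(-6.0, 0.6, size=30), [-2.5, -9.8]]   # synthetic values + two outliers

q1, q3 = np.percentile(d18o, [25, 75])
iqr_outliers = (d18o < q1 - 1.5 * (q3 - q1)) | (d18o > q3 + 1.5 * (q3 - q1))

mad = np.median(np.abs(d18o - np.median(d18o)))
mad_outliers = np.abs(d18o - np.median(d18o)) / (1.4826 * mad) > 3    # robust z-score rule

print("IQR rule flags:", d18o[iqr_outliers])
print("MAD rule flags:", d18o[mad_outliers])
```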

  20. The Effect of Reading Comprehension and Problem Solving Strategies on Classifying Elementary 4th Grade Students with High and Low Problem Solving Success

    Science.gov (United States)

    Ulu, Mustafa

    2017-01-01

    In this study, the effect of fluent reading (speed, reading accuracy percentage, prosodic reading), comprehension (literal comprehension, inferential comprehension) and problem solving strategies on classifying students with high and low problem solving success was researched. The sampling of the research is composed of 279 students at elementary…

  1. Discrimination-Aware Classifiers for Student Performance Prediction

    Science.gov (United States)

    Luo, Ling; Koprinska, Irena; Liu, Wei

    2015-01-01

    In this paper we consider discrimination-aware classification of educational data. Mining and using rules that distinguish groups of students based on sensitive attributes such as gender and nationality may lead to discrimination. It is desirable to keep the sensitive attributes during the training of a classifier to avoid information loss but…

  2. Two-categorical bundles and their classifying spaces

    DEFF Research Database (Denmark)

    Baas, Nils A.; Bökstedt, M.; Kro, T.A.

    2012-01-01

    -category is a classifying space for the associated principal 2-bundles. In the process of proving this we develop a lot of powerful machinery which may be useful in further studies of 2-categorical topology. As a corollary we get a new proof of the classification of principal bundles. A calculation based...

  3. Classifying Adverse Events in the Dental Office.

    Science.gov (United States)

    Kalenderian, Elsbeth; Obadan-Udoh, Enihomo; Maramaldi, Peter; Etolue, Jini; Yansane, Alfa; Stewart, Denice; White, Joel; Vaderhobli, Ram; Kent, Karla; Hebballi, Nutan B; Delattre, Veronique; Kahn, Maria; Tokede, Oluwabunmi; Ramoni, Rachel B; Walji, Muhammad F

    2017-06-30

    Dentists strive to provide safe and effective oral healthcare. However, some patients may encounter an adverse event (AE) defined as "unnecessary harm due to dental treatment." In this research, we propose and evaluate two systems for categorizing the type and severity of AEs encountered at the dental office. Several existing medical AE type and severity classification systems were reviewed and adapted for dentistry. Using data collected in previous work, two initial dental AE type and severity classification systems were developed. Eight independent reviewers performed focused chart reviews, and AEs identified were used to evaluate and modify these newly developed classifications. A total of 958 charts were independently reviewed. Among the reviewed charts, 118 prospective AEs were found and 101 (85.6%) were verified as AEs through a consensus process. At the end of the study, a final AE type classification comprising 12 categories, and an AE severity classification comprising 7 categories emerged. Pain and infection were the most common AE types representing 73% of the cases reviewed (56% and 17%, respectively) and 88% were found to cause temporary, moderate to severe harm to the patient. Adverse events found during the chart review process were successfully classified using the novel dental AE type and severity classifications. Understanding the type of AEs and their severity are important steps if we are to learn from and prevent patient harm in the dental office.

  4. Evaluation of immunization coverage by lot quality assurance sampling compared with 30-cluster sampling in a primary health centre in India.

    OpenAIRE

    Singh, J.; Jain, D. C.; Sharma, R. S.; Verghese, T.

    1996-01-01

    The immunization coverage of infants, children and women residing in a primary health centre (PHC) area in Rajasthan was evaluated both by lot quality assurance sampling (LQAS) and by the 30-cluster sampling method recommended by WHO's Expanded Programme on Immunization (EPI). The LQAS survey was used to classify 27 mutually exclusive subunits of the population, defined as residents in health subcentre areas, on the basis of acceptable or unacceptable levels of immunization coverage among inf...

  5. A Bayesian classifier for symbol recognition

    OpenAIRE

    Barrat , Sabine; Tabbone , Salvatore; Nourrissier , Patrick

    2007-01-01

    URL : http://www.buyans.com/POL/UploadedFile/134_9977.pdf; International audience; We present in this paper an original adaptation of Bayesian networks to symbol recognition problem. More precisely, a descriptor combination method, which enables to improve significantly the recognition rate compared to the recognition rates obtained by each descriptor, is presented. In this perspective, we use a simple Bayesian classifier, called naive Bayes. In fact, probabilistic graphical models, more spec...

  6. Air classifier technology (ACT) in dry powder inhalation. Part 1 : Introduction of a novel force distribution concept (FDC) explaining the performance of a basic air classifier on adhesive mixtures

    NARCIS (Netherlands)

    de Boer, A H; Hagedoorn, P; Gjaltema, D; Goede, J; Frijlink, H W

    2003-01-01

    Air classifier technology (ACT) is introduced as part of formulation integrated dry powder inhaler development (FIDPI) to optimise the de-agglomeration of inhalation powders. Carrier retention and de-agglomeration results obtained with a basic classifier concept are discussed. The theoretical

  7. Application of Chemometric Techniques to Colorimetric Data in Classifying Automobile Paint

    International Nuclear Information System (INIS)

    Nur Awatif Rosli; Rozita Osman; Norashikin Saim; Mohd Zuli Jaafar

    2015-01-01

    The analysis of paint chips is of great interest to forensic investigators, particularly in the examination of hit-and-run cases. This study proposes a direct and rapid method for classifying automobile paint samples based on colorimetric data sets: absorption value, reflectance value, luminosity (L), degree of redness (a) and degree of yellowness (b), obtained with the video spectral comparator (VSC) technique. A total of 42 automobile paint samples from 7 manufacturers were analysed. The colorimetric datasets obtained from VSC analysis were subjected to chemometric techniques, namely cluster analysis (CA) and principal component analysis (PCA). Based on CA, 5 clusters were generated; cluster 1 consisted of silver color, cluster 2 consisted of white color, cluster 3 consisted of blue and black colors, cluster 4 consisted of red color and cluster 5 consisted of light blue color. PCA resulted in two latent factors explaining 95.58% of the total variance, which enabled the 42 automobile paints to be grouped into five groups. Applying chemometrics to colorimetric datasets provides a meaningful classification of automobile paints based on their colour tone (L, a, b) and light intensity. These approaches have the potential to ease the interpretation of complex spectral data involving a large number of comparisons. (author)

  8. Techniques and methodologies to identify potential generated industries of NORM in Angola Republic and evaluate its impacts

    International Nuclear Information System (INIS)

    Diogo, José Manuel Sucumula

    2017-01-01

    Numerous steps have been taken worldwide to identify and quantify the radiological risks associated with the mining of ores containing Naturally Occurring Radioactive Material (NORM); such mining often results in unnecessary exposures of individuals and high environmental damage, with devastating consequences for the health of workers and for the economies of many countries, due to a lack of regulations or to inadequate regulations. For these and other reasons, the objective of this work was to identify potential NORM-generating industries in the Republic of Angola and to estimate their radiological environmental impacts. To achieve this objective, we studied the theoretical aspects and identified the main industrial activities internationally recognized as NORM generators. The Brazilian regulatory experience was taken as a reference for the criteria used to classify industries that generate NORM, the mining methods and their radiological environmental impacts, as well as the main techniques applied to evaluate the concentrations of radionuclides in a specific environmental matrix and/or a NORM sample. The study approach allowed the elaboration of a NORM map for the main provinces of Angola, establishing the evaluation criteria for implementing the Radiation Protection Plan in the extractive industry, establishing measures to control ionizing radiation in mining, and identifying and quantifying radionuclides present in samples of lees oil. However, in order to adequately assess the radiological environmental impact of the NORM industries, it is not enough to identify them; it is important to know their origin, quantify the radioactive material released as liquid and gaseous effluents, identify the main routes of exposure and examine how this material spreads into the environment until it reaches man. (author)

  9. Transcriptome profiling of whole blood cells identifies PLEK2 and C1QB in human melanoma.

    Directory of Open Access Journals (Sweden)

    Yuchun Luo

    Full Text Available Developing analytical methodologies to identify biomarkers in easily accessible body fluids is highly valuable for the early diagnosis and management of cancer patients. Peripheral whole blood is a "nucleic acid-rich" and "inflammatory cell-rich" information reservoir and represents systemic processes altered by the presence of cancer cells. We conducted transcriptome profiling of whole blood cells from melanoma patients. To overcome challenges associated with blood-based transcriptome analysis, we used a PAXgene™ tube and NuGEN Ovation™ globin reduction system. The combined use of these systems in microarray resulted in the identification of 78 unique genes differentially expressed in the blood of melanoma patients. Of these, 68 genes were further analyzed by quantitative reverse transcriptase PCR using blood samples from 45 newly diagnosed melanoma patients (stage I to IV) and 50 healthy control individuals. Thirty-nine genes were verified to be differentially expressed in blood samples from melanoma patients. A stepwise logit analysis selected eighteen 2-gene signatures that distinguish melanoma from healthy controls. Of these, a 2-gene signature consisting of PLEK2 and C1QB led to the best result that correctly classified 93.3% of melanoma patients and 90% of healthy controls. Both genes were upregulated in blood samples of melanoma patients from all stages. Further analysis using blood fractionation showed that CD45(-) and CD45(+) populations were responsible for the altered expression levels of PLEK2 and C1QB, respectively. The current study provides the first analysis of whole blood-based transcriptome biomarkers for malignant melanoma. The expression of PLEK2, the strongest gene to classify melanoma patients, in CD45(-) subsets illustrates the importance of analyzing whole blood cells for biomarker studies. The study suggests that transcriptome profiling of blood cells could be used for both early detection of melanoma and monitoring of patients
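
    A sketch of a two-gene logistic-regression classifier in the spirit of the PLEK2/C1QB signature: fit on expression values and report classification rates for patients and controls. The expression values below are synthetic, not the study's qPCR data.

```python
# Sketch: logistic regression on two gene-expression features (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
melanoma = rng.normal(loc=[2.0, 1.5], scale=0.8, size=(45, 2))   # both genes upregulated in patients
controls = rng.normal(loc=[0.0, 0.0], scale=0.8, size=(50, 2))
X = np.vstack([melanoma, controls])
y = np.r_[np.ones(45, dtype=int), np.zeros(50, dtype=int)]

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)
print("patients correctly classified:", (pred[y == 1] == 1).mean())
print("controls correctly classified:", (pred[y == 0] == 0).mean())
```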

  10. Classifying vulnerability to sleep deprivation using baseline measures of psychomotor vigilance.

    Science.gov (United States)

    Patanaik, Amiya; Kwoh, Chee Keong; Chua, Eric C P; Gooley, Joshua J; Chee, Michael W L

    2015-05-01

    To identify measures derived from baseline psychomotor vigilance task (PVT) performance that can reliably predict vulnerability to sleep deprivation. Subjects underwent total sleep deprivation and completed a 10-min PVT every 1-2 h in a controlled laboratory setting. Participants were categorized as vulnerable or resistant to sleep deprivation, based on a median split of lapses that occurred following sleep deprivation. Standard reaction time, drift diffusion model (DDM), and wavelet metrics were derived from PVT response times collected at baseline. A support vector machine model that incorporated maximum relevance and minimum redundancy feature selection and wrapper-based heuristics was used to classify subjects as vulnerable or resistant using rested data. Two academic sleep laboratories. Independent samples of 135 (69 women, age 18 to 25 y), and 45 (3 women, age 22 to 32 y) healthy adults. In both datasets, DDM measures, number of consecutive reaction times that differ by more than 250 ms, and two wavelet features were selected by the model as features predictive of vulnerability to sleep deprivation. Using the best set of features selected in each dataset, classification accuracy was 77% and 82% using fivefold stratified cross-validation, respectively. Despite differences in experimental conditions across studies, drift diffusion model parameters associated reliably with individual differences in performance during total sleep deprivation. These results demonstrate the utility of drift diffusion modeling of baseline performance in estimating vulnerability to psychomotor vigilance decline
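
    A sketch of the classification pipeline described above: feature selection followed by a support vector machine, evaluated with stratified five-fold cross-validation. Mutual-information ranking stands in for the paper's maximum-relevance/minimum-redundancy plus wrapper scheme, and the baseline PVT features are synthetic.

```python
# Sketch: feature selection + SVM with stratified 5-fold cross-validation.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(11)
X = rng.normal(size=(135, 12))          # 135 subjects x 12 baseline PVT-derived features (synthetic)
y = rng.integers(0, 2, size=135)        # 1 = vulnerable, 0 = resistant (median split)

clf = make_pipeline(StandardScaler(),
                    SelectKBest(mutual_info_classif, k=5),
                    SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print("stratified 5-fold accuracy:", round(cross_val_score(clf, X, y, cv=cv).mean(), 3))
```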

  11. A new approach to enhance the performance of decision tree for classifying gene expression data.

    Science.gov (United States)

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm in which each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree, we use the area under the receiver operating characteristic curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well-known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.

  12. A genetic algorithm-based framework for wavelength selection on sample categorization.

    Science.gov (United States)

    Anzanello, Michel J; Yamashita, Gabrielli; Marcelo, Marcelo; Fogliatto, Flávio S; Ortiz, Rafael S; Mariotti, Kristiane; Ferrão, Marco F

    2017-08-01

    In forensic and pharmaceutical scenarios, the application of chemometrics and optimization techniques has unveiled common and peculiar features of seized medicine and drug samples, helping investigative forces to track illegal operations. This paper proposes a novel framework aimed at identifying relevant subsets of attenuated total reflectance Fourier transform infrared (ATR-FTIR) wavelengths for classifying samples into two classes, for example authentic or forged categories in case of medicines, or salt or base form in cocaine analysis. In the first step of the framework, the ATR-FTIR spectra were partitioned into equidistant intervals and the k-nearest neighbour (KNN) classification technique was applied to each interval to insert samples into proper classes. In the next step, selected intervals were refined through the genetic algorithm (GA) by identifying a limited number of wavelengths from the intervals previously selected aimed at maximizing classification accuracy. When applied to Cialis®, Viagra®, and cocaine ATR-FTIR datasets, the proposed method substantially decreased the number of wavelengths needed to categorize, and increased the classification accuracy. From a practical perspective, the proposed method provides investigative forces with valuable information towards monitoring illegal production of drugs and medicines. In addition, focusing on a reduced subset of wavelengths allows the development of portable devices capable of testing the authenticity of samples during police checking events, avoiding the need for later laboratorial analyses and reducing equipment expenses. Theoretically, the proposed GA-based approach yields more refined solutions than the current methods relying on interval approaches, which tend to insert irrelevant wavelengths in the retained intervals. Copyright © 2016 John Wiley & Sons, Ltd.
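
    A compact genetic-algorithm sketch for the wavelength-selection step described above: a binary chromosome marks which wavelengths are kept, and fitness is the cross-validated KNN accuracy on the selected columns. The spectra are synthetic stand-ins for ATR-FTIR intervals, and the GA operators (elitist selection, one-point crossover, bit-flip mutation) are generic choices rather than the paper's exact configuration.

```python
# Sketch: GA over binary wavelength masks, fitness = KNN cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(13)
X = rng.normal(size=(80, 60))                   # 80 samples x 60 wavelengths (synthetic)
y = rng.integers(0, 2, size=80)                 # e.g. 1 = authentic, 0 = forged

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(3), X[:, mask.astype(bool)], y, cv=5).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))              # 20 random chromosomes
for _ in range(25):                                           # generations
    fit = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(fit)[-10:]]                      # keep the best half
    cut = rng.integers(1, X.shape[1], size=10)
    children = np.array([np.r_[parents[i][:c], parents[(i + 1) % 10][c:]]
                         for i, c in enumerate(cut)])         # one-point crossover
    mutate = rng.random(children.shape) < 0.02
    children[mutate] = 1 - children[mutate]                   # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(c) for c in pop])]
print("selected wavelengths:", np.flatnonzero(best), "fitness:", round(fitness(best), 3))
```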

  13. Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis

    International Nuclear Information System (INIS)

    Li Qiang; Doi Kunio

    2006-01-01

    Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In CAD schemes, classifiers play a key role in achieving a high lesion detection rate and a low false-positive rate. Although many popular classifiers such as linear discriminant analysis and artificial neural networks have been employed in CAD schemes for the reduction of false positives, the rule-based classifier has probably been the simplest and most frequently used one since the early days of CAD development. However, existing rule-based classifiers have major disadvantages that significantly reduce their practicality and credibility: manual design, poor reproducibility, poor evaluation methods such as resubstitution, and a large overtraining effect. An automated rule-based classifier with a minimized overtraining effect can overcome or significantly reduce these disadvantages. In this study, we developed an 'optimal' method for the selection of cutoff thresholds and a fully automated rule-based classifier. Experiments with a Monte Carlo simulation and a real lung nodule CT data set demonstrated that the automated threshold selection method can completely eliminate the overtraining effect in the cutoff threshold selection procedure, and thus minimize the overall overtraining effect in the constructed rule-based classifier. We believe that this threshold selection method is very useful for constructing automated rule-based classifiers with a minimized overtraining effect.
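    The sketch below illustrates one plausible flavor of automated cutoff selection (not the authors' 'optimal' procedure): for a single feature, the cutoff is placed so that every true lesion is retained while as many false-positive candidates as possible are rejected; the feature values are synthetic.

    ```python
    # A minimal sketch of automated cutoff selection for one rule of a rule-based
    # classifier: keep all true lesions, reject as many false positives as possible.
    # Synthetic candidate scores; this is an assumed, simplified criterion.
    import numpy as np

    def pick_cutoff(feature, is_lesion):
        """Largest cutoff that keeps every lesion; candidates below it are rejected."""
        cutoff = feature[is_lesion].min()
        removed = np.sum(~is_lesion & (feature < cutoff))
        return cutoff, removed

    rng = np.random.default_rng(2)
    lesion_score = np.r_[rng.normal(2.0, 0.5, 30), rng.normal(0.5, 0.5, 300)]
    is_lesion = np.r_[np.ones(30, bool), np.zeros(300, bool)]
    cutoff, removed = pick_cutoff(lesion_score, is_lesion)
    print(f"cutoff={cutoff:.2f}, false positives removed={removed}")
    ```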

  14. A three-parameter model for classifying anurans into four genera based on advertisement calls.

    Science.gov (United States)

    Gingras, Bruno; Fitch, William Tecumseh

    2013-01-01

    The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species that are more closely related phylogenetically are predicted to be more similar than those of more distantly related species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model using mean values for dominant frequency, coefficient of variation of root-mean-square energy, and spectral flux correctly classified advertisement calls to genus with an accuracy above 70%. Similar accuracy rates were obtained with these parameters using a support vector machine model, a k-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
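    A hedged sketch of the three-parameter model follows; the synthetic feature values are assumptions, but the classifier (multinomial logistic regression on dominant frequency, RMS-energy coefficient of variation and spectral flux) mirrors the description above.

    ```python
    # A minimal sketch of the three-parameter genus classifier: multinomial logistic
    # regression on three acoustic features. Feature values are synthetic, not the
    # published call measurements.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n = 200
    genus = rng.integers(0, 4, n)                         # four genera, coded 0-3
    X = np.column_stack([
        1000 + 400 * genus + rng.normal(0, 150, n),       # dominant frequency (Hz)
        0.3 + 0.05 * genus + rng.normal(0, 0.05, n),      # CV of RMS energy
        0.1 * genus + rng.normal(0, 0.1, n),              # spectral flux
    ])
    clf = LogisticRegression(max_iter=2000)
    print(f"cross-validated genus accuracy: {cross_val_score(clf, X, genus, cv=5).mean():.2f}")
    ```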

  15. Statistical text classifier to detect specific type of medical incidents.

    Science.gov (United States)

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

    WHO Patient Safety has focused on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information, whereas comprehending the taxonomy of medical incidents in a structured manner has remained challenging. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong and elsewhere. This research demonstrates the potential to construct statistical text classifiers that detect specific types of medical incidents from incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents, using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA), is shown in this poster presentation. The classifier was built using a logistic regression model; the ROC curve and the AUC value indicated that the model is satisfactory.
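    The sketch below shows how such a classifier could be assembled with TF-IDF features and logistic regression; the toy advisory texts and labels are invented for illustration and are not drawn from the GPSA corpus.

    ```python
    # A minimal sketch of a statistical text classifier for one incident type:
    # TF-IDF features with logistic regression. Toy texts, not the GPSA advisories.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    docs = [
        "hydroxyzine confused with hydralazine during dispensing",   # LASA incident
        "celebrex dispensed in place of celexa",                     # LASA incident
        "iv line disconnected during patient transfer",              # other incident
        "patient fall in ward bathroom",                             # other incident
    ]
    labels = [1, 1, 0, 0]                                            # 1 = LASA incident

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(docs, labels)
    # illustrative prediction on an unseen report
    print(clf.predict(["metformin confused with metronidazole on order entry"]))
    ```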

  16. A Naive-Bayes classifier for damage detection in engineering materials

    Energy Technology Data Exchange (ETDEWEB)

    Addin, O. [Laboratory of Intelligent Systems, Institute of Advanced Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Sapuan, S.M. [Department of Mechanical and Manufacturing Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)]. E-mail: sapuan@eng.upm.edu.my; Mahdi, E. [Department of Aerospace Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)]; Othman, M. [Department of Communication Technology and Networks, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)]

    2007-07-01

    This paper introduces the Bayesian network in general, and the Naive-Bayes classifier in particular, as one of the most successful classification systems for simulating damage detection in engineering materials. A method for feature subset selection is also introduced. The method is based on the mean and maximum values of wave amplitudes after dividing the waves into folds and grouping them with a clustering algorithm (e.g. the k-means algorithm). The Naive-Bayes classifier and the feature subset selection method were analyzed and tested on two data sets. The data sets were constructed from artificial damage created in quasi-isotropic laminated composites of the AS4/3501-6 graphite/epoxy system and in a ball bearing of type 6204 with a steel cage. The Naive-Bayes classifier and the proposed feature subset selection algorithm are shown to be efficient techniques for damage detection in engineering materials.
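    A minimal sketch of the described pipeline appears below; the synthetic waveforms and the injected 'damage' burst are assumptions, while the fold-wise mean/maximum amplitude features and the Gaussian Naive-Bayes classifier follow the abstract.

    ```python
    # A minimal sketch: split each signal into folds, keep the mean and maximum
    # amplitude per fold, then train a Gaussian Naive-Bayes damage classifier.
    # Synthetic waveforms, not the composite/bearing measurements.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)

    def fold_features(signal, n_folds=8):
        folds = np.array_split(np.abs(signal), n_folds)
        return np.concatenate([[f.mean(), f.max()] for f in folds])

    signals = rng.normal(size=(100, 1024))
    damaged = rng.integers(0, 2, 100)
    signals[damaged == 1, 300:340] += 3.0          # injected "damage" burst
    X = np.array([fold_features(s) for s in signals])

    print(cross_val_score(GaussianNB(), X, damaged, cv=5).mean())
    ```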

  17. Discovering mammography-based machine learning classifiers for breast cancer diagnosis.

    Science.gov (United States)

    Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Suárez-Ortega, Cesar; Díaz-Herrero, Guillermo; Franco-Valiente, Jose Miguel; Rubio-Del-Solar, Manuel; González-de-Posada, Naimy; Vaz, Mario Augusto Pires; Loureiro, Joana; Ramos, Isabel

    2012-08-01

    This work explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. We massively evaluated MLC configurations to classify feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing BI-RADS diagnoses. Beforehand, appropriate combinations of image processing and normalization techniques were applied to reduce image artifacts and enhance mammogram detail. The method can be used under different data acquisition circumstances and exploits computer clusters to select well-performing MLC configurations. We evaluated 286 cases extracted from the repository owned by HSJ-FMUP, where specialized radiologists segmented regions on CC and/or MLO images (biopsies provided the gold standard). Around 20,000 MLC configurations were evaluated, obtaining classifiers that achieved an area under the ROC curve of 0.996 when combining feature vectors extracted from the CC and MLO views of the same case.
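    The configuration-search idea can be sketched as a scikit-learn grid search scored by ROC AUC, as below; the random feature vectors, the SVC family and the parameter grid are assumptions standing in for the roughly 20,000 configurations evaluated on a cluster.

    ```python
    # A minimal sketch of searching classifier configurations ranked by ROC AUC.
    # Random feature vectors and a small SVC grid are assumptions, not the
    # HSJ-FMUP cases or the full set of evaluated MLC families.
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(5)
    X = rng.normal(size=(286, 20))                 # one feature vector per segmented region
    y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy labels: 1 = lesion, 0 = normal

    search = GridSearchCV(
        SVC(),                                     # one classifier family; others can be swapped in
        {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1], "kernel": ["rbf", "linear"]},
        scoring="roc_auc", cv=5, n_jobs=-1,        # parallel evaluation across local cores
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))
    ```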

  18. A Topic Model Approach to Representing and Classifying Football Plays

    KAUST Repository

    Varadarajan, Jagannadan

    2013-09-09

    We address the problem of modeling and classifying American Football offense teams' plays in video, a challenging example of group activity analysis. Automatic play classification will allow coaches to infer patterns and tendencies of opponents more efficiently, resulting in better strategy planning in a game. We define a football play as a unique combination of player trajectories. To this end, we develop a framework that uses player trajectories as inputs to MedLDA, a supervised topic model. The joint maximization of both likelihood and inter-class margins of MedLDA in learning the topics allows us to learn semantically meaningful play type templates, as well as classify different play types with 70% average accuracy. Furthermore, this method is extended to analyze individual player roles in classifying each play type. We validate our method on a large dataset comprising 271 play clips from real-world football games, which will be made publicly available for future comparisons.
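    Since MedLDA is not available in common libraries, the sketch below substitutes plain LDA followed by a linear classifier as a stand-in: trajectories are assumed to be pre-quantized into codeword counts, and the synthetic data and topic/class counts are illustrative only.

    ```python
    # A minimal sketch (plain LDA + logistic regression as a stand-in for the
    # supervised MedLDA model): player-trajectory codeword counts are reduced to
    # topic proportions and then classified by play type. All data are synthetic.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(6)
    n_plays, n_codewords = 271, 50
    play_type = rng.integers(0, 5, n_plays)                  # five toy play types
    counts = rng.poisson(1.0, size=(n_plays, n_codewords))   # codeword histograms per play
    counts[np.arange(n_plays), play_type * 10] += 15         # type-specific motion words

    model = make_pipeline(
        LatentDirichletAllocation(n_components=8, random_state=0),
        LogisticRegression(max_iter=1000),
    )
    print(cross_val_score(model, counts, play_type, cv=3).mean())
    ```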

  19. Application of In-Segment Multiple Sampling in Object-Based Classification

    Directory of Open Access Journals (Sweden)

    Nataša Đurić

    2014-12-01

    When object-based analysis is applied to very high-resolution imagery, the pixels within a segment show large spectral inhomogeneity; their distribution can be considered complex rather than normal. When normality is violated, classification methods that rely on the assumption of normally distributed data are less successful and less accurate. Normality violations are hard to detect in small samples, and the segmentation process produces segments that vary greatly in size, so samples can be very large or very small. This paper investigates whether the complexity within a segment can be addressed by multiple random sampling of segment pixels and multiple calculations of similarity measures. To analyze the effect sampling has on classification results, the statistics and p-values of the non-parametric two-sample Kolmogorov-Smirnov test and the parametric Student's t-test are used as similarity measures in the classification process. The performance of both classifiers was assessed on a WorldView-2 image for four land cover classes (roads, buildings, grass and trees) and compared to two commonly used object-based classifiers: k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Both proposed classifiers showed a slight improvement in overall classification accuracy and produced more accurate classification maps than the comparison methods when evaluated against the ground truth image.
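    A hedged sketch of in-segment multiple sampling follows; the reference pixel samples, segment values, number of draws and sample size are assumptions, while the repeated Kolmogorov-Smirnov comparisons follow the idea described above.

    ```python
    # A minimal sketch of in-segment multiple sampling: pixel values of a segment
    # are sampled several times, each sample is compared to class reference samples
    # with the Kolmogorov-Smirnov test, and the class with the best average p-value
    # wins. Synthetic pixel values, not the WorldView-2 image.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(7)
    references = {                                   # reference pixel samples per class
        "grass": rng.normal(0.35, 0.05, 500),
        "road": rng.normal(0.60, 0.04, 500),
    }
    segment = rng.normal(0.36, 0.06, 2000)           # pixels of one (inhomogeneous) segment

    def classify_segment(pixels, refs, n_draws=20, sample_size=50):
        scores = {c: 0.0 for c in refs}
        for _ in range(n_draws):
            sample = rng.choice(pixels, sample_size, replace=False)
            for c, ref in refs.items():
                scores[c] += ks_2samp(sample, ref).pvalue / n_draws
        return max(scores, key=scores.get), scores

    label, scores = classify_segment(segment, references)
    print(label, {c: round(v, 3) for c, v in scores.items()})
    ```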

  20. Histogram deconvolution - An aid to automated classifiers

    Science.gov (United States)

    Lorre, J. J.

    1983-01-01

    It is shown that N-dimensional histograms are convolved by the addition of noise in the picture domain. Three methods are described for deconvolving such noise-affected histograms. The purpose of the deconvolution is to provide automated classifiers with a higher-quality N-dimensional histogram from which to obtain classification statistics.
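    The paper does not name its three methods, so the sketch below uses Richardson-Lucy iteration purely as one example of sharpening a histogram once the noise kernel is known; the 1-D toy histogram and Gaussian kernel are assumptions.

    ```python
    # A minimal sketch of histogram deconvolution via Richardson-Lucy iteration
    # (an illustrative method, not one of the three described in the paper).
    import numpy as np

    def richardson_lucy(blurred, kernel, iters=50):
        estimate = np.full_like(blurred, blurred.mean())
        kernel_flip = kernel[::-1]
        for _ in range(iters):
            conv = np.convolve(estimate, kernel, mode="same") + 1e-12
            estimate *= np.convolve(blurred / conv, kernel_flip, mode="same")
        return estimate

    bins = np.arange(64)
    clean = np.exp(-0.5 * ((bins - 20) / 2.0) ** 2) + np.exp(-0.5 * ((bins - 40) / 2.0) ** 2)
    kernel = np.exp(-0.5 * (np.arange(-8, 9) / 3.0) ** 2)
    kernel /= kernel.sum()
    noisy = np.convolve(clean, kernel, mode="same")      # noise broadens the histogram peaks

    sharp = richardson_lucy(noisy, kernel)
    print(f"peak height blurred={noisy.max():.2f}, deconvolved={sharp.max():.2f}")
    ```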