WorldWideScience

Sample records for identify samples classified

  1. Consensus of sample-balanced classifiers for identifying ligand-binding residue by co-evolutionary physicochemical characteristics of amino acids

    KAUST Repository

    Chen, Peng

    2013-01-01

    Protein-ligand binding is an important mechanism for some proteins to perform their functions, and those binding sites are the residues of proteins that physically bind to ligands. So far, the state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. Because ligand-binding sites are heavily outnumbered by non-ligand-binding sites, we constructed several balanced data sets, for each of which a random forest (RF)-based classifier was trained. The ensemble of these RF classifiers formed a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets demonstrated that our method compared favorably with the state-of-the-art. © Springer-Verlag Berlin Heidelberg 2013.
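
    As a loose illustration of the consensus scheme described above (not the authors' implementation), the sketch below builds balanced subsets by slicing the majority class, trains one random forest per subset, and averages their probabilities; the feature matrix is simulated, since the co-evolutionary physicochemical encoding from the paper is beyond this snippet.

        # Hypothetical sketch: consensus of random forests trained on balanced subsets
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 40))            # placeholder residue feature vectors
        y = (rng.random(2000) < 0.05).astype(int)  # ~5% binding sites: highly imbalanced

        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        n_subsets = len(neg) // len(pos)           # one balanced set per slice of negatives
        models = []
        for i in range(n_subsets):
            neg_slice = neg[i * len(pos):(i + 1) * len(pos)]
            idx = np.concatenate([pos, neg_slice])
            clf = RandomForestClassifier(n_estimators=200, random_state=i).fit(X[idx], y[idx])
            models.append(clf)

        # consensus prediction: average the positive-class probabilities of all members
        proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
        pred = (proba >= 0.5).astype(int)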

  2. Classifier-Guided Sampling for Complex Energy System Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Backlund, Peter B. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Eddy, John P. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.
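
    A rough sketch of the filtering idea, under stated assumptions: a naive Bayes model stands in for the Bayesian network classifier, and the objective function, the 8-variable discrete design encoding, and the 0.5 posterior cut-off are invented placeholders rather than details from the report.

        import numpy as np
        from sklearn.naive_bayes import CategoricalNB

        rng = np.random.default_rng(1)

        def objective(design):                                # stand-in for an expensive simulation
            return float(design.sum())

        # seed designs: 8 discrete variables, each taking one of 4 levels
        designs = rng.integers(0, 4, size=(60, 8))
        scores = np.array([objective(d) for d in designs])
        labels = (scores <= np.median(scores)).astype(int)    # 1 = promising (low objective)

        for generation in range(10):
            clf = CategoricalNB(min_categories=4).fit(designs, labels)
            candidates = rng.integers(0, 4, size=(200, 8))
            posterior = clf.predict_proba(candidates)[:, 1]
            promising = candidates[posterior > 0.5]           # only these get evaluated
            new_scores = np.array([objective(d) for d in promising])
            designs = np.vstack([designs, promising])
            scores = np.concatenate([scores, new_scores])
            labels = (scores <= np.median(scores)).astype(int)

        best_design = designs[np.argmin(scores)]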

  3. REPTREE CLASSIFIER FOR IDENTIFYING LINK SPAM IN WEB SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2013-01-01

    Full Text Available Search engines are used for retrieving information from the web. Most of the time, importance is placed on the top 10 results (sometimes only the top 5) because of time constraints and reliance on the search engines; users believe that the top 10 or 5 of the total results are the most relevant. Here arises the problem of spamdexing, a method of deceiving search result quality: falsified metrics, such as inserting enormous numbers of keywords or links into a website, may push that website into the top 10 or 5 positions. This paper proposes a classifier based on REPTree (a regression tree representative). As an initial step, link-based features such as neighbors, PageRank, truncated PageRank, TrustRank and assortativity-related attributes are inferred. Based on these features, a tree is constructed that uses the feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as a base; it is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD and FEATE. Only link-based features are used in the experiments, as this paper focuses on link spam alone. Finally, a representative tree is created which classifies web spam entries more precisely. Results are given; regression tree classification appears to perform well, as shown through the experiments.

  4. Identifying aggressive prostate cancer foci using a DNA methylation classifier.

    Science.gov (United States)

    Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning

    2017-01-12

    Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.

  5. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    Science.gov (United States)

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under a significance threshold, followed by support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several
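
    As a hedged illustration of the SVM step only (the MetaDE screening and the real expression data are not reproduced), this sketch trains an RBF-kernel SVM on 30 simulated feature genes and reports AUROC on a held-out split.

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score, confusion_matrix

        rng = np.random.default_rng(2)
        X = rng.normal(size=(300, 30))             # 30 selected feature genes (simulated)
        y = rng.integers(0, 2, size=300)           # 1 = metastatic, 0 = non-metastatic
        X[y == 1] += 0.8                           # inject a separable signal

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        svm = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)

        proba = svm.predict_proba(X_te)[:, 1]
        print("AUROC:", roc_auc_score(y_te, proba))
        print(confusion_matrix(y_te, svm.predict(X_te)))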

  6. Classifier Directed Data Hybridization for Geographic Sample Supervised Segment Generation

    Directory of Open Access Journals (Sweden)

    Christoff Fourie

    2014-11-01

    Full Text Available Quality segment generation is a well-known challenge and research objective within Geographic Object-based Image Analysis (GEOBIA). Although methodological avenues within GEOBIA are diverse, segmentation commonly plays a central role in most approaches, influencing and being influenced by surrounding processes. A general approach using supervised quality measures, specifically user-provided reference segments, suggests casting the parameters of a given segmentation algorithm as a multidimensional search problem. In such a sample supervised segment generation approach, spatial metrics observing the user-provided reference segments may drive the search process. The search is commonly performed by metaheuristics. A novel sample supervised segment generation approach is presented in this work, where the spectral content of provided reference segments is queried. A one-class classification process using spectral information from inside the provided reference segments is used to generate a probability image, which in turn is employed to direct a hybridization of the original input imagery. Segmentation is performed on such a hybrid image. These processes are adjustable, interdependent and form a part of the search problem. Results are presented detailing the performances of four method variants compared to the generic sample supervised segment generation approach, under various conditions in terms of resultant segment quality, required computing time and search process characteristics. Multiple metrics, metaheuristics and segmentation algorithms are tested with this approach. Using the spectral data contained within user-provided reference segments to tailor the output generally improves the results in the investigated problem contexts, but at the expense of additional required computing time.
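
    A simplified sketch of the one-class step on a synthetic four-band image; the choice of OneClassSVM, the nu value, and the linear blending used for hybridization are assumptions, not the method variants tested in the paper.

        # Rough sketch: fit a one-class model on pixel spectra from inside the reference
        # segment, score the whole image, and blend the score map into the input imagery.
        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(3)
        image = rng.random(size=(100, 100, 4))            # 4-band image (placeholder)
        mask = np.zeros((100, 100), dtype=bool)
        mask[40:60, 40:60] = True                         # reference segment pixels

        inside = image[mask]                              # (n_pixels, 4) spectra inside the segment
        ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(inside)

        scores = ocsvm.decision_function(image.reshape(-1, 4)).reshape(100, 100)
        prob_like = (scores - scores.min()) / (scores.max() - scores.min())

        alpha = 0.5                                       # weighting of the hybrid image (assumed)
        hybrid = (1 - alpha) * image + alpha * prob_like[..., None]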

  7. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS) Severity.

    Science.gov (United States)

    Bosch-Bayard, Jorge; Galán-García, Lídice; Fernandez, Thalia; Lirio, Rolando B; Bringas-Vega, Maria L; Roca-Stappung, Milene; Ricardo-Garcell, Josefina; Harmony, Thalía; Valdes-Sosa, Pedro A

    2017-01-01

    In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions, combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be placed into a specific category such as reading (Dyslexia), mathematics (Dyscalculia), or writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful to develop subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented.

  8. Stable Sparse Classifiers Identify qEEG Signatures that Predict Learning Disabilities (NOS Severity

    Directory of Open Access Journals (Sweden)

    Jorge Bosch-Bayard

    2018-01-01

    Full Text Available In this paper, we present a novel methodology to solve the classification problem, based on sparse (data-driven) regressions, combined with techniques for ensuring stability, especially useful for high-dimensional datasets and small sample sizes. The sensitivity and specificity of the classifiers are assessed by a stable ROC procedure, which uses a non-parametric algorithm for estimating the area under the ROC curve. This method allows assessing the performance of the classification by the ROC technique when more than two groups are involved in the classification problem, i.e., when the gold standard is not binary. We apply this methodology to EEG spectral signatures to find biomarkers that allow discriminating between (and predicting pertinence to) different subgroups of children diagnosed with Not Otherwise Specified Learning Disabilities (LD-NOS) disorder. Children with LD-NOS have notable learning difficulties, which affect education but cannot be placed into a specific category such as reading (Dyslexia), mathematics (Dyscalculia), or writing (Dysgraphia). By using the EEG spectra, we aim to identify EEG patterns that may be related to specific learning disabilities in an individual case. This could be useful to develop subject-based methods of therapy, based on information provided by the EEG. Here we study 85 LD-NOS children, divided into three subgroups previously selected by a clustering technique over the scores of cognitive tests. The classification equation produced stable marginal areas under the ROC of 0.71 for discrimination between Group 1 vs. Group 2; 0.91 for Group 1 vs. Group 3; and 0.75 for Group 2 vs. Group 1. A discussion of the EEG characteristics of each group related to the cognitive scores is also presented.

  9. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans.

    Directory of Open Access Journals (Sweden)

    Sun Hee Ahn

    Full Text Available Staphylococcus aureus causes a spectrum of human infection. Diagnostic delays and uncertainty lead to treatment delays and inappropriate antibiotic use. A growing literature suggests the host's inflammatory response to the pathogen represents a potential tool to improve upon current diagnostics. The hypothesis of this study is that the host responds differently to S. aureus than to E. coli infection in a quantifiable way, providing a new diagnostic avenue. This study uses Bayesian sparse factor modeling and penalized binary regression to define peripheral blood gene-expression classifiers of murine and human S. aureus infection. The murine-derived classifier distinguished S. aureus infection from healthy controls and Escherichia coli-infected mice across a range of conditions (mouse and bacterial strain, time post infection) and was validated in outbred mice (AUC>0.97). An S. aureus classifier derived from a cohort of 94 human subjects distinguished S. aureus blood stream infection (BSI) from healthy subjects (AUC 0.99) and E. coli BSI (AUC 0.84). Murine and human responses to S. aureus infection share common biological pathways, allowing the murine model to classify S. aureus BSI in humans (AUC 0.84). Both murine and human S. aureus classifiers were validated in an independent human cohort (AUC 0.95 and 0.92, respectively). The approach described here lends insight into the conserved and disparate pathways utilized by mice and humans in response to these infections. Furthermore, this study advances our understanding of S. aureus infection and the host response to it, and identifies new diagnostic and therapeutic avenues.
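
    A minimal sketch of a penalized binary regression classifier on simulated expression data; the L1-penalized logistic regression, the feature counts, and the class labels are illustrative stand-ins, and the Bayesian sparse factor modeling step is not reproduced here.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        X = rng.normal(size=(120, 500))           # 120 subjects x 500 probe intensities (simulated)
        y = rng.integers(0, 2, size=120)          # 1 = S. aureus BSI, 0 = comparator group
        X[y == 1, :20] += 1.0                     # 20 informative "genes"

        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print("cross-validated AUC:", auc.mean())

        clf.fit(X, y)
        selected = np.flatnonzero(clf.coef_[0])   # genes retained by the sparse model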

  10. Inter- and Intraexaminer Reliability in Identifying and Classifying Myofascial Trigger Points in Shoulder Muscles.

    Science.gov (United States)

    Nascimento, José Diego Sales do; Alburquerque-Sendín, Francisco; Vigolvino, Lorena Passos; Oliveira, Wandemberg Fortunato de; Sousa, Catarina de Oliveira

    2018-01-01

    To determine inter- and intraexaminer reliability of examiners without clinical experience in identifying and classifying myofascial trigger points (MTPs) in the shoulder muscles of subjects asymptomatic and symptomatic for unilateral subacromial impingement syndrome (SIS). Within-day inter- and intraexaminer reliability study. Physical therapy department of a university. Fifty-two subjects participated in the study, 26 symptomatic and 26 asymptomatic for unilateral SIS. Two examiners without experience in assessing MTPs, independent of each other and blind to the clinical conditions of the subjects, assessed bilaterally the presence of MTPs (present or absent) in 6 shoulder muscles and classified them (latent or active) on the affected side of the symptomatic group. Each examiner performed the same assessment twice in the same day. Reliability was calculated through percentage agreement, prevalence- and bias-adjusted kappa (PABAK) statistics, and weighted kappa. Intraexaminer reliability in identifying MTPs for the symptomatic and asymptomatic groups was moderate to perfect (PABAK, .46-1 and .60-1, respectively). Interexaminer reliability was between moderate and almost perfect in the 2 groups (PABAK, .46-.92), except for the muscles of the symptomatic group, which were below these values. With respect to MTP classification, intraexaminer reliability was moderate to high for most muscles, but interexaminer reliability was moderate for only 1 muscle (weighted κ=.45), and between weak and reasonable for the rest (weighted κ=.06-.31). Intraexaminer reliability is acceptable in clinical practice to identify and classify MTPs. However, interexaminer reliability proved to be reliable only to identify MTPs, with the symptomatic side exhibiting lower values of reliability. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
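
    The two agreement statistics named above can be computed in a few lines; the ratings below are invented for illustration. PABAK for two binary raters is 2·Po − 1 (Po being the observed proportion of agreement), and a linearly weighted kappa is used for the ordinal absent/latent/active classification.

        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        def pabak(r1, r2):
            """Prevalence- and bias-adjusted kappa for two binary raters."""
            po = np.mean(np.asarray(r1) == np.asarray(r2))   # observed agreement
            return 2 * po - 1

        examiner1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]           # MTP present (1) / absent (0)
        examiner2 = [1, 0, 0, 1, 0, 0, 1, 1, 1, 1]
        print("PABAK:", pabak(examiner1, examiner2))

        # ordinal classification: 0 = absent, 1 = latent, 2 = active
        class1 = [2, 1, 1, 0, 2, 2, 1, 0]
        class2 = [2, 1, 0, 0, 2, 1, 1, 1]
        print("weighted kappa:", cohen_kappa_score(class1, class2, weights="linear"))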

  11. Identifying and Classifying Mobile Business Models Based on Meta-Synthesis Approach

    Directory of Open Access Journals (Sweden)

    Porrandokht Niroomand

    2012-03-01

    Full Text Available The emergence of mobile technology has provided unique opportunities for developing and creating businesses and has been able to create new job opportunities. The current research tries to familiarize entrepreneurs who are running businesses, especially in the area of mobile services, with business models; these models can prepare them to implement new ideas and designs as they enter the market. A search of the literature shows that there are no suitable papers that identify, categorize and analyze mobile business models, so this paper offers a novel contribution. The first part of this paper reviews the concepts and theories concerning the different mobile generations, mobile commerce and business models. Afterwards, 92 models drawn from 33 papers and books are compared, interpreted, translated and combined on the basis of two different criteria: an expert criterion and a product-type criterion. In the expert-based classification, the models are grouped by criteria such as business fields, business partners, the rate of dynamism, the kind of activity, the focus areas, the mobile generations, transparency, the type of operator activities, marketing and advertisements. The models classified by product type have been analyzed and grouped into four different areas of mobile commerce: content production, technology (software and hardware), network and synthetic.

  12. A systematic procedure for identifying and classifying children with dyscalculia among primary school children in India.

    Science.gov (United States)

    Ramaa, S; Gowramma, I P

    2002-01-01

    This paper describes the procedures adopted by two independent studies in India for identifying and classifying children with dyscalculia in primary schools. For determining the presence of dyscalculia both inclusionary and exclusionary criteria were used. When other possible causes of arithmetic failure had been excluded, figures for dyscalculia came out as 5.98% (15 cases out of 251) in one study and 5.54% (78 out of 1408) in the second. It was found in the latter study that 40 out of the 78 (51.27%) also had reading and writing problems. The findings are discussed in the light of previous studies.

  13. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    Science.gov (United States)

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear, especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reasons for the different behaviors of the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
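
    A conceptual sketch of inverse-probability resampling for a two-phase design, not the sambia implementation: assumed inclusion probabilities are inverted to form bootstrap weights, and each bagged random forest is trained on a weighted resample so that its training data better resembles the unstratified population.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(5)
        X = rng.normal(size=(1000, 10))
        y = (rng.random(1000) < 0.3).astype(int)       # cases enriched in the stratified sample
        pi = np.where(y == 1, 0.8, 0.1)                # assumed phase-two inclusion probabilities

        w = 1.0 / pi                                   # inverse-probability weights
        w /= w.sum()

        forests = []
        for b in range(25):                            # simplified inverse-probability bagging
            idx = rng.choice(len(y), size=len(y), replace=True, p=w)
            forests.append(RandomForestClassifier(n_estimators=100, random_state=b).fit(X[idx], y[idx]))

        def predict_proba(X_new):
            """Average the positive-class probability over the bagged forests."""
            return np.mean([f.predict_proba(X_new)[:, 1] for f in forests], axis=0)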

  14. Identifying and classifying quality-of-life tools for assessing pressure ulcers after spinal cord injury

    Science.gov (United States)

    Hitzig, Sander L.; Balioussis, Christina; Nussbaum, Ethne; McGillivray, Colleen F.; Catharine Craven, B.; Noreau, Luc

    2013-01-01

    Context Although pressure ulcers may negatively influence quality of life (QoL) post-spinal cord injury (SCI), our understanding of how to assess their impact is confounded by conceptual and measurement issues. To ensure that descriptions of pressure ulcer impact are appropriately characterized, measures should be selected according to the domains that they evaluate and the population and pathologies for which they are designed. Objective To conduct a systematic literature review to identify and classify outcome measures used to assess the impact of pressure ulcers on QoL after SCI. Methods Electronic databases (Medline/PubMed, CINAHL, and PsycInfo) were searched for studies published between 1975 and 2011. Identified outcome measures were classified as being either subjective or objective using a QoL model. Results Fourteen studies were identified. The majority of tools identified in these studies did not have psychometric evidence supporting their use in the SCI population, with the exception of two objective measures, the Short-Form 36 and the Craig Handicap Assessment and Reporting Technique, and two subjective measures, the Life Situation Questionnaire-Revised and the Ferrans and Powers Quality of Life Index SCI-Version. Conclusion Many QoL outcome tools showed promise in being sensitive to the presence of pressure ulcers, but few of them have been validated for use with SCI. Prospective studies should employ more rigorous methods for collecting data on pressure ulcer severity and location to improve the quality of findings with regard to their impact on QoL. The Cardiff Wound Impact Schedule is a potential tool for assessing the impact of pressure ulcers post-SCI. PMID:24090238

  15. Classifier-guided sampling for discrete variable, discontinuous design space exploration: Convergence and computational performance

    Energy Technology Data Exchange (ETDEWEB)

    Backlund, Peter B. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shahan, David W. [HRL Labs., LLC, Malibu, CA (United States); Seepersad, Carolyn Conner [Univ. of Texas, Austin, TX (United States)

    2014-04-22

    A classifier-guided sampling (CGS) method is introduced for solving engineering design optimization problems with discrete and/or continuous variables and continuous and/or discontinuous responses. The method merges concepts from metamodel-guided sampling and population-based optimization algorithms. The CGS method uses a Bayesian network classifier for predicting the performance of new designs based on a set of known observations or training points. Unlike most metamodeling techniques, however, the classifier assigns a categorical class label to a new design, rather than predicting the resulting response in continuous space, and thereby accommodates nondifferentiable and discontinuous functions of discrete or categorical variables. The CGS method uses these classifiers to guide a population-based sampling process towards combinations of discrete and/or continuous variable values with a high probability of yielding preferred performance. Accordingly, the CGS method is appropriate for discrete/discontinuous design problems that are ill-suited for conventional metamodeling techniques and too computationally expensive to be solved by population-based algorithms alone. In addition, the rates of convergence and computational properties of the CGS method are investigated when applied to a set of discrete variable optimization problems. Results show that the CGS method significantly improves the rate of convergence towards known global optima, on average, when compared to genetic algorithms.

  16. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    Science.gov (United States)

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2 the optimal numbers of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  17. Identifying Sources of Scientific Knowledge: classifying non-source items in the WoS

    Energy Technology Data Exchange (ETDEWEB)

    Calero-Medina, C.M.

    2016-07-01

    The sources of scientific knowledge can be tracked using the references in scientific publications. For instance, the publications from the scientific journals covered by the Web of Science database (WoS) contain references to publications for which an indexed source record exists in the WoS (source items) or references for which an indexed source record does not exist in the WoS (non-source items). The classification of the non-source items is the main objective of the work in progress presented here. Some other scholars have classified and identified non-source items with different purposes (e.g. Butler & Visser (2006); Liseé, Larivière & Archambault (2008); Nerderhof, van Leeuwen & van Raan (2010); Hicks & Wang (2013); Boyack & Klavans (2014)). But these studies are focused on specific source types, fields or sets of papers. The work presented here is much broader in terms of the number of publications, source types and fields. (Author)

  18. Toward a More Responsive Consumable Materiel Supply Chain: Leveraging New Metrics to Identify and Classify Items of Concern

    Science.gov (United States)

    2016-06-01

    Thesis by Andrew R. Haley, June 2016 (thesis advisor: Robert ...), entitled "Toward a More Responsive Consumable Materiel Supply Chain: Leveraging New Metrics to Identify and Classify Items of Concern." Keywords: Naval Supply Systems Command (NAVSUP), logistics, inventory, consumable, NSNs at Risk, Bad Actors, Bad Actors with Trend, items of concern, customer time

  19. Stacking machine learning classifiers to identify Higgs bosons at the LHC

    International Nuclear Information System (INIS)

    Alves, A.

    2017-01-01

    Machine learning (ML) algorithms have been employed in the problem of classifying signal and background events with high accuracy in particle physics. In this paper, we compare the performance of a widespread ML technique, namely stacked generalization, against the results of two state-of-the-art algorithms: (1) a deep neural network (DNN) in the task of discovering a new neutral Higgs boson and (2) a scalable machine learning system for tree boosting, in the Standard Model Higgs to tau leptons channel, both at the 8 TeV LHC. In a cut-and-count analysis, stacking three algorithms performed around 16% worse than the DNN while demanding far less computational effort; the same stacking, however, outperforms boosted decision trees. Using the stacked classifiers in a multivariate statistical analysis (MVA), on the other hand, significantly enhances the statistical significance compared to cut-and-count in both Higgs processes, suggesting that combining an ensemble of simpler and faster ML algorithms with MVA tools is a better approach than building a complex state-of-the-art algorithm for cut-and-count.
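
    For orientation, a hedged sketch of stacked generalization on a synthetic binary classification task; the physics feature sets, base learners, and tuning used in the paper are not reproduced here.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import StackingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=5000, n_features=20, n_informative=8, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        # three base learners; a logistic regression combines their out-of-fold predictions
        stack = StackingClassifier(
            estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                        ("svc", SVC(probability=True, random_state=0)),
                        ("lr", LogisticRegression(max_iter=1000))],
            final_estimator=LogisticRegression(max_iter=1000), cv=5)
        stack.fit(X_tr, y_tr)
        print("AUC:", roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1]))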

  20. CAGE peaks identified as true TSS by TSS classifier - FANTOM5 | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available CAGE peaks identified as true TSS by TSS classifier - FANTOM5 (LSDB Archive). Download location: ftp.biosciencedbc.jp/archive/fantom5/datafiles/phase1.3/extra/TSS_classifier/ (file size: 32 MB). Data analysis method: TSS Classifier (http://sourceforge.net/p/tom...).

  1. Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers

    Directory of Open Access Journals (Sweden)

    Zhibin Xiao

    2017-02-01

    Full Text Available Recognition of transportation modes can be used in different applications including human behavior research, transport management and traffic control. Previous work on transportation mode recognition has often relied on using multiple sensors or matching Geographic Information System (GIS) information, which is not possible in many cases. In this paper, an approach based on ensemble learning is proposed to infer hybrid transportation modes using only Global Positioning System (GPS) data. First, in order to distinguish between different transportation modes, we used a statistical method to generate global features and extract several local features from sub-trajectories after trajectory segmentation, before these features were combined in the classification stage. Second, to obtain a better performance, we used tree-based ensemble models (Random Forest, Gradient Boosting Decision Tree, and XGBoost) instead of traditional methods (K-Nearest Neighbor, Decision Tree, and Support Vector Machines) to classify the different transportation modes. The experimental results on the latter have shown the efficacy of our proposed approach. Among them, the XGBoost model produced the best performance, with a classification accuracy of 90.77% on the GEOLIFE dataset; a tree-based ensemble method was also used for feature selection to reduce model complexity.
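
    An illustrative comparison of two tree-based ensembles on made-up trajectory features (mean speed, speed variability, acceleration, heading change); the GEOLIFE segmentation, the full feature set, and the XGBoost configuration from the paper are not reproduced.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)
        n = 1200
        modes = rng.integers(0, 4, size=n)                 # walk, bike, bus, car (assumed labels)
        X = np.column_stack([
            rng.normal(loc=2.0 + 4.0 * modes, scale=1.0),  # mean speed rises with mode
            rng.gamma(shape=2.0, scale=1.0 + modes, size=n),
            rng.normal(size=n),                            # acceleration (uninformative noise)
            rng.random(n),                                 # heading-change rate (noise)
        ])

        for name, model in [("random forest", RandomForestClassifier(n_estimators=300, random_state=0)),
                            ("gbdt", GradientBoostingClassifier(random_state=0))]:
            acc = cross_val_score(model, X, modes, cv=5).mean()
            print(f"{name}: {acc:.3f}")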

  2. Identify and Classify Critical Success Factor of Agile Software Development Methodology Using Mind Map

    OpenAIRE

    Tasneem Abd El Hameed; Mahmoud Abd EL Latif; Sherif Kholief

    2016-01-01

    Selecting the right method, the right personnel and the right practices, and applying them adequately, determines the success of software development. In this paper, a qualitative study is carried out on the critical success factors drawn from previous studies. The success factors are matched with their related principles to illustrate the most valuable factors for agile approach success; this paper also proves that the twelve principles are poorly identified for a few factors resulting from qualitative and qua...

  3. Data fusion and machine learning to identify threat vectors for the Zika virus and classify vulnerability

    Science.gov (United States)

    Gentle, J. N., Jr.; Kahn, A.; Pierce, S. A.; Wang, S.; Wade, C.; Moran, S.

    2016-12-01

    With the continued spread of the Zika virus in the United States, in both Florida and Virginia, increased public awareness, prevention and targeted prediction are necessary to effectively mitigate further infection and propagation of the virus throughout the human population. The goal of this project is to utilize publicly accessible data and HPC resources, coupled with machine learning algorithms, to identify potential threat vectors for the spread of the Zika virus in Texas, the United States and globally. This is done by correlating available Zika case data collected from incident reports in medical databases (e.g., CDC, Florida Department of Health) with known bodies of water in various earth science databases (e.g., USGS NAQWA Data, NASA ASTER Data, TWDB Data), by using known mosquito population centers as a proxy for trends in population distribution (e.g., WHO, European CDC, Texas Data), and by correlating historical trends in the spread of other mosquito-borne diseases (e.g., chikungunya, malaria, dengue, yellow fever, West Nile, etc.). The resulting analysis should refine the identification of the specific threat vectors for the spread of the virus, which will correspondingly increase the effectiveness of the limited resources allocated towards combating the disease through better strategic implementation of defense measures. The minimal outcome of this research is a better understanding of the factors involved in the spread of the Zika virus, with the greater potential to save additional lives through more effective resource utilization and public outreach.

  4. Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

    DEFF Research Database (Denmark)

    Gajhede, Nicolai; Beck, Oliver; Purwins, Hendrik

    2016-01-01

    After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples
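
    A minimal PyTorch sketch showing where batch normalization sits in such a CNN; the input representation (64×64 spectrogram patches), layer widths, and class set are assumptions for illustration, not the architectures evaluated in the paper.

        import torch
        import torch.nn as nn

        class PercussionCNN(nn.Module):
            def __init__(self, n_classes=3):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                    nn.MaxPool2d(2),
                )
                self.classifier = nn.Linear(32 * 16 * 16, n_classes)

            def forward(self, x):                      # x: (batch, 1, 64, 64) spectrogram patches
                z = self.features(x)
                return self.classifier(z.flatten(1))

        model = PercussionCNN()
        dummy = torch.randn(8, 1, 64, 64)              # a batch of excerpts
        logits = model(dummy)                          # shape (8, 3): hi-hat, snare, bass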

  5. Fungi identify the geographic origin of dust samples.

    Directory of Open Access Journals (Sweden)

    Neal S Grantham

    Full Text Available There is a long history of archaeologists and forensic scientists using pollen found in a dust sample to identify its geographic origin or history. Such palynological approaches have important limitations as they require time-consuming identification of pollen grains, a priori knowledge of plant species distributions, and a sufficient diversity of pollen types to permit spatial or temporal identification. We demonstrate an alternative approach based on DNA sequencing analyses of the fungal diversity found in dust samples. Using nearly 1,000 dust samples collected from across the continental U.S., our analyses identify up to 40,000 fungal taxa from these samples, many of which exhibit a high degree of geographic endemism. We develop a statistical learning algorithm via discriminant analysis that exploits this geographic endemicity in the fungal diversity to correctly identify samples to within a few hundred kilometers of their geographic origin with high probability. In addition, our statistical approach provides a measure of certainty for each prediction, in contrast with current palynology methods that are almost always based on expert opinion and devoid of statistical inference. Fungal taxa found in dust samples can therefore be used to identify the origin of that dust and, more importantly, we can quantify our degree of certainty that a sample originated in a particular place. This work opens up a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.
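
    A hedged sketch of the discriminant-analysis idea, using a simulated samples-by-taxa abundance table and scikit-learn's LinearDiscriminantAnalysis; the study's actual model, taxon list, and spatial error estimates are not reproduced, and predict_proba is shown only to mirror the per-prediction certainty the authors describe.

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(7)
        n_samples, n_taxa, n_regions = 300, 200, 5
        region = rng.integers(0, n_regions, size=n_samples)
        abundance = rng.dirichlet(np.ones(n_taxa), size=n_samples)      # relative abundances
        abundance[:, :n_regions] += 0.05 * np.eye(n_regions)[region]    # region-endemic taxa

        lda = LinearDiscriminantAnalysis()
        print("CV accuracy:", cross_val_score(lda, abundance, region, cv=5).mean())

        # posterior class probabilities give a measure of certainty for each prediction
        posterior = lda.fit(abundance, region).predict_proba(abundance[:3])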

  6. Recovery of thermophilic Campylobacter by three sampling methods from classified river sites in Northeast Georgia, USA

    Science.gov (United States)

    It is not clear how best to sample streams for the detection of Campylobacter which may be introduced from agricultural or community land use. Fifteen sites in the watershed of the South Fork of the Broad River (SFBR) in Northeastern Georgia, USA, were sampled in three seasons. Seven sites were cl...

  7. Using lot quality-assurance sampling and area sampling to identify priority areas for trachoma control: Viet Nam.

    Science.gov (United States)

    Myatt, Mark; Mai, Nguyen Phuong; Quynh, Nguyen Quang; Nga, Nguyen Huy; Tai, Ha Huy; Long, Nguyen Hung; Minh, Tran Hung; Limburg, Hans

    2005-10-01

    To report on the use of lot quality-assurance sampling (LQAS) surveys undertaken within an area-sampling framework to identify priority areas for intervention with trachoma control activities in Viet Nam. The LQAS survey method for the rapid assessment of the prevalence of active trachoma was adapted for use in Viet Nam with the aim of classifying individual communes by the prevalence of active trachoma among children in primary school. School-based sampling was used; school sites to be sampled were selected using an area-sampling approach. A total of 719 communes in 41 districts in 18 provinces were surveyed. Survey staff found the LQAS survey method both simple and rapid to use after initial problems with area-sampling methods were identified and remedied. The method yielded a finer spatial resolution of prevalence than had been previously achieved in Viet Nam using semiquantitative rapid assessment surveys and multistage cluster-sampled surveys. When used with area-sampling techniques, the LQAS survey method has the potential to form the basis of survey instruments that can be used to efficiently target resources for interventions against active trachoma. With additional work, such methods could provide a generally applicable tool for effective programme planning and for the certification of the elimination of trachoma as a blinding disease.
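
    A worked sketch of a generic LQAS decision rule under assumed numbers (the sample size, prevalence bounds, and error level are illustrative, not the thresholds used in the Viet Nam surveys): choose the smallest decision value d that keeps the chance of flagging a low-prevalence commune small, then classify a commune by comparing its observed case count to d.

        from scipy.stats import binom

        n = 50                        # children examined per commune (assumed sample size)
        p_low, p_high = 0.05, 0.20    # "acceptable" vs. "priority" prevalence bounds (assumed)

        # smallest d such that a commune at the low prevalence is flagged with probability <= 5%
        d = next(d for d in range(n + 1) if 1 - binom.cdf(d - 1, n, p_low) <= 0.05)
        power = 1 - binom.cdf(d - 1, n, p_high)   # chance of flagging a true high-prevalence commune

        print(f"decision value d = {d}, power at {p_high:.0%} prevalence = {power:.2f}")

        cases_found = 7
        print("priority commune" if cases_found >= d else "not a priority")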

  8. Biased sampling, over-identified parameter problems and beyond

    CERN Document Server

    Qin, Jing

    2017-01-01

    This book is devoted to biased sampling problems (also called choice-based sampling in Econometrics parlance) and over-identified parameter estimation problems. Biased sampling problems appear in many areas of research, including Medicine, Epidemiology and Public Health, the Social Sciences and Economics. The book addresses a range of important topics, including case and control studies, causal inference, missing data problems, meta-analysis, renewal process and length biased sampling problems, capture and recapture problems, case cohort studies, exponential tilting genetic mixture models etc. The goal of this book is to make it easier for Ph.D. students and new researchers to get started in this research area. It will be of interest to all those who work in the health, biological, social and physical sciences, as well as those who are interested in survey methodology and other areas of statistical science, among others.

  9. Sampling And Identifying Of Mould In The Library Building

    Directory of Open Access Journals (Sweden)

    Abdul Wahab Suriani Ngah

    2016-01-01

    Full Text Available Despite the growing concern over mould and fungal infestations in library buildings, little has been reported in the literature on the development of objective tools and criteria for measuring and characterising mould and fungi. In this paper, an objective approach to assessing mould and fungal growth using various sampling techniques, with identification by microscopic observation, is proposed. This study involved three libraries of higher education institutions in Malaysia for data collection and the study of mould growth. Mould samples from the three libraries were collected using a Coriolis air sampler, settling-plate air sampling on Malt Extract Agar (MEA), the IAQ MOLD Alexeter IAQ-Pro Asp/Pen® Test, and swab sampling techniques. The IAQ MOLD Alexeter IAQ-Pro Asp/Pen® Test and the traditional technique identified various mould species immediately on site, and microscopic observation identified common mould types such as Aspergillus, Penicillium and Stachybotrys. The sample size and particular characteristics of each library shaped the observed mould growth patterns and findings.

  10. Identifying PTSD personality subtypes in a workplace trauma sample.

    Science.gov (United States)

    Sellbom, Martin; Bagby, R Michael

    2009-10-01

    The authors sought to identify personality clusters derived from the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) Personality Psychopathology Five Scales in a sample of workplace claimants with posttraumatic stress disorder (PTSD). Three clusters--low pathology, internalizing, and externalizing--were recovered, similar to those obtained by M. W. Miller and colleagues (2003, 2004, 2007) in samples of combat veterans and sexual assault victims. Internalizers and externalizers scored comparably on measures of PTSD symptom severity, general distress, and negative affect. Internalizers were uniquely characterized by anhedonia and depressed mood; externalizers by antisocial behavior, substance abuse, and anger/aggression.

  11. Classifying the Progression of Ductal Carcinoma from Single-Cell Sampled Data via Integer Linear Programming: A Case Study.

    Science.gov (United States)

    Catanzaro, Daniele; Shackney, Stanley E; Schaffer, Alejandro A; Schwartz, Russell

    2016-01-01

    Ductal Carcinoma In Situ (DCIS) is a precursor lesion of Invasive Ductal Carcinoma (IDC) of the breast. Investigating its temporal progression could provide fundamental new insights for the development of better diagnostic tools to predict which cases of DCIS will progress to IDC. We investigate the problem of reconstructing a plausible progression from single-cell sampled data of an individual with synchronous DCIS and IDC. Specifically, by using a number of assumptions derived from the observation of cellular atypia occurring in IDC, we design a possible predictive model using integer linear programming (ILP). Computational experiments carried out on a preexisting data set of 13 patients with simultaneous DCIS and IDC show that the corresponding predicted progression models are classifiable into categories having specific evolutionary characteristics. The approach provides new insights into mechanisms of clonal progression in breast cancers and helps illustrate the power of the ILP approach for similar problems in reconstructing tumor evolution scenarios under complex sets of constraints.

  12. A Single Platinum Microelectrode for Identifying Soft Drink Samples

    Directory of Open Access Journals (Sweden)

    Lígia Bueno

    2012-01-01

    Full Text Available Cyclic voltammograms recorded with a single platinum microelectrode were used along with an unsupervised pattern recognition method, namely Principal Component Analysis, to conduct a qualitative analysis of sixteen different brands of carbonated soft drinks (Kuat, Soda Antarctica, H2OH!, Sprite 2.0, Guarana Antarctica, Guarana Antarctica Zero, Coca-Cola, Coca-Cola Zero, Coca-Cola Plus, Pepsi, Pepsi Light, Pepsi Twist, Pepsi Twist Light, Pepsi Twist 3, Schin Cola, and Classic Dillar's). In this analysis, soft drink samples were not subjected to pre-treatment. Good differentiation among all the analysed soft drinks was achieved using the voltammetric data. An analysis of the loading plots shows that the potentials of −0.65 V, −0.4 V, 0.4 V, and 0.750 V facilitated the discrimination process. The electrochemical processes related to these potentials are the reduction of hydrogen ions and the inhibition of platinum oxidation by caffeine adsorption on the electrode surface. Additionally, the single platinum microelectrode was useful for the quality control of the soft drink samples, as it helped to identify the time at which the beverage was opened.
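
    A rough sketch of the analysis pattern, with synthetic current curves standing in for the measured voltammograms: each voltammogram is one row, PCA is fit without labels, and the loadings indicate which potentials drive the separation.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(8)
        potentials = np.linspace(-0.8, 1.0, 200)            # V vs. reference (assumed grid)
        n_brands, n_replicates = 16, 3
        currents = rng.normal(size=(n_brands * n_replicates, potentials.size))
        currents += np.sin(5 * potentials)                  # shared voltammetric shape

        pca = PCA(n_components=2)
        scores = pca.fit_transform(currents)                # one point per measurement
        loadings = pca.components_                          # weight of each potential

        influential = potentials[np.argsort(np.abs(loadings[0]))[-5:]]
        print("most influential potentials on PC1:", np.round(influential, 2))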

  13. Spot sputum samples are at least as good as early morning samples for identifying Mycobacterium tuberculosis.

    Science.gov (United States)

    Murphy, Michael E; Phillips, Patrick P J; Mendel, Carl M; Bongard, Emily; Bateson, Anna L C; Hunt, Robert; Murthy, Saraswathi; Singh, Kasha P; Brown, Michael; Crook, Angela M; Nunn, Andrew J; Meredith, Sarah K; Lipman, Marc; McHugh, Timothy D; Gillespie, Stephen H

    2017-10-27

    The use of early morning sputum samples (EMS) to diagnose tuberculosis (TB) can result in treatment delay given the need for the patient to return to the clinic with the EMS, increasing the chance of patients being lost during their diagnostic workup. However, there is little evidence to support the superiority of EMS over spot sputum samples. In this new analysis of the REMoxTB study, we compare the diagnostic accuracy of EMS with spot samples for identifying Mycobacterium tuberculosis pre- and post-treatment. Patients who were smear positive at screening were enrolled into the study. Paired sputum samples (one EMS and one spot) were collected at each trial visit pre- and post-treatment. Microscopy and culture on solid LJ and liquid MGIT media were performed on all samples; those missing corresponding paired results were excluded from the analyses. Data from 1115 pre- and 2995 post-treatment paired samples from 1931 patients enrolled in the REMoxTB study were analysed. Patients were recruited from South Africa (47%), East Africa (21%), India (20%), Asia (11%), and North America (1%); 70% were male, median age 31 years (IQR 24-41), and 139 (7%) were co-infected with HIV with a median CD4 cell count of 399 cells/μL (IQR 318-535). Pre-treatment spot samples had a higher yield of positive Ziehl-Neelsen smears (98% vs. 97%, P = 0.02) and LJ cultures (87% vs. 82%, P = 0.006) than EMS, but there was no difference for positivity by MGIT (93% vs. 95%, P = 0.18). Contaminated and false-positive MGIT results were found more often with EMS than with spot samples. Surprisingly, pre-treatment EMS had a higher smear grading and a shorter time-to-positivity in MGIT culture, by 1 day, than spot samples (4.5 vs. 5.5 days). Comparing EMS and spot samples in those with unfavourable outcomes, there were no differences in smear or culture results, and positive results were not detected earlier in Kaplan-Meier analyses for either EMS or spot samples. Our data do not support the hypothesis that EMS

  14. Passive sampling as a tool for identifying micro-organic compounds in groundwater.

    Science.gov (United States)

    Mali, N; Cerar, S; Koroša, A; Auersperger, P

    2017-09-01

    The paper presents the use of a simple and cost-efficient passive sampling device with integrated active carbon to test the possibility of determining the presence of micro-organic compounds (MOs) in groundwater and identifying the potential source of pollution as well as the seasonal variability of contamination. An advantage of the passive sampler is that it covers a long sampling period by integrating the pollutant concentration over time, so analytical costs over the monitoring period can be reduced substantially. Passive samplers were installed in 15 boreholes in the Maribor City area in Slovenia, with two sampling campaigns covering a period of about one year. At all sampling sites in the first series a total of 103 compounds were detected, and 144 in the second series. Of all detected compounds, the 53 most frequently detected were selected for further analysis. These were classified into eight groups based on the type of their source: Pesticides, Halogenated solvents, Non-halogenated solvents, Domestic and personal, Plasticizers and additives, Other industrial, Sterols and Natural compounds. The most frequently detected MO compounds in groundwater were tetrachloroethene and trichloroethene from the Halogenated solvents group. Among the compound groups, pesticides were detected most frequently. Analysis of frequency also showed significant differences between the two sampling series, with less frequent detections in the summer series. For the analysis to determine the origin of contamination, three groups of compounds were defined according to type of use: agricultural, urban and industrial. Frequency of detection indicates mixed land use in the recharge areas of sampling sites, which makes it difficult to specify the dominant origin of the compounds. Passive sampling has proved to be a useful tool with which to identify MOs in groundwater and for assessing groundwater quality. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Using raster and vector data to identify objects for classify in flood risk. A case study: Raciborz

    Science.gov (United States)

    Porczek, Mariusz; Rucińska, Dorota; Lewiński, Stanisław

    2018-01-01

    The severe flood of 1997, which seriously affected Polish, Czech and German territories, gave impetus to research into the management of flood-prone areas. The material losses caused by the "Flood of the Millennium" totalled billions of Polish zloty. The extent of the disaster and the cost of infrastructure repair changed the attitude of many branches of the economy, and of science, prompting the introduction of changes into spatial management and crisis management. At the same time, it focused the interest of many who were trained in analysing the vulnerability of land-use features to natural disasters such as floods. Research into the spatial distribution of geographic environmental features susceptible to flood in the Odra valley was conducted at the Faculty of Geography and Regional Studies of the University of Warsaw using Geographic Information Systems (GIS). This study seeks to examine the possibility of adapting vector and raster data and using them for land-use classification in the context of the risk of flood and inundation damage. The analysed area, the city of Raciborz and its surroundings on the upper Odra River, serves as a case study for identifying objects and land susceptible to natural hazards based on publicly available satellite databases of the highest resolution, which is a very important factor in the quality of further applied risk analyses. The objective of the research was to create a 10×10-m-pixel raster network using raster data made available by ESA (Copernicus Land Monitoring Service) and vector data from OpenStreetMap.

  16. Laboratory Detective Work Identifies a Mishandling Problem in Sample Aliquoting

    OpenAIRE

    Zhu, Claire; Pinsky, Paul; Huang, Wen-Yi; Purdue, Mark

    2014-01-01

    Data from a recent ovarian cancer biomarker study using serum aliquots from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial Biorepository showed that CA125II concentrations in these aliquots were significantly lower than those previously measured in the same subjects from the same blood draw. We designed an experiment to investigate whether samples used in the study (reference study) were compromised during the aliquoting process. We measured CA125II in the “sister” ...

  17. A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue.

    Directory of Open Access Journals (Sweden)

    Jeffrey J Nirschl

    Full Text Available Over 26 million people worldwide suffer from heart failure annually. When the cause of heart failure cannot be identified, endomyocardial biopsy (EMB) represents the gold standard for the evaluation of disease. However, manual EMB interpretation has high inter-rater variability. Deep convolutional neural networks (CNNs) have been successfully applied to detect cancer, diabetic retinopathy, and dermatologic lesions from images. In this study, we develop a CNN classifier to detect clinical heart failure from H&E stained whole-slide images from a total of 209 patients; 104 patients were used for training and the remaining 105 patients for independent testing. The CNN was able to identify patients with heart failure or severe pathology with a 99% sensitivity and 94% specificity on the test set, outperforming conventional feature-engineering approaches. Importantly, the CNN outperformed two expert pathologists by nearly 20%. Our results suggest that deep learning analytics of EMB can be used to predict cardiac outcome.

  18. Development and analysis of a low-cost screening tool to identify and classify hearing loss in children: a proposal for developing countries

    Directory of Open Access Journals (Sweden)

    Alessandra Giannella Samelli

    2011-01-01

    Full Text Available OBJECTIVE: Insufficient attention has been given to hearing health in primary care in developing countries. A strategy involving low-cost screening tools may fill the current gap in hearing health care provided to children. Therefore, it is necessary to establish and adopt lower-cost procedures that are accessible to underserved areas that lack other physical or human resources that would enable the identification of groups at risk for hearing loss. The aim of this study was to develop and analyze the efficacy of a low-cost screening tool to identify and classify hearing loss in children. METHODS: A total of 214 2- to 10-year-old children participated in this study. The study was conducted by providing a questionnaire to the parents and comparing the answers with the results of a complete audiological assessment. Receiver operating characteristic (ROC) curves were constructed, and discriminant analysis techniques were used to classify each child based on the total score. RESULTS: We found conductive hearing loss in 39.3% of children, sensorineural hearing loss in 7.4% and normal hearing in 53.3%. The discriminant analysis technique provided the following classification rule for the total score on the questionnaire: 0 to 4 points - normal hearing; 5 to 7 points - conductive hearing loss; over 7 points - sensorineural hearing loss. CONCLUSION: Our results suggest that the questionnaire could be used as a screening tool to classify children with normal hearing or hearing loss, and according to the type of hearing loss, based on the total questionnaire score.
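
    The reported cut-offs translate directly into a small helper; the thresholds below come from the abstract, while the function name and example scores are illustrative.

        def classify_hearing(total_score: int) -> str:
            """Map the questionnaire total score to the screening category."""
            if total_score <= 4:
                return "normal hearing"
            if total_score <= 7:
                return "conductive hearing loss"
            return "sensorineural hearing loss"

        for score in (2, 6, 9):
            print(score, "->", classify_hearing(score))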

  19. Classifying Microorganisms

    DEFF Research Database (Denmark)

    Sommerlund, Julie

    2006-01-01

    This paper describes the coexistence of two systems for classifying organisms and species: a dominant genetic system and an older naturalist system. The former classifies species and traces their evolution on the basis of genetic characteristics, while the latter employs physiological characteristics. The coexistence of the classification systems does not lead to a conflict between them. Rather, the systems seem to co-exist in different configurations, through which they are complementary, contradictory and inclusive in different situations, sometimes simultaneously. The systems come...

  20. Treatment response in psychotic patients classified according to social and clinical needs, drug side effects, and previous treatment; a method to identify functional remission

    DEFF Research Database (Denmark)

    Alenius, Malin; Hammarlund-Udenaes, Margareta; Honoré, Per Gustaf Hartvig

    2009-01-01

    BACKGROUND: Various approaches have been made over the years to classify psychotic patients according to inadequate treatment response, using terms such as treatment resistant or treatment refractory. Existing classifications have been criticized for overestimating positive symptoms; underestimating residual symptoms, negative symptoms, and side effects; or being too open to individual interpretation. The aim of this study was to present and evaluate a new method of classification according to treatment response and, thus, to identify patients in functional remission. METHOD: A naturalistic, cross-sectional study was performed. RESULTS: Patients in functional remission, as defined by CANSEPT, had fewer psychotic symptoms and a higher rate of workers than those with the worst treatment outcome. CONCLUSION: In the evaluation, CANSEPT showed validity in discriminating the patients of interest and was well tolerated by the patients. CANSEPT could secure inclusion of correct patients in the clinic or in research.

  1. Treatment response in psychotic patients classified according to social and clinical needs, drug side effects, and previous treatment; a method to identify functional remission.

    Science.gov (United States)

    Alenius, Malin; Hammarlund-Udenaes, Margareta; Hartvig, Per; Sundquist, Staffan; Lindström, Leif

    2009-01-01

    Various approaches have been made over the years to classify psychotic patients according to inadequate treatment response, using terms such as treatment resistant or treatment refractory. Existing classifications have been criticized for overestimating positive symptoms; underestimating residual symptoms, negative symptoms, and side effects; or being too open to individual interpretation. The aim of this study was to present and evaluate a new method of classification according to treatment response and, thus, to identify patients in functional remission. A naturalistic, cross-sectional study was performed using patient interviews and information from patient files. The new classification method CANSEPT, which combines the Camberwell Assessment of Need rating scale, the Udvalg for Kliniske Undersøgelser side effect rating scale (SE), and the patient's previous treatment history (PT), was used to group the patients according to treatment response. CANSEPT was evaluated by comparison of expected and observed results. In the patient population (n = 123), the patients in functional remission, as defined by CANSEPT, had higher quality of life, fewer hospitalizations, fewer psychotic symptoms, and a higher rate of workers than those with the worst treatment outcome. In the evaluation, CANSEPT showed validity in discriminating the patients of interest and was well tolerated by the patients. CANSEPT could secure inclusion of correct patients in the clinic or in research.

  2. Estimates of phytomass and net primary productivity in terrestrial ecosystems of the former Soviet Union identified by classified Global Vegetation Index

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, G.G.; Kolchugina, T.P. [Oregon State Univ., Corvallis, OR (United States)

    1995-12-01

    Forty-two regions with similar vegetation and landcover were identified in the former Soviet Union (FSU) by classifying Global Vegetation Index (GVI) images. Image classes were described in terms of vegetation and landcover. Image classes appear to provide more accurate and precise descriptions for most ecosystems when compared to general thematic maps. The area of forest lands was estimated at 1,330 Mha and the actual area of forest ecosystems at 875 Mha. Arable lands were estimated to be 211 Mha. The area of the tundra biome was estimated at 261 Mha. The areas of the forest-tundra/dwarf forest, taiga, mixed-deciduous forest and forest-steppe biomes were estimated at 153, 882, 196, and 144 Mha, respectively. The areas of the desert-semidesert biome and of arable land with irrigated land and meadows were estimated at 126 and 237 Mha, respectively. Vegetation and landcover types were associated with the Bazilevich database of phytomass and NPP for vegetation in the FSU. The phytomass in the FSU was estimated at 97.1 Gt C, with 86.8 Gt C in forest vegetation, 9.7 Gt C in natural non-forest vegetation, and 0.6 Gt C in arable lands. The NPP was estimated at 8.6 Gt C/yr, with 3.2, 4.8, and 0.6 Gt C/yr for forest, natural non-forest, and arable ecosystems, respectively. The phytomass estimates for forests were greater than previous assessments which considered the age-class distribution of forest stands in the FSU. The NPP of natural ecosystems estimated in this study was 23% greater than previous estimates which used thematic maps to identify ecosystems. 47 refs., 4 figs., 2 tabs.

  3. Carbon classified?

    DEFF Research Database (Denmark)

    Lippert, Ingmar

    2012-01-01

    Using an actor-network theory (ANT) framework, the aim is to investigate the actors who bring together the elements needed to classify their carbon emission sources and unpack the heterogeneous relations drawn on. Based on an ethnographic study of corporate agents of ecological modernisation over a period of 13 months, this paper provides an exploration of three cases of enacting classification. Drawing on ANT, we problematise the silencing of a range of possible modalities of consumption facts and point to the ontological ethics involved in such performances. In a context of global warming...

  4. Methods and Techniques of Sampling, Culturing and Identifying of Subsurface Bacteria

    International Nuclear Information System (INIS)

    Lee, Seung Yeop; Baik, Min Hoon

    2010-11-01

    This report describes the sampling, culturing, and identification of KURT underground bacteria, which occur as iron-, manganese-, and sulfate-reducing bacteria. The culturing methods and media preparation differed by bacterial species, which affects bacterial growth rates. The cultured bacteria can be used for various applied experiments and research in the future.

  5. LCC: Light Curves Classifier

    Science.gov (United States)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished using attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample and can then be used for unsupervised clustering to visualize the natural separation of the sample. The package can also be used for automatic tuning of method parameters (for example, the number of hidden neurons or binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. Light Curves Classifier can also be used for simple downloading of light curves and all available information about queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and its command-line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases, and filtering outputs by trained filters. Preimplemented descriptors, classifiers, and connectors can be picked by simple clicks, and their parameters can be tuned by giving ranges of values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.
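
    The LCC package has its own API, which the record does not spell out; the snippet below is therefore only a generic, hedged illustration of the feature-based workflow it describes (derive descriptors from light curves, train a classifier, filter new objects), using scikit-learn and synthetic curves rather than the package itself.

```python
# Hedged sketch of feature-based light-curve classification (not the LCC package's API).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def light_curve_features(flux: np.ndarray) -> list:
    """Simple descriptors of a light curve: amplitude, scatter, mean point-to-point change."""
    return [flux.max() - flux.min(), flux.std(), np.mean(np.abs(np.diff(flux)))]

# Synthetic training sample: sinusoidal "variables" (label 1) vs. pure noise (label 0).
t = np.linspace(0, 10, 200)
curves = [np.sin(2 * np.pi * rng.uniform(0.5, 2) * t) + 0.1 * rng.normal(size=t.size) for _ in range(50)]
curves += [0.3 * rng.normal(size=t.size) for _ in range(50)]
labels = [1] * 50 + [0] * 50

X = np.array([light_curve_features(c) for c in curves])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# "Filtering" a new object queried from a database would then reduce to:
new_curve = np.sin(2 * np.pi * 1.3 * t) + 0.1 * rng.normal(size=t.size)
print("keep object:", bool(clf.predict([light_curve_features(new_curve)])[0]))
```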

  6. The IGSN Experience: Successes and Challenges of Implementing Persistent Identifiers for Samples

    Science.gov (United States)

    Lehnert, Kerstin; Arko, Robert

    2016-04-01

    Physical samples collected and studied in the Earth sciences represent both a research resource and a research product. As such, they need to be properly managed, curated, documented, and cited to ensure re-usability and utility for future science, reproducibility of the data generated by their study, and credit for funding agencies and researchers who invested substantial resources and intellectual effort into their collection and curation. Use of persistent and unique identifiers and deposition of metadata in a persistent registry are therefore as important for physical samples as they are for digital data. The International Geo Sample Number (IGSN) is a persistent, globally unique identifier. Its adoption by individual investigators, repository curators, publishers, and data managers is rapidly growing world-wide. This presentation will provide an analysis of the development and implementation path of the IGSN and relevant insights and experiences gained along its way. Development of the IGSN started in 2004 as part of a US NSF-funded project to establish a registry for sample metadata, the System for Earth Sample Registration (SESAR). The initial system provided a centralized solution for users to submit information about their samples and obtain IGSNs and bar codes. Challenges encountered during this initial phase related to defining the scope of the registry, granularity of registered objects, responsibilities of relevant actors, and workflows, and designing the registry's metadata schema, its user interfaces, and the identifier itself, including its syntax. The most challenging task, though, was to make the IGSN an integral part of personal and institutional sample management, digital management of sample-based data, and data publication on a global scale. Besides convincing individual researchers, curators, editors and publishers, as well as data managers in US and non-US academia, state and federal agencies, the PIs of the SESAR project

  7. Classifying neighbourhoods by level of access to stores selling fresh fruit and vegetables and groceries: identifying problematic areas in the city of Gatineau, Quebec.

    Science.gov (United States)

    Gould, Adrian C; Apparicio, Philippe; Cloutier, Marie-Soleil

    2012-11-06

    Physical access to stores selling groceries, fresh fruit and vegetables (FV) is essential for urban dwellers. In Canadian cities where low-density development practices are common, social and material deprivation may be compounded by poor geographic access to healthy food. This case study examines access to food stores selling fresh FV in Gatineau, Quebec, to identify areas where poor access is coincident with high deprivation. Food retailers were identified using two secondary sources and each store was visited to establish the total surface area devoted to the sale of fresh FV. Four population-weighted accessibility measures were then calculated for each dissemination area (DA) using road network distances. A deprivation index was created using variables from the 2006 Statistics Canada census, also at the scale of the DA. Finally, six classes of accessibility to a healthy diet were constructed using a k-means classification procedure. These were mapped and superimposed over high deprivation areas. Overall, deprivation is positively correlated with better accessibility. However, more than 18,000 residents (7.5% of the population) live in high deprivation areas characterized by large distances to the nearest retail food store (means of 1.4 km or greater) and virtually no access to fresh FV within walking distance (radius of 1 km). In this research, we identified areas where poor geographic access may introduce an additional constraint for residents already dealing with the challenges of limited financial and social resources. Our results may help guide local food security policies and initiatives.
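
    A hedged sketch of the core computation described above (group small areas by k-means over accessibility measures, then flag the poorly served class that is also highly deprived) is shown below; the accessibility variables, the cluster count of six, and all data are hypothetical stand-ins for the study's dissemination-area measures.

```python
# Hedged sketch: grouping small areas into accessibility classes with k-means,
# then flagging areas that combine poor access with high deprivation. Data are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_areas = 300
access = np.column_stack([
    rng.gamma(2.0, 0.7, n_areas),   # e.g. distance to nearest food store (km)
    rng.poisson(3, n_areas),        # e.g. number of stores within 1 km
    rng.gamma(2.0, 0.5, n_areas),   # e.g. distance to nearest large FV retailer (km)
    rng.poisson(2, n_areas),        # e.g. FV surface area within 1 km (scaled)
])
deprivation = rng.normal(size=n_areas)  # standardized deprivation index

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
access_class = kmeans.fit_predict(StandardScaler().fit_transform(access))

# Flag areas in the class with the largest mean distance that are also highly deprived.
worst_class = np.argmax([access[access_class == k, 0].mean() for k in range(6)])
problem_areas = (access_class == worst_class) & (deprivation > 1.0)
print("problematic areas:", problem_areas.sum())
```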

  8. A Polychoric Correlation to Identify the Principle Component in Classifying Single Tuition Fee Capabilities on the Students Socio-Economic Database

    Science.gov (United States)

    Yustanti, W.; Anistyasari, Y.

    2018-01-01

    The government has issued Regulation Number 55 of 2013 on the enactment of a single tuition fee based on the socio-economic conditions of each student. All public universities are required to implement this policy. Therefore, each university needs to create a formulation that can be used to assign a student to a cost group. Data collection revealed that the parameters used to determine tuition-fee classifications differ from one university to another. In this research, a sample of student data was taken at one public university, using 43 predictor variables and 8 single-tuition categories. The sample consists of socioeconomic data of students from the 2016 and 2017 intakes admitted through public university entrance selections. The results of this study reveal that, of the 43 variables, 16 are the most significant in influencing the single-tuition category, with a goodness-of-fit index of 0.866. This value means that the proposed model can indicate a student's ability to pay the tuition fee.

  9. Identifying influenza-like illness presentation from unstructured general practice clinical narrative using a text classifier rule-based expert system versus a clinical expert.

    Science.gov (United States)

    MacRae, Jayden; Love, Tom; Baker, Michael G; Dowell, Anthony; Carnachan, Matthew; Stubbe, Maria; McBain, Lynn

    2015-10-06

    We designed and validated a rule-based expert system to identify influenza-like illness (ILI) from routinely recorded general practice clinical narrative, to aid a larger retrospective research study into the impact of the 2009 influenza pandemic in New Zealand. Rules were assessed using pattern-matching heuristics on routine clinical narrative. The system was trained using data from 623 clinical encounters and validated using a clinical expert as a gold standard against a mutually exclusive set of 901 records. We calculated a 98.2% specificity and 90.2% sensitivity across an ILI incidence of 12.4% measured against clinical expert classification. Peak problem-list identification of ILI by clinical coding in any month was 9.2% of all detected ILI presentations. Our system addressed an unusual problem domain for clinical narrative classification, using notational, unstructured, clinician-entered information in a community care setting. It performed well compared with other approaches and domains. It has potential applications in real-time surveillance of disease and in assisted problem-list coding for clinicians. Our system identified ILI presentation with sufficient accuracy for use at a population level in the wider research study. The peak coding rate of 9.2% illustrated the need for automated coding of unstructured narrative in our study.
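
    A toy sketch of that general approach (pattern-matching rules over free-text notes, evaluated against an expert standard with sensitivity and specificity) follows; the keyword rules and example notes are invented for illustration and are not the study's actual rule set.

```python
# Hedged sketch: keyword/pattern rules over clinical free text plus sensitivity/specificity.
# Rules and example notes are invented; the study's actual heuristics are not reproduced here.
import re

ILI_PATTERNS = [r"\bflu\b", r"influenza", r"fever .*(cough|sore throat)", r"\bili\b"]

def is_ili(note: str) -> bool:
    text = note.lower()
    return any(re.search(p, text) for p in ILI_PATTERNS)

# Hypothetical (note, expert label) pairs: 1 = ILI presentation, 0 = not ILI.
notes = [
    ("pt with fever and dry cough, likely flu", 1),
    ("ankle sprain after football", 0),
    ("sore throat, afebrile, no cough", 0),
    ("influenza-like illness, off work 3 days", 1),
]

tp = sum(is_ili(n) and y for n, y in notes)
fn = sum((not is_ili(n)) and y for n, y in notes)
tn = sum((not is_ili(n)) and (not y) for n, y in notes)
fp = sum(is_ili(n) and (not y) for n, y in notes)

print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```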

  10. Using Extreme Phenotype Sampling to Identify the Rare Causal Variants of Quantitative Traits in Association Studies

    OpenAIRE

    Li, Dalin; Lewinger, Juan Pablo; Gauderman, William J.; Murcray, Cassandra Elizabeth; Conti, David

    2011-01-01

    Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the hereditability of complex traits. Rare variants may, in part, explain some of the missing hereditability. Here, we explored the advantage of the extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach fo...

  11. Identifying potential surface water sampling sites for emerging chemical pollutants in Gauteng Province, South Africa

    OpenAIRE

    Petersen, F; Dabrowski, JM; Forbes, PBC

    2017-01-01

    Emerging chemical pollutants (ECPs) are defined as new chemicals which do not have a regulatory status, but which may have an adverse effect on human health and the environment. The occurrence and concentrations of ECPs in South African water bodies are largely unknown, so monitoring is required in order to determine the potential threat that these ECPs may pose. Relevant surface water sampling sites in the Gauteng Province of South Africa were identified utilising a geographic information sy...

  12. Comparative analysis of Edwardsiella isolates from fish in the eastern United States identifies two distinct genetic taxa amongst organisms phenotypically classified as E. tarda

    Science.gov (United States)

    Griffin, Matt J.; Quiniou, Sylvie M.; Cody, Theresa; Tabuchi, Maki; Ware, Cynthia; Cipriano, Rocco C.; Mauel, Michael J.; Soto, Esteban

    2013-01-01

    Edwardsiella tarda, a Gram-negative member of the family Enterobacteriaceae, has been implicated in significant losses in aquaculture facilities worldwide. Here, we assessed the intra-specific variability of E. tarda isolates from 4 different fish species in the eastern United States. Repetitive sequence-mediated PCR (rep-PCR) using 4 different primer sets (ERIC I & II, ERIC II, BOX, and GTG5) and multi-locus sequence analysis of 16S SSU rDNA, groEl, gyrA, gyrB, pho, pgi, pgm, and rpoA gene fragments identified two distinct genotypes of E. tarda (DNA group I; DNA group II). Isolates that fell into DNA group II demonstrated more similarity to E. ictaluri than DNA group I, which contained the reference E. tarda strain (ATCC #15947). Conventional PCR analysis using published E. tarda-specific primer sets yielded variable results, with several primer sets producing no observable amplification of target DNA from some isolates. Fluorometric determination of G + C content demonstrated 56.4% G + C content for DNA group I, 60.2% for DNA group II, and 58.4% for E. ictaluri. Surprisingly, these isolates were indistinguishable using conventional biochemical techniques, with all isolates demonstrating phenotypic characteristics consistent with E. tarda. Analysis using two commercial test kits identified multiple phenotypes, although no single metabolic characteristic could reliably discriminate between genetic groups. Additionally, anti-microbial susceptibility and fatty acid profiles did not demonstrate remarkable differences between groups. The significant genetic variation (<90% similarity at gyrA, gyrB, pho, phi and pgm; <40% similarity by rep-PCR) between these groups suggests organisms from DNA group II may represent an unrecognized, genetically distinct taxon of Edwardsiella that is phenotypically indistinguishable from E. tarda.

  13. Genomic Aberrations in Crizotinib Resistant Lung Adenocarcinoma Samples Identified by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Ali Saber

    Full Text Available ALK-break positive non-small cell lung cancer (NSCLC) patients initially respond to crizotinib, but resistance occurs inevitably. In this study we aimed to identify fusion genes in crizotinib-resistant tumor samples. Re-biopsies of three patients were subjected to paired-end RNA sequencing to identify fusion genes using deFuse and EricScript. The IGV browser was used to determine the presence of known resistance-associated mutations. Sanger sequencing was used to validate fusion genes and digital droplet PCR to validate mutations. ALK fusion genes were detected in all three patients, with EML4 being the fusion partner. One patient had no additional fusion genes. Another patient had one additional fusion gene, but without a predicted open reading frame (ORF). The third patient had three additional fusion genes, of which two were derived from the same chromosomal region as the EML4-ALK. A predicted ORF was identified only in the CLIP4-VSNL1 fusion product. The fusion genes validated in the post-treatment sample were also present in the biopsy before crizotinib. ALK mutations (p.C1156Y and p.G1269A), detected in the re-biopsies of two patients, were not detected in pre-treatment biopsies. In conclusion, fusion genes identified in our study are unlikely to be involved in crizotinib resistance based on their presence in pre-treatment biopsies. The detection of ALK mutations in post-treatment tumor samples of two patients underlines their role in crizotinib resistance.

  14. Vibrio parahaemolyticus Strains of Pandemic Serotypes Identified from Clinical and Environmental Samples from Jiangsu, China

    Directory of Open Access Journals (Sweden)

    Jingjiao eLi

    2016-05-01

    Full Text Available Vibrio parahaemolyticus has emerged as a major foodborne pathogen in China, Japan, Thailand and other Asian countries. In this study, 72 strains of V. parahaemolyticus were isolated from clinical and environmental samples between 2006 and 2014 in Jiangsu, China. The serotypes and six virulence genes, including the thermostable direct hemolysin (TDH) and TDH-related hemolysin (TRH) genes, were assessed among the isolates. Twenty-five serotypes were identified and O3:K6 was one of the dominant serotypes. The genetic diversity was assessed by multilocus sequence typing (MLST) analysis, and 48 sequence types (STs) were found, suggesting this V. parahaemolyticus group is widely dispersed and undergoing rapid evolution. A total of 25 strains of pandemic serotypes such as O3:K6, O5:K17 and O1:KUT were identified. It is worth noting that the pandemic serotypes were not exclusively identified from clinical samples; rather, nine strains were also isolated from environmental samples, and some of these strains harbored several virulence genes, which may give those strains pathogenic potential. Therefore, the emergence of these environmental pandemic V. parahaemolyticus strains may pose a new threat to public health in China. Furthermore, six novel serotypes and 34 novel STs were identified among the 72 isolates, indicating that V. parahaemolyticus strains are widely distributed and evolving rapidly in the environment in Jiangsu, China. The findings of this study provide new insight into the phylogenetic relationship between V. parahaemolyticus strains of pandemic serotypes from clinical and environmental sources and enhance the MLST database; the proposed possible O- and K-antigen evolutionary paths of V. parahaemolyticus may help explain how the serotypes of this dispersed bacterial population evolve.

  15. Spectroscopic properties for identifying sapphire samples from Ban Bo Kaew, Phrae Province, Thailand

    Science.gov (United States)

    Mogmued, J.; Monarumit, N.; Won-in, K.; Satitkune, S.

    2017-09-01

    The gemstone trade, especially in ruby and sapphire, is a high-revenue industry for Thailand, and Phrae is a potential gem field in the northern part of the country. Spectroscopic properties are studied mainly to identify gemstones using advanced techniques (e.g. UV-Vis-NIR spectrophotometry, FTIR spectrometry and Raman spectroscopy). Typically, UV-Vis-NIR spectrophotometry is used to study the cause of color in gemstones, FTIR spectrometry is used to study the functional groups in gem materials, and Raman patterns can be used to identify mineral inclusions in gemstones. In this study, natural sapphires from Ban Bo Kaew were divided into two color groups, blue and green. The samples were analyzed by UV-Vis-NIR spectrophotometry, FTIR spectrometry and Raman spectroscopy to characterize their spectroscopic properties. According to the UV-Vis-NIR spectra, the blue sapphires show higher Fe3+/Ti4+ and Fe2+/Fe3+ absorption peaks than the green sapphires, whereas the green sapphires display higher Fe3+/Fe3+ absorption peaks than the blue sapphires. The FTIR spectra of both blue and green samples show absorption peaks of -OH, -CH and CO2. Mineral inclusions such as ferrocolumbite and rutile were observed in sapphires from this area by Raman spectroscopy. These spectroscopic properties of sapphire samples from Ban Bo Kaew, Phrae Province, Thailand, can serve as specific evidence for gemstone identification.

  16. Comparing hair-morphology and molecular methods to identify fecal samples from Neotropical felids.

    Directory of Open Access Journals (Sweden)

    Carlos C Alberts

    Full Text Available To avoid certain problems encountered with more-traditional and invasive methods in behavioral-ecology studies of mammalian predators, such as felids, molecular approaches have been employed to identify feces found in the field. However, this method requires a complete molecular biology laboratory, and usually also requires very fresh fecal samples to avoid DNA degradation. Both conditions are normally absent in the field. To address these difficulties, identification based on morphological characters (length, color, banding, scale and medulla patterns) of hairs found in feces could be employed as an alternative. In this study we constructed a morphological identification key for guard hairs of eight Neotropical felids (jaguar, oncilla, Geoffroy's cat, margay, ocelot, Pampas cat, puma and jaguarundi) and compared its efficiency to that of a molecular identification method, using the ATP6 region as a marker. For this molecular approach, we simulated some field conditions by postponing sample-conservation procedures. A blind test of the identification key obtained a nearly 70% overall success rate, which we considered equivalent to or better than the results of some molecular methods (probably due to DNA degradation) found in other studies. The jaguar, puma and jaguarundi could be unequivocally discriminated from any other Neotropical felid. On a scale ranging from inadequate to excellent, the key proved poor only for the margay, with only 30% of its hairs successfully identified using this key; success rates for the remaining species (the oncilla, Geoffroy's cat, ocelot and Pampas cat) were intermediate. Complementary information about the known distributions of felid populations may be necessary to substantially improve the results obtained with the key. Our own molecular results were even better, since all blind-tested samples were correctly identified. Part of these identifications were made from samples kept in suboptimal

  17. Can the dissociative PTSD subtype be identified across two distinct trauma samples meeting caseness for PTSD?

    Science.gov (United States)

    Hansen, Maj; Műllerová, Jana; Elklit, Ask; Armour, Cherie

    2016-08-01

    For over a century, the occurrence of dissociative symptoms in connection to traumatic exposure has been acknowledged in the scientific literature. Recently, the importance of dissociation has also been recognized in the long-term traumatic response within the DSM-5 nomenclature. Several studies have confirmed the existence of the dissociative posttraumatic stress disorder (PTSD) subtype. However, there is a lack of studies investigating latent profiles of PTSD solely in victims with PTSD. This study investigates the possible presence of PTSD subtypes using latent class analysis (LCA) across two distinct trauma samples meeting caseness for DSM-5 PTSD based on self-reports (N = 787). Moreover, we assessed if a number of risk factors resulted in an increased probability of membership in a dissociative compared with a non-dissociative PTSD class. The results of LCA revealed a two-class solution with two highly symptomatic classes: a dissociative class and a non-dissociative class across both samples. Increased emotion-focused coping increased the probability of individuals being grouped into the dissociative class across both samples. Social support reduced the probability of individuals being grouped into the dissociative class but only in the victims of motor vehicle accidents (MVAs) suffering from whiplash. The results are discussed in light of their clinical implications and suggest that the dissociative subtype can be identified in victims of incest and victims of MVA suffering from whiplash meeting caseness for DSM-5 PTSD.

  18. Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

    Science.gov (United States)

    Forkert, Nils Daniel; Fiehler, Jens

    2015-03-01

    The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require previous training and, until now, it has been ambiguous how many subjects are required for this, which is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from using all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 if using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and effort required for data acquisition and preparation.
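
    The abstract's two core quantities, a voxel-level random forest and the Dice coefficient between predicted and true outcome masks, can be illustrated with a small synthetic sketch; the features, training-set sizes, and all data below are invented and far smaller than in the study.

```python
# Hedged sketch: voxel-level random forest outcome prediction evaluated with the
# Dice coefficient, for growing numbers of "training patients". All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

def make_patient(n_voxels=500, n_features=4):
    """Synthetic diffusion/perfusion features plus a binary tissue outcome per voxel."""
    X = rng.normal(size=(n_voxels, n_features))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_voxels) > 0.8).astype(int)
    return X, y

def dice(a, b):
    return 2 * np.sum(a * b) / (np.sum(a) + np.sum(b))

patients = [make_patient() for _ in range(23)]
X_test, y_test = patients[0]          # hold one "patient" out for evaluation

for n_train in (1, 5, 10, 22):
    X_tr = np.vstack([patients[i][0] for i in range(1, n_train + 1)])
    y_tr = np.concatenate([patients[i][1] for i in range(1, n_train + 1)])
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    print(n_train, "training patients -> Dice:", round(dice(clf.predict(X_test), y_test), 3))
```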

  19. Stack filter classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory; Hush, Don [Los Alamos National Laboratory

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification, their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step towards Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
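
    The analogy the authors draw (mean and weighted average versus median and weighted median) can be made concrete with a short sketch contrasting the two primitives on data containing an outlier; this illustrates the underlying order-statistic idea only, not the paper's stack-filter learning algorithm.

```python
# Hedged sketch: weighted average vs. weighted median, the primitives behind
# linear models and weighted-order-statistic (stack-filter-style) models respectively.
import numpy as np

def weighted_median(values, weights):
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cdf, 0.5)]

x = np.array([1.0, 2.0, 2.5, 3.0, 100.0])   # last sample is an outlier
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0])

print("weighted average:", np.average(x, weights=w))   # pulled toward the outlier
print("weighted median :", weighted_median(x, w))      # robust to the outlier
```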

  20. Toward a Rapid Synthesis of Field and Desktop Data for Classifying Streams in the Pacific Northwest: Guiding the Sampling and Management of Salmonid Habitat

    Science.gov (United States)

    Kasprak, A.; Wheaton, J. M.; Bouwes, N.; Weber, N. P.; Trahan, N. C.; Jordan, C. E.

    2012-12-01

    River managers often seek to understand habitat availability and quality for riverine organisms within the physical template provided by their landscape. Yet the large amount of natural heterogeneity in landscapes gives rise to stream systems which are highly variable over small spatial scales, potentially complicating site selection for surveying aquatic habitat while simultaneously making a simple, wide-reaching management strategy elusive. This is particularly true in the rugged John Day River Basin of northern Oregon, where efforts as part of the Columbia Habitat Monitoring Program to conduct site-based surveys of physical habitat for endangered steelhead salmon (Oncorhynchus mykiss) are underway. As a complete understanding of the type and distribution of habitat available to these fish would require visits to all streams in the basin (impractical due to its large size), here we develop an approach for classifying channel types which combines remote desktop GIS analyses with rapid field-based stream and landscape surveys. At the core of this method, we build off of the River Styles Framework, an open-ended and process-based approach for classifying streams and informing management decisions. This framework is combined with on-the-ground fluvial audits, which aim to quickly and continuously map sediment dynamics and channel behavior along selected channels. Validation of this classification method is completed by on-the-ground stream surveys using a digital iPad platform and by rapid small aircraft overflights to confirm or refine predictions. We further compare this method with existing channel classification approaches for the region (e.g. Beechie, Montgomery and Buffington). The results of this study will help guide both the refinement of site stratification and selection for salmonid habitat monitoring within the basin, and will be vital in designing and prioritizing restoration and management strategies tailored to the distribution of river styles found

  1. Consensus of sample-balanced classifiers for identifying ligand-binding residue by co-evolutionary physicochemical characteristics of amino acids

    KAUST Repository

    Chen, Peng

    2013-01-01

    Protein-ligand binding is an important mechanism for some proteins to perform their functions, and those binding sites are the residues of proteins that physically bind to ligands. So far, the state-of-the-art methods search for similar, known

  2. Identifying intimate partner violence (IPV) during the postpartum period in a Greek sample.

    Science.gov (United States)

    Vivilaki, Victoria G; Dafermos, Vassilis; Daglas, Maria; Antoniou, Evagelia; Tsopelas, Nicholas D; Theodorakis, Pavlos N; Brown, Judith B; Lionis, Christos

    2010-12-01

    Research has highlighted the wide impact of intimate partner violence (IPV) and the public health role of community health professionals in detection of victimized women. The purpose of this study was to identify postpartum emotional and physical abuse and to validate the Greek version of the Women Abuse Screening Tool (WAST) along with its sensitivity and specificity. Five hundred seventy-nine mothers within 12 weeks postpartum were recruited from the perinatal care registers of the Maternity Departments of two public hospitals in Athens, Greece. Participants were randomly selected by clinic or shift. The WAST and the Partner Violence Screen (PVS) surveys were administered in random order to the mothers from September 2007 to January 2008. The WAST was compared with the PVS as a criterion standard. Agreement between the screening instruments was examined. The psychometric measurements that were performed included two independent-sample t tests, reliability coefficients, exploratory factor analysis using a Varimax rotation, and the principal components method. Confirmatory analysis (also called structural equation modeling) of principal components was conducted using Linear Structural Relations. A receiver operating characteristic (ROC) analysis was carried out to evaluate the global functioning of the scale. Two hundred four (35.6%) of the mothers screened were identified as experiencing IPV. Scores on the WAST correlated well with those on the PVS; the internal consistency of the Greek version of the WAST, tested using Cronbach's alpha coefficient, was found to be 0.926, and Guttman's split-half coefficient was 0.924. Our findings confirm the multidimensionality of the WAST, demonstrating a two-factor structure. The area under the ROC curve (AUC) was found to be 0.824, and the logistic estimate for the threshold score of 0/1 fitted the model sensitivity at 99.7% and model specificity at 64.4%. Our data confirm the validity of the Greek version of the WAST in identifying IPV

  3. Small Body GN and C Research Report: G-SAMPLE - An In-Flight Dynamical Method for Identifying Sample Mass [External Release Version

    Science.gov (United States)

    Carson, John M., III; Bayard, David S.

    2006-01-01

    G-SAMPLE is an in-flight dynamical method for use by sample collection missions to identify the presence and quantity of collected sample material. The G-SAMPLE method implements a maximum-likelihood estimator to identify the collected sample mass, based on onboard force sensor measurements, thruster firings, and a dynamics model of the spacecraft. With G-SAMPLE, sample mass identification becomes a computation rather than an extra hardware requirement; the added cost of cameras or other sensors for sample mass detection is avoided. Realistic simulation examples are provided for a spacecraft configuration with a sample collection device mounted on the end of an extended boom. In one representative example, a 1000 gram sample mass is estimated to within 110 grams (95% confidence) under realistic assumptions of thruster profile error, spacecraft parameter uncertainty, and sensor noise. For convenience to future mission design, an overall sample-mass estimation error budget is developed to approximate the effect of model uncertainty, sensor noise, data rate, and thrust profile error on the expected estimate of collected sample mass.
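
    G-SAMPLE itself rests on a full spacecraft dynamics model and onboard force sensing; as a much simplified, hedged illustration of the underlying idea (a maximum-likelihood mass estimate from noisy force measurements under known accelerations), consider the one-dimensional toy below, in which all numbers are invented.

```python
# Hedged sketch: maximum-likelihood (least-squares under Gaussian noise) estimate of an
# unknown sample mass from noisy force-sensor readings during known thruster accelerations.
# This is a 1-D toy stand-in, not the G-SAMPLE spacecraft dynamics model.
import numpy as np

rng = np.random.default_rng(7)
true_mass_kg = 1.0                       # e.g. ~1000 g of collected sample
accels = rng.uniform(0.05, 0.5, 200)     # known commanded accelerations, m/s^2
noise_sigma = 0.02                       # force-sensor noise, N
forces = true_mass_kg * accels + rng.normal(scale=noise_sigma, size=accels.size)

# For F = m*a + noise, the ML estimate of m is the least-squares slope through the origin.
m_hat = np.sum(accels * forces) / np.sum(accels ** 2)
sigma_m = noise_sigma / np.sqrt(np.sum(accels ** 2))   # standard error of the estimate

print(f"estimated mass: {1000 * m_hat:.0f} g +/- {1960 * sigma_m:.0f} g (95% conf.)")
```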

  4. Identifying the potential of changes to blood sample logistics using simulation

    DEFF Research Database (Denmark)

    Jørgensen, Pelle Morten Thomas; Jacobsen, Peter; Poulsen, Jørgen Hjelm

    2013-01-01

    The focus of the simulation was to evaluate changes made to the transportation of blood samples between wards and the laboratory. The average (AWT) and maximum waiting time (MWT) from when a blood sample was drawn at the ward until it was received at the laboratory, and the distribution of arrivals of blood samples in the laboratory, were used as evaluation criteria. Furthermore, each of the scenarios was tested in terms of what amount of resources would give the optimal result. The simulations showed a big improvement potential in implementing a new technology/means for transporting the blood samples. The pneumatic tube system showed the biggest potential, lowering the AWT...

  5. New complete sample of identified radio sources. Part 2. Statistical study

    International Nuclear Information System (INIS)

    Soltan, A.

    1978-01-01

    The complete sample of radio sources with known redshifts selected in Paper I is studied. Source counts in the sample and the luminosity-volume test show that both quasars and galaxies are subject to evolution. Luminosity functions for different ranges of redshifts are obtained. Due to many uncertainties, only simplified models of the evolution are tested. An exponential decline of the luminosity with time for all the bright sources is in good agreement both with the luminosity-volume test and the N(S) relation in the entire range of observed flux densities. It is shown that sources in the sample are randomly distributed on scales greater than about 17 Mpc. (author)

  6. Biomarker Analysis of Samples Visually Identified as Microbial in the Eocene Green River Formation: An Analogue for Mars.

    Science.gov (United States)

    Olcott Marshall, Alison; Cestari, Nicholas A

    2015-09-01

    One of the major exploration targets for current and future Mars missions is lithofacies suggestive of biotic activity. Although such lithofacies are not confirmation of biotic activity, they provide a way to identify samples for further analyses. To test the efficacy of this approach, we identified carbonate samples from the Eocene Green River Formation as "microbial" or "non-microbial" based on the macroscale morphology of their laminations. These samples were then crushed and analyzed by gas chromatography/mass spectrometry (GC/MS) to determine their lipid biomarker composition. GC/MS analysis revealed that carbonates visually identified as "microbial" contained a higher concentration of more diverse biomarkers than those identified as "non-microbial," suggesting that this could be a viable detection strategy for selecting samples for further analysis or caching on Mars.

  7. Identifying the potential of changes to blood sample logistics using simulation.

    Science.gov (United States)

    Jørgensen, Pelle; Jacobsen, Peter; Poulsen, Jørgen Hjelm

    2013-01-01

    Using simulation as an approach to display and improve internal logistics at hospitals has great potential. This study shows how a simulation model displaying the morning blood-taking round at a Danish public hospital can be developed and utilized with the aim of improving the logistics. The focus of the simulation was to evaluate changes made to the transportation of blood samples between wards and the laboratory. The average (AWT) and maximum waiting time (MWT) from when a blood sample was drawn at the ward until it was received at the laboratory, and the distribution of arrivals of blood samples in the laboratory, were used as the evaluation criteria. Four different scenarios were tested and compared with the current approach: (1) using AGVs (mobile robots), (2) using a pneumatic tube system, (3) using porters that are called upon, or (4) using porters that come to the wards every 45 minutes. Furthermore, each of the scenarios was tested in terms of what amount of resources would give the optimal result. The simulations showed a big improvement potential in implementing a new technology/means for transporting the blood samples. The pneumatic tube system showed the biggest potential, lowering the AWT and MWT by approx. 36% and 18%, respectively. Additionally, all of the scenarios had a more even distribution of arrivals except for porters coming to the wards every 45 min. As a consequence of the results obtained in the study, the hospital decided to implement a pneumatic tube system.
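
    A hedged toy version of such a comparison can be written as a small Monte Carlo simulation of two transport scenarios, a porter round every 45 minutes versus near-immediate pneumatic tube transport; the arrival pattern, transit times, and round interval below are invented and are not the hospital's actual model parameters.

```python
# Hedged sketch: Monte Carlo comparison of blood-sample waiting times under two transport
# scenarios. Arrival pattern, transit times, and round interval are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_samples = 2000
draw_times = np.sort(rng.uniform(0, 180, n_samples))  # samples drawn over a 3-hour morning round

# Scenario A: porter collects from the ward every 45 minutes (plus a 10-minute walk to the lab).
round_interval, walk = 45.0, 10.0
pickup = np.ceil(draw_times / round_interval) * round_interval
wait_porter = pickup + walk - draw_times

# Scenario B: pneumatic tube, dispatched almost immediately (2-5 minutes total transit).
wait_tube = rng.uniform(2.0, 5.0, n_samples)

for name, wait in (("porter every 45 min", wait_porter), ("pneumatic tube", wait_tube)):
    print(f"{name}: AWT = {wait.mean():.1f} min, MWT = {wait.max():.1f} min")
```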

  8. The Adrenal Vein Sampling International Study (AVIS) for identifying the major subtypes of primary aldosteronism.

    NARCIS (Netherlands)

    Rossi, G.P.; Barisa, M.; Allolio, B.; Auchus, R.J.; Amar, L.; Cohen, D.; Degenhart, C.; Deinum, J.; Fischer, E.; Gordon, R.; Kickuth, R.; Kline, G.; Lacroix, A.; Magill, S.; Miotto, D.; Naruse, M.; Nishikawa, T.; Omura, M.; Pimenta, E.; Plouin, P.F.; Quinkler, M.; Reincke, M.; Rossi, E.; Rump, L.C.; Satoh, F.; Schultze Kool, L.J.; Seccia, T.M.; Stowasser, M.; Tanabe, A.; Trerotola, S.; Vonend, O.; Widimsky Jr, J.; Wu, K.D.; Wu, V.C.; Pessina, A.C.

    2012-01-01

    CONTEXT: In patients who seek surgical cure of primary aldosteronism (PA), The Endocrine Society Guidelines recommend the use of adrenal vein sampling (AVS), which is invasive, technically challenging, difficult to interpret, and commonly held to be risky. OBJECTIVE: The aim of this study was to

  9. COMPETITIVE METAGENOMIC DNA HYBRIDIZATION IDENTIFIES HOST-SPECIFIC MICROBIAL GENETIC MARKERS IN COW FECAL SAMPLES

    Science.gov (United States)

    Several PCR methods have recently been developed to identify fecal contamination in surface waters. In all cases, researchers have relied on one gene or one microorganism for selection of host specific markers. Here, we describe the application of a genome fragment enrichment met...

  10. Illicit Drug Use in a Community-Based Sample of Heterosexually Identified Emerging Adults

    Science.gov (United States)

    Halkitis, Perry N.; Manasse, Ashley N.; McCready, Karen C.

    2010-01-01

    In this study we assess lifetime and recent drug use patterns among 261 heterosexually identified 18- to 25-year-olds through brief street intercept surveys conducted in New York City. Marijuana, hallucinogens, powder cocaine, and ecstasy were the most frequently reported drugs for both lifetime and recent use. Findings further suggest significant…

  11. Prevalence and persistence of male DNA identified in mixed saliva samples after intense kissing.

    Science.gov (United States)

    Kamodyová, Natália; Durdiaková, Jaroslava; Celec, Peter; Sedláčková, Tatiana; Repiská, Gabriela; Sviežená, Barbara; Minárik, Gabriel

    2013-01-01

    Identification of foreign biological material by genetic profiling is widely used in forensic DNA testing in different cases of sexual violence, sexual abuse or sexual harassment. In all these kinds of sexual assaults, the perpetrator could constrain the victim to kissing. The value of the victim's saliva taken after such an assault has not been investigated in the past with currently widely used molecular methods of extremely high sensitivity (e.g. qPCR) and specificity (e.g. multiplex Y-STR PCR). In our study, 12 volunteer pairs were tested at various intervals after intense kissing and saliva samples were taken from the women to assess the presence of male DNA. Sensitivity-focused assays based on the SRY (single-copy gene) and DYS (multi-copy gene) sequence motifs confirmed the presence of male DNA in female saliva 10 and even 60 min after kissing, respectively. For specificity, standard multiplex Y-STR PCR profiling was performed and male DNA was found in female saliva samples, as the entire Y-STR profile, even after 30 min in one sample. Our study confirms that foreign DNA tends to persist for a restricted period of time in the victim's mouth, can be isolated from saliva after prompt collection and can be used as a valuable source of evidence. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  12. Finding needles in a haystack: a methodology for identifying and sampling community-based youth smoking cessation programs.

    Science.gov (United States)

    Emery, Sherry; Lee, Jungwha; Curry, Susan J; Johnson, Tim; Sporer, Amy K; Mermelstein, Robin; Flay, Brian; Warnecke, Richard

    2010-02-01

    Surveys of community-based programs are difficult to conduct when there is virtually no information about the number or locations of the programs of interest. This article describes the methodology used by the Helping Young Smokers Quit (HYSQ) initiative to identify and profile community-based youth smoking cessation programs in the absence of a defined sample frame. We developed a two-stage sampling design, with counties as the first-stage probability sampling units. The second stage used snowball sampling to saturation, to identify individuals who administered youth smoking cessation programs across three economic sectors in each county. Multivariate analyses modeled the relationship between program screening, eligibility, and response rates and economic sector and stratification criteria. Cumulative logit models analyzed the relationship between the number of contacts in a county and the number of programs screened, eligible, or profiled in a county. The snowball process yielded 9,983 unique and traceable contacts. Urban and high-income counties yielded significantly more screened program administrators; urban counties produced significantly more eligible programs, but there was no significant association between the county characteristics and program response rate. There is a positive relationship between the number of informants initially located and the number of programs screened, eligible, and profiled in a county. Our strategy to identify youth tobacco cessation programs could be used to create a sample frame for other nonprofit organizations that are difficult to identify due to a lack of existing directories, lists, or other traditional sample frames.

  13. Some Risk Factors of Chronic Functional Constipation Identified in a Pediatric Population Sample from Romania

    Directory of Open Access Journals (Sweden)

    Claudia Olaru

    2016-01-01

    Full Text Available We conducted an observational study over a 1-year period, including 234 children aged 4–18 years and their caregivers and a matching control group. 60.73% of the children from the study group were males. Average age for the onset of constipation was 26.39 months. The frequency of defecation was 1/4.59 days (1/1.13 days in the control group). 38.49% of the patients in the sample group had a positive family history of functional constipation. The majority of children with functional constipation come from single-parent families, are raised by relatives, or come from orphanages. Constipated subjects had their last meal of the day at later hours and consumed fast foods more frequently than the children in the control sample. We found a statistically significant difference between groups regarding obesity/overweight and constipation (χ2 = 104.94, df = 2, p < 0.001) and regarding physical activity and constipation (χ2 = 18.419, df = 3, p < 0.001). There was a positive correlation between the number of hours spent watching television/using the computer and the occurrence of the disease (F = 92.162, p < 0.001, 95% CI). Children from broken families, with a positive family history, defective dietary habits, obesity and sedentary behavior, are at higher risk of developing chronic functional constipation.

  14. Metabolomics identifies a biological response to chronic low-dose natural uranium contamination in urine samples.

    Science.gov (United States)

    Grison, Stéphane; Favé, Gaëlle; Maillot, Matthieu; Manens, Line; Delissen, Olivia; Blanchardon, Eric; Banzet, Nathalie; Defoort, Catherine; Bott, Romain; Dublineau, Isabelle; Aigueperse, Jocelyne; Gourmelon, Patrick; Martin, Jean-Charles; Souidi, Maâmar

    2013-01-01

    Because uranium is a natural element present in the earth's crust, the population may be chronically exposed to low doses of it through drinking water. Additionally, the military and civil uses of uranium can also lead to environmental dispersion that can result in high or low doses of acute or chronic exposure. Recent experimental data suggest this might lead to relatively innocuous biological reactions. The aim of this study was to assess the biological changes in rats caused by ingestion of natural uranium in drinking water with a mean daily intake of 2.7 mg/kg for 9 months and to identify potential biomarkers related to such a contamination. Subsequently, we observed no pathology and standard clinical tests were unable to distinguish between treated and untreated animals. Conversely, LC-MS metabolomics identified urine as an appropriate biofluid for discriminating the experimental groups. Of the 1,376 features detected in urine, the most discriminant were metabolites involved in tryptophan, nicotinate, and nicotinamide metabolic pathways. In particular, N-methylnicotinamide, which was found at a level seven times higher in untreated than in contaminated rats, had the greatest discriminating power. These novel results establish a proof of principle for using metabolomics to address chronic low-dose uranium contamination. They open interesting perspectives for understanding the underlying biological mechanisms and designing a diagnostic test of exposure.

  15. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    Science.gov (United States)

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional differential peak analysis with false discovery rate (FDR) control based on concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports online uploading and analysis of large-scale MS data with a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
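
    The differential peak analysis with FDR control mentioned in the record can be sketched as per-peak tests followed by a Benjamini-Hochberg correction; the intensity matrices below are synthetic and the tool's own pipeline is not reproduced here.

```python
# Hedged sketch: per-peak t-tests between two sample categories followed by
# Benjamini-Hochberg FDR control. Peak intensity matrices are synthetic.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(5)
n_peaks = 1000
group_a = rng.normal(size=(20, n_peaks))
group_b = rng.normal(size=(20, n_peaks))
group_b[:, :50] += 1.5          # 50 peaks are truly shifted between categories

pvals = ttest_ind(group_a, group_b, axis=0).pvalue

def benjamini_hochberg(p, q=0.05):
    """Return a boolean mask of peaks significant at FDR level q."""
    p = np.asarray(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, p.size + 1) / p.size
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros(p.size, dtype=bool)
    mask[order[:k]] = True
    return mask

significant = benjamini_hochberg(pvals)
print("peaks flagged at FDR 0.05:", significant.sum())
```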

  16. Hybrid Cubature Kalman filtering for identifying nonlinear models from sampled recording: Estimation of neuronal dynamics.

    Science.gov (United States)

    Madi, Mahmoud K; Karameh, Fadi N

    2017-01-01

    Kalman filtering methods have long been regarded as efficient adaptive Bayesian techniques for estimating hidden states in models of linear dynamical systems under Gaussian uncertainty. The recent advent of the Cubature Kalman filter (CKF) has extended this efficient estimation property to nonlinear systems, and also to hybrid nonlinear problems whereby the processes are continuous and the observations are discrete (continuous-discrete CD-CKF). Employing CKF techniques, therefore, carries high promise for modeling many biological phenomena where the underlying processes exhibit inherently nonlinear, continuous, and noisy dynamics and the associated measurements are uncertain and time-sampled. This paper investigates the performance of cubature filtering (CKF and CD-CKF) in two flagship problems arising in the field of neuroscience upon relating brain functionality to aggregate neurophysiological recordings: (i) estimation of the firing dynamics and the neural circuit model parameters from electric potential (EP) observations, and (ii) estimation of the hemodynamic model parameters and the underlying neural drive from BOLD (fMRI) signals. First, in simulated neural circuit models, estimation accuracy was investigated under varying levels of observation noise (SNR), process noise structures, and observation sampling intervals (dt). When compared to the CKF, the CD-CKF consistently exhibited better accuracy for a given SNR, sharp accuracy increase with higher SNR, and persistent error reduction with smaller dt. Remarkably, CD-CKF accuracy shows only a mild deterioration for non-Gaussian process noise, specifically with Poisson noise, a commonly assumed form of background fluctuations in neuronal systems. Second, in simulated hemodynamic models, parametric estimates were consistently improved under CD-CKF. Critically, time-localization of the underlying neural drive, a determinant factor in fMRI-based functional connectivity studies, was significantly more accurate
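
    The cubature rule at the heart of the CKF can be illustrated compactly: the sketch below generates the 2n equally weighted cubature points for a state estimate and propagates them through a nonlinear function to obtain a predicted mean and covariance (a single prediction step with an invented toy model, not the continuous-discrete filter used in the paper).

```python
# Hedged sketch: one prediction step of a (discrete-time) Cubature Kalman Filter.
# The third-degree spherical-radial rule uses 2n equally weighted points at +/- sqrt(n)*S_i.
import numpy as np

def cubature_predict(x, P, f, Q):
    """Propagate mean x and covariance P through nonlinearity f with process noise Q."""
    n = x.size
    S = np.linalg.cholesky(P)                               # square root of the covariance
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])    # unit cubature points, shape (n, 2n)
    points = x[:, None] + S @ xi                            # 2n state points
    prop = np.column_stack([f(points[:, i]) for i in range(2 * n)])
    x_pred = prop.mean(axis=1)                              # equal weights 1/(2n)
    P_pred = (prop - x_pred[:, None]) @ (prop - x_pred[:, None]).T / (2 * n) + Q
    return x_pred, P_pred

# Toy nonlinear dynamics (a crude 2-state update), purely illustrative.
f = lambda s: np.array([s[0] + 0.1 * np.tanh(s[1]), 0.95 * s[1] + 0.05 * s[0]])
x0, P0, Q = np.array([0.5, -0.2]), 0.1 * np.eye(2), 0.01 * np.eye(2)
print(cubature_predict(x0, P0, f, Q))
```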

  18. The principles, procedures and pitfalls in identifying archaeological and historical wood samples.

    Science.gov (United States)

    Cartwright, Caroline R

    2015-07-01

    The science of wood anatomy has evolved in recent decades to add archaeological and historical wood to its repertoire of documenting and characterizing modern and fossil woods. The increasing use of online wood anatomy databases and atlases has fostered the adoption of an international consensus regarding terminology, largely through the work of the International Association of Wood Anatomists (IAWA). This review presents an overview for the general reader of the current state of principles and procedures involved in the study of the wood anatomy of archaeological and historical specimens, some of which may be preserved through charring, waterlogging, desiccation or mineral replacement. By means of selected case studies, the review evaluates to what extent varying preservation of wood anatomical characteristics limits the level of identification to taxon. It assesses the role played by increasingly accessible scanning electron microscopes and complex optical microscopes, and whether these, on the one hand, provide exceptional opportunities for high-quality imaging and analysis of difficult samples, but, on the other hand, might be misleading the novice into thinking that advanced technology can be a substitute for specialized botanical training in wood anatomy. © The Author 2015. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Identifying indicators of harmful and problem gambling in a Canadian sample through receiver operating characteristic analysis.

    Science.gov (United States)

    Quilty, Lena C; Avila Murati, Daniela; Bagby, R Michael

    2014-03-01

    Many gamblers would prefer to reduce gambling on their own rather than to adopt an abstinence approach within the context of a gambling treatment program. Yet responsible gambling guidelines lack quantifiable markers to guide gamblers in wagering safely. To address these issues, the current investigation implemented receiver operating characteristic (ROC) analysis to identify behavioral indicators of harmful and problem gambling. Gambling involvement was assessed in 503 participants (275 psychiatric outpatients and 228 community gamblers) with the Canadian Problem Gambling Index. Overall gambling frequency, duration, and expenditure were able to distinguish harmful and problematic gambling at a moderate level. Indicators of harmful gambling were generated for engagement in specific gambling activities: frequency of tickets and casino; duration of bingo, casino, and investments; and expenditures on bingo, casino, sports betting, games of skill, and investments. Indicators of problem gambling were similarly produced for frequency of tickets and casino, and expenditures on bingo, casino, games of skill, and investments. Logistic regression analyses revealed that overall gambling frequency uniquely predicted the presence of harmful and problem gambling. Furthermore, frequency indicators for tickets and casino uniquely predicted the presence of both harmful and problem gambling. Together, these findings contribute to the development of an empirically based method enabling the minimization of harmful or problem gambling through self-control rather than abstinence.
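
    As a hedged illustration of how ROC analysis can turn a behavioral indicator into a quantifiable marker, the sketch below picks a cutoff on a synthetic gambling-frequency variable by maximizing Youden's J; the data and the resulting threshold are invented for demonstration and are not the study's results.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic illustration: monthly gambling frequency for non-problem vs problem gamblers.
freq_ok = rng.poisson(lam=3, size=400)      # non-problem group
freq_prob = rng.poisson(lam=8, size=100)    # problem group
y = np.r_[np.zeros_like(freq_ok), np.ones_like(freq_prob)]
x = np.r_[freq_ok, freq_prob].astype(float)

fpr, tpr, thresholds = roc_curve(y, x)
j = tpr - fpr                                # Youden's J statistic
best = np.argmax(j)
print(f"AUC = {roc_auc_score(y, x):.2f}, "
      f"suggested cutoff = {thresholds[best]:.1f} sessions/month "
      f"(sensitivity {tpr[best]:.2f}, specificity {1 - fpr[best]:.2f})")
```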

  20. THE CIRCUMGALACTIC MEDIUM OF SUBMILLIMETER GALAXIES. I. FIRST RESULTS FROM A RADIO-IDENTIFIED SAMPLE

    Energy Technology Data Exchange (ETDEWEB)

    Fu, Hai; Mutel, R.; Isbell, J.; Lang, C.; McGinnis, D. [Department of Physics and Astronomy, University of Iowa, Iowa City, IA 52242 (United States); Hennawi, J. F. [Max-Planck-Institut für Astronomie, Heidelberg (Germany); Prochaska, J. X. [Department of Astronomy and Astrophysics, UCO/Lick Observatory, University of California, 1156 High Street, Santa Cruz, CA 95064 (United States); Casey, C. [Department of Astronomy, the University of Texas at Austin, 2515 Speedway Blvd, Stop C1400, Austin, TX 78712 (United States); Cooray, A. [Department of Physics and Astronomy, University of California, Irvine, CA 92697 (United States); Kereš, D. [Department of Physics, Center for Astrophysics and Space Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093 (United States); Zhang, Z.-Y.; Michałowski, M. J. [Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ (United Kingdom); Clements, D. [Astrophysics Group, Imperial College London, Blackett Laboratory, Prince Consort Road, London SW7 2AZ (United Kingdom); Mooley, K. [Oxford Centre For Astrophysical Surveys, Denys Wilkinson Building, Keble Road, Oxford OX1 3RH (United Kingdom); Perley, D. [Dark Cosmology Centre, Niels Bohr Institute, University of Copenhagen, Juliane Maries Vej 30, DK-2100 København Ø (Denmark); Stockton, A. [Institute for Astronomy, University of Hawaii, 2680 Woodlawn Drive, Honolulu, HI 96822 (United States); Thompson, D. [Large Binocular Telescope Observatory, University of Arizona, 933 N. Cherry Ave, Tucson, AZ 85721 (United States)

    2016-11-20

    We present the first results from an ongoing survey to characterize the circumgalactic medium (CGM) of massive high-redshift galaxies detected as submillimeter galaxies (SMGs). We constructed a parent sample of 163 SMG–QSO pairs with separations less than ∼36″ by cross-matching far-infrared-selected galaxies from Herschel with spectroscopically confirmed QSOs. The Herschel sources were selected to match the properties of the SMGs. We determined the sub-arcsecond positions of six Herschel sources with the Very Large Array and obtained secure redshift identification for three of those with near-infrared spectroscopy. The QSO sightlines probe transverse proper distances of 112, 157, and 198 kpc at foreground redshifts of 2.043, 2.515, and 2.184, respectively, which are comparable to the virial radius of the ∼10^13 M_⊙ halos expected to host SMGs. High-quality absorption-line spectroscopy of the QSOs reveals systematically strong H I Lyα absorption around all three SMGs, with rest-frame equivalent widths of ∼2–3 Å. However, none of the three absorbers exhibit compelling evidence for optically thick H I gas or metal absorption, in contrast to the dominance of strong neutral absorbers in the CGM of luminous z ∼ 2 QSOs. The low covering factor of optically thick H I gas around SMGs tentatively indicates that SMGs may not have as prominent cool gas reservoirs in their halos as the coeval QSOs and that they may inhabit less massive halos than previously thought.

  1. Invention and validation of an automated camera system that uses optical character recognition to identify patient name mislabeled samples.

    Science.gov (United States)

    Hawker, Charles D; McCarthy, William; Cleveland, David; Messinger, Bonnie L

    2014-03-01

    Mislabeled samples are a serious problem in most clinical laboratories. Published error rates range from 0.39/1000 to as high as 1.12%. Standardization of bar codes and label formats has not yet achieved the needed improvement. The mislabel rate in our laboratory, although low compared with published rates, prompted us to seek a solution to achieve zero errors. To reduce or eliminate our mislabeled samples, we invented an automated device using 4 cameras to photograph the outside of a sample tube. The system uses optical character recognition (OCR) to look for discrepancies between the patient name in our laboratory information system (LIS) vs the patient name on the customer label. All discrepancies detected by the system's software then require human inspection. The system was installed on our automated track and validated with production samples. We obtained 1 009 830 images during the validation period, and every image was reviewed. OCR passed approximately 75% of the samples, and no mislabeled samples were passed. The 25% failed by the system included 121 samples actually mislabeled by patient name and 148 samples with spelling discrepancies between the patient name on the customer label and the patient name in our LIS. Only 71 of the 121 mislabeled samples detected by OCR were found through our normal quality assurance process. We have invented an automated camera system that uses OCR technology to identify potential mislabeled samples. We have validated this system using samples transported on our automated track. Full implementation of this technology offers the possibility of zero mislabeled samples in the preanalytic stage.
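
    The core comparison logic (OCR'd label name vs. LIS name, with discrepancies routed to human review) can be sketched in a few lines; the snippet below uses simple fuzzy string matching and is only an assumed approximation of the proprietary camera system described above.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: keep letters, uppercase, and sort the name parts."""
    parts = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in name).upper().split()
    return " ".join(sorted(parts))

def check_label(lis_name: str, ocr_name: str, threshold: float = 0.8) -> str:
    """Pass the tube only when the OCR'd label exactly matches the LIS name."""
    ratio = SequenceMatcher(None, normalize(lis_name), normalize(ocr_name)).ratio()
    if ratio == 1.0:
        return "PASS"
    if ratio >= threshold:
        return "REVIEW (possible spelling discrepancy)"
    return "REVIEW (possible mislabel)"

print(check_label("Doe, Jane", "JANE DOE"))     # PASS
print(check_label("Doe, Jane", "JANE DOW"))     # spelling discrepancy -> human review
print(check_label("Doe, Jane", "JOHN SMITH"))   # likely mislabel -> human review
```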

  2. Classifying Returns as Extreme

    DEFF Research Database (Denmark)

    Christiansen, Charlotte

    2014-01-01

    I consider extreme returns for the stock and bond markets of 14 EU countries using two classification schemes: One, the univariate classification scheme from the previous literature that classifies extreme returns for each market separately, and two, a novel multivariate classification scheme tha...

  3. Bayes classifiers for imbalanced traffic accidents datasets.

    Science.gov (United States)

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accidents data sets are usually imbalanced, where the number of instances classified under the killed or severe injuries class (minority) is much lower than those classified under the slight injuries class (majority). This, however, poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accidents data collected on urban and suburban roads in Jordan for three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared for the different imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Average One-Dependence Estimators, and Bayesian networks in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
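
    A minimal sketch of the oversampling idea on synthetic imbalanced data is shown below; it uses scikit-learn's Gaussian naive Bayes rather than the AODE/WAODE/Bayesian-network classifiers used in the study, so it illustrates the balancing step, not the paper's exact models.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)

# Synthetic imbalanced data: 5% "severe" (label 1), 95% "slight" (label 0).
n = 4000
y = (rng.random(n) < 0.05).astype(int)
X = rng.normal(size=(n, 6)) + y[:, None] * np.array([1.0, 0.8, 0, 0, 0.5, 0])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: resample the minority class (with replacement) until balanced.
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

for name, (Xf, yf) in {"imbalanced": (X_tr, y_tr), "oversampled": (X_bal, y_bal)}.items():
    clf = GaussianNB().fit(Xf, yf)
    rec = recall_score(y_te, clf.predict(X_te))   # recall on the severe/killed class
    print(f"{name:>11}: minority-class recall = {rec:.2f}")
```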

  4. Bridging the gap between sample collection and laboratory analysis: using dried blood spots to identify human exposure to chemical agents

    Science.gov (United States)

    Hamelin, Elizabeth I.; Blake, Thomas A.; Perez, Jonas W.; Crow, Brian S.; Shaner, Rebecca L.; Coleman, Rebecca M.; Johnson, Rudolph C.

    2016-05-01

    Public health response to large scale chemical emergencies presents logistical challenges for sample collection, transport, and analysis. Diagnostic methods used to identify and determine exposure to chemical warfare agents, toxins, and poisons traditionally involve blood collection by phlebotomists, cold transport of biomedical samples, and costly sample preparation techniques. Use of dried blood spots, which consist of dried blood on an FDA-approved substrate, can increase analyte stability, decrease infection hazard for those handling samples, greatly reduce the cost of shipping/storing samples by removing the need for refrigeration and cold chain transportation, and be self-prepared by potentially exposed individuals using a simple finger prick and blood spot compatible paper. Our laboratory has developed clinical assays to detect human exposures to nerve agents through the analysis of specific protein adducts and metabolites, for which a simple extraction from a dried blood spot is sufficient for removing matrix interferents and attaining sensitivities on par with traditional sampling methods. The use of dried blood spots can bridge the gap between the laboratory and the field allowing for large scale sample collection with minimal impact on hospital resources while maintaining sensitivity, specificity, traceability, and quality requirements for both clinical and forensic applications.

  5. Molecular analysis of faecal samples from birds to identify potential crop pests and useful biocontrol agents in natural areas.

    Science.gov (United States)

    King, R A; Symondson, W O C; Thomas, R J

    2015-06-01

    Wild habitats adjoining farmland are potentially valuable sources of natural enemies, but also of pests. Here we tested the utility of birds as 'sampling devices', to identify the diversity of prey available to predators and particularly to screen for pests and natural enemies using natural ecosystems as refugia. Here we used PCR to amplify prey DNA from three sympatric songbirds foraging on small invertebrates in Phragmites reedbed ecosystems, namely the Reed Warbler (Acrocephalus scirpaceus), Sedge Warbler (Acrocephalus schoenobaenus) and Cetti's Warbler (Cettia cetti). A recently described general invertebrate primer pair was used for the first time to analyse diets. Amplicons were cloned and sequenced, then identified by reference to the Barcoding of Life Database and to our own sequences obtained from fresh invertebrates. Forty-five distinct prey DNA sequences were obtained from 11 faecal samples, of which 39 could be identified to species or genus. Targeting three warbler species ensured that species-specific differences in prey choice broadened the range of prey taken. Amongst the prey found in reedbeds were major pests (including the tomato moth Lacanobia oleracea) as well as many potentially valuable natural enemies including aphidophagous hoverflies and braconid wasps. Given the mobility of birds, this approach provides a practical way of sampling a whole habitat at once, providing growers with information on possible invasion by locally resident pests and the colonization potential of natural enemies from local natural habitats.

  6. Identifying Shifts in Leaf-Litter Ant Assemblages (Hymenoptera: Formicidae) across Ecosystem Boundaries Using Multiple Sampling Methods.

    Directory of Open Access Journals (Sweden)

    Michal Wiezik

    Full Text Available Global or regional environmental changes in climate or land use have been increasingly implied in shifts in boundaries (ecotones) between adjacent ecosystems such as beech or oak-dominated forests and forest-steppe ecotones that frequently co-occur near the southern range limits of deciduous forest biome in Europe. Yet, our ability to detect changes in biological communities across these ecosystems, or to understand their environmental drivers, can be hampered when different sampling methods are required to characterize biological communities of the adjacent but ecologically different ecosystems. Ants (Hymenoptera: Formicidae) have been shown to be particularly sensitive to changes in temperature and vegetation and they require different sampling methods in closed vs. open habitats. We compared ant assemblages of closed-forests (beech- or oak-dominated) and open forest-steppe habitats in southwestern Carpathians using methods for closed-forest (litter sifting) and open habitats (pitfall trapping), and developed an integrated sampling approach to characterize changes in ant assemblages across these adjacent ecosystems. Using both methods, we collected 5,328 individual ant workers from 28 species. Neither method represented ant communities completely, but pitfall trapping accounted for more species (24) than litter sifting (16). Although pitfall trapping characterized differences in species richness and composition among the ecosystems better, with beech forest being most species poor and ecotone most species rich, litter sifting was more successful in identifying characteristic litter-dwelling species in oak-dominated forest. The integrated sampling approach using both methods yielded more accurate characterization of species richness and composition, and particularly so in species-rich forest-steppe habitat where the combined sample identified significantly higher number of species compared to either of the two methods on their own. Thus, an integrated

  7. Identifying Shifts in Leaf-Litter Ant Assemblages (Hymenoptera: Formicidae) across Ecosystem Boundaries Using Multiple Sampling Methods.

    Science.gov (United States)

    Wiezik, Michal; Svitok, Marek; Wieziková, Adela; Dovčiak, Martin

    2015-01-01

    Global or regional environmental changes in climate or land use have been increasingly implied in shifts in boundaries (ecotones) between adjacent ecosystems such as beech or oak-dominated forests and forest-steppe ecotones that frequently co-occur near the southern range limits of deciduous forest biome in Europe. Yet, our ability to detect changes in biological communities across these ecosystems, or to understand their environmental drivers, can be hampered when different sampling methods are required to characterize biological communities of the adjacent but ecologically different ecosystems. Ants (Hymenoptera: Formicidae) have been shown to be particularly sensitive to changes in temperature and vegetation and they require different sampling methods in closed vs. open habitats. We compared ant assemblages of closed-forests (beech- or oak-dominated) and open forest-steppe habitats in southwestern Carpathians using methods for closed-forest (litter sifting) and open habitats (pitfall trapping), and developed an integrated sampling approach to characterize changes in ant assemblages across these adjacent ecosystems. Using both methods, we collected 5,328 individual ant workers from 28 species. Neither method represented ant communities completely, but pitfall trapping accounted for more species (24) than litter sifting (16). Although pitfall trapping characterized differences in species richness and composition among the ecosystems better, with beech forest being most species poor and ecotone most species rich, litter sifting was more successful in identifying characteristic litter-dwelling species in oak-dominated forest. The integrated sampling approach using both methods yielded more accurate characterization of species richness and composition, and particularly so in species-rich forest-steppe habitat where the combined sample identified significantly higher number of species compared to either of the two methods on their own. Thus, an integrated sampling

  8. Intelligent Garbage Classifier

    Directory of Open Access Journals (Sweden)

    Ignacio Rodríguez Novelle

    2008-12-01

    Full Text Available IGC (Intelligent Garbage Classifier) is a system for visual classification and separation of solid waste products. Currently, an important part of the separation effort is based on manual work, from household separation to industrial waste management. Taking advantage of the technologies currently available, a system has been built that can analyze images from a camera and control a robot arm and conveyor belt to automatically separate different kinds of waste.

  9. Classifying Linear Canonical Relations

    OpenAIRE

    Lorand, Jonathan

    2015-01-01

    In this Master's thesis, we consider the problem of classifying, up to conjugation by linear symplectomorphisms, linear canonical relations (lagrangian correspondences) from a finite-dimensional symplectic vector space to itself. We give an elementary introduction to the theory of linear canonical relations and present partial results toward the classification problem. This exposition should be accessible to undergraduate students with a basic familiarity with linear algebra.

  10. Diversity of reductive dehalogenase genes from environmental samples and enrichment cultures identified with degenerate primer PCR screens.

    Directory of Open Access Journals (Sweden)

    Laura Audrey Hug

    2013-11-01

    Full Text Available Reductive dehalogenases are the critical enzymes for anaerobic organohalide respiration, a microbial metabolic process that has been harnessed for bioremediation efforts to resolve chlorinated solvent contamination in groundwater and is implicated in the global halogen cycle. Reductive dehalogenase sequence diversity is informative for the dechlorination potential of the site or enrichment culture. A suite of degenerate PCR primers targeting a comprehensive curated set of reductive dehalogenase genes was designed and applied to twelve DNA samples extracted from contaminated and pristine sites, as well as six enrichment cultures capable of reducing chlorinated compounds to non-toxic end-products. The amplified gene products from four environmental sites and two enrichment cultures were sequenced using Illumina HiSeq, and the reductive dehalogenase complement of each sample determined. The results indicate that the diversity of the reductive dehalogenase gene family is much deeper than is currently accounted for: one-third of the translated proteins have less than 70% pairwise amino acid identity to database sequences. Approximately 60% of the sequenced reductive dehalogenase genes were broadly distributed, being identified in four or more samples, and often in previously sequenced genomes as well. In contrast, 17% of the sequenced reductive dehalogenases were unique, present in only a single sample and bearing less than 90% pairwise amino acid identity to any previously identified proteins. Many of the broadly distributed reductive dehalogenases are uncharacterized in terms of their substrate specificity, making these intriguing targets for further biochemical experimentation. Finally, comparison of samples from a contaminated site and an enrichment culture derived from the same site eight years prior allowed examination of the effect of the enrichment process.

  11. High dimensional classifiers in the imbalanced case

    DEFF Research Database (Denmark)

    Bak, Britta Anker; Jensen, Jens Ledet

    We consider the binary classification problem in the imbalanced case where the number of samples from the two groups differ. The classification problem is considered in the high dimensional case where the number of variables is much larger than the number of samples, and where the imbalance leads...... to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...

  12. A proteomic study to identify soya allergens--the human response to transgenic versus non-transgenic soya samples.

    Science.gov (United States)

    Batista, Rita; Martins, Isabel; Jeno, Paul; Ricardo, Cândido Pinto; Oliveira, Maria Margarida

    2007-01-01

    In spite of being among the main foods responsible for allergic reactions worldwide, soybean (Glycine max)-derived products continue to be increasingly widespread in a variety of food products due to their well-documented health benefits. Soybean also continues to be one of the elected target crops for genetic modification. The aim of this study was to characterize the soya proteome and, specifically, IgE-reactive proteins as well as to compare the IgE response in soya-allergic individuals to genetically modified Roundup Ready soya versus its non-transgenic control. We performed two-dimensional gel electrophoresis of protein extracts from a 5% genetically modified Roundup Ready flour sample and its non-transgenic control followed by Western blotting with plasma from 5 soya-sensitive individuals. We used peptide tandem mass spectrometry to identify soya proteins (55 protein matches), specifically IgE-binding ones, and to evaluate differences between transgenic and non-transgenic samples. We identified 2 new potential soybean allergens--one is maturation associated and seems to be part of the late embryogenesis abundant proteins group and the other is a cysteine proteinase inhibitor. None of the individuals tested reacted differentially to the transgenic versus non-transgenic samples under study. Soybean endogenous allergen expression does not seem to be altered after genetic modification. Proteomics should be considered a powerful tool for functional characterization of plants and for food safety assessment. Copyright (c) 2007 S. Karger AG, Basel.

  13. Classifying a smoker scale in adult daily and nondaily smokers.

    Science.gov (United States)

    Pulvers, Kim; Scheuermann, Taneisha S; Romero, Devan R; Basora, Brittany; Luo, Xianghua; Ahluwalia, Jasjit S

    2014-05-01

    Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p smoker identity, stronger dependence on cigarettes, greater health risk perceptions, more smoking friends, and were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts.
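
    The reported internal consistency (α = .91) is Cronbach's alpha; a small, generic implementation on a respondents-by-items score matrix is sketched below with synthetic Likert-style data, not the study's survey responses.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Synthetic Likert-style responses driven by one latent trait, so alpha should be high.
rng = np.random.default_rng(2)
trait = rng.normal(size=500)
scores = np.clip(np.round(3 + trait[:, None] + 0.6 * rng.normal(size=(500, 8))), 1, 5)
print(f"alpha = {cronbach_alpha(scores):.2f}")
```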

  14. Using the MMPI-2-RF to discriminate psychometrically identified schizotypic college students from a matched comparison sample.

    Science.gov (United States)

    Hunter, Helen K; Bolinskey, P Kevin; Novi, Jonathan H; Hudak, Daniel V; James, Alison V; Myers, Kevin R; Schuder, Kelly M

    2014-01-01

    This study investigates the extent to which the Minnesota Multiphasic Personality Inventory-2 Restructured Form (MMPI-2-RF) profiles of 52 individuals making up a psychometrically identified schizotypes (SZT) sample could be successfully discriminated from the protocols of 52 individuals in a matched comparison (MC) sample. Replication analyses were performed with an additional 53 pairs of SZT and MC participants. Results showed significant differences in mean T-score values between these 2 groups across a variety of MMPI-2-RF scales. Results from discriminant function analyses indicate that schizotypy can be predicted effectively using 4 MMPI-2-RF scales and that this method of classification held up on replication. Additional results demonstrated that these MMPI-2-RF scales nominally outperformed MMPI-2 scales suggested by previous research as being indicative of schizophrenia liability. Directions for future research with the MMPI-2-RF are suggested.

  15. DNA typing of ancient parasite eggs from environmental samples identifies human and animal worm infections in viking-age settlement

    DEFF Research Database (Denmark)

    Søe, Martin Jensen; Nejsum, Peter; Fredensborg, Brian Lund

    2015-01-01

    Ancient parasite eggs were recovered from environmental samples collected at a Viking-age settlement in Viborg, Denmark, dated 1018-1030 A.D. Morphological examination identified Ascaris sp., Trichuris sp., and Fasciola sp. eggs, but size and shape did not allow species identification. By carefully...... selecting genetic markers, PCR amplification and sequencing of ancient DNA (aDNA) isolates resulted in identification of: the human whipworm, Trichuris trichiura, using SSUrRNA sequence homology; Ascaris sp. with 100% homology to cox1 haplotype 07; and Fasciola hepatica using ITS1 sequence homology...

  16. Performance of the Lot Quality Assurance Sampling Method Compared to Surveillance for Identifying Inadequately-performing Areas in Matlab, Bangladesh

    OpenAIRE

    Bhuiya, Abbas; Hanifi, S.M.A.; Roy, Nikhil; Streatfield, P. Kim

    2007-01-01

    This paper compared the performance of the lot quality assurance sampling (LQAS) method in identifying inadequately-performing health work-areas with that of using health and demographic surveillance system (HDSS) data and examined the feasibility of applying the method by field-level programme supervisors. The study was carried out in Matlab, the field site of ICDDR,B, where a HDSS has been in place for over 30 years. The LQAS method was applied in 57 work-areas of community health workers i...

  17. Performance of the lot quality assurance sampling method compared to surveillance for identifying inadequately-performing areas in Matlab, Bangladesh.

    Science.gov (United States)

    Bhuiya, Abbas; Hanifi, S M A; Roy, Nikhil; Streatfield, P Kim

    2007-03-01

    This paper compared the performance of the lot quality assurance sampling (LQAS) method in identifying inadequately-performing health work-areas with that of using health and demographic surveillance system (HDSS) data and examined the feasibility of applying the method by field-level programme supervisors. The study was carried out in Matlab, the field site of ICDDR,B, where a HDSS has been in place for over 30 years. The LQAS method was applied in 57 work-areas of community health workers in ICDDR,B-served areas in Matlab during July-September 2002. The performance of the LQAS method in identifying work-areas with adequate and inadequate coverage of various health services was compared with those of the HDSS. The health service-coverage indicators included coverage of DPT, measles, BCG vaccination, and contraceptive use. It was observed that the difference between the proportion of work-areas identified as inadequately performing using the LQAS method (with fewer than 30 respondents) and the proportion identified using the HDSS was not statistically significant. The consistency between the LQAS method and the HDSS in identifying work-areas was greater for adequately-performing areas than inadequately-performing areas. It was also observed that the field managers could be trained to apply the LQAS method in monitoring their performance in reaching the target population.
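
    The LQAS decision logic itself is compact: sample n respondents per work-area and flag the area if fewer than d are covered, with the misclassification risks following from the binomial distribution. The sketch below uses illustrative numbers (n = 19, d = 13, 80%/50% coverage thresholds), not the parameters used in Matlab.

```python
from scipy.stats import binom

def operating_characteristics(n: int, d: int, p_upper: float, p_lower: float):
    """Misclassification risks for the LQAS rule 'flag the area if fewer than d
    of the n sampled respondents are covered':
      alpha: probability of flagging an area whose true coverage is adequate (p_upper)
      beta : probability of passing an area whose true coverage is inadequate (p_lower)
    """
    alpha = binom.cdf(d - 1, n, p_upper)      # P(covered < d | coverage = p_upper)
    beta = 1 - binom.cdf(d - 1, n, p_lower)   # P(covered >= d | coverage = p_lower)
    return alpha, beta

# Illustrative rule: sample 19 respondents, flag the work-area if fewer than 13 are
# covered, with 80% regarded as adequate coverage and 50% as clearly inadequate.
n, d = 19, 13
alpha, beta = operating_characteristics(n, d, p_upper=0.80, p_lower=0.50)
print(f"alpha (false alarm) = {alpha:.3f}, beta (missed poor area) = {beta:.3f}")
print("flag an area with 11 of 19 covered?", 11 < d)
```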

  18. A sample of galaxy pairs identified from the LAMOST spectral survey and the Sloan Digital Sky Survey

    International Nuclear Information System (INIS)

    Shen, Shi-Yin; Argudo-Fernández, Maria; Chen, Li; Feng, Shuai; Hou, Jin-Liang; Shao, Zheng-Yi; Chen, Xiao-Yan; Luo, A-Li; Wu, Hong; Yang, Hai-Feng; Yang, Ming; Hou, Yong-Hui; Wang, Yue-Fei; Jiang, Peng; Wang, Ting-Gui; Jing, Yi-Peng; Kong, Xu; Wang, Wen-Ting; Luo, Zhi-Jian; Wu, Xue-Bing

    2016-01-01

    A small fraction (< 10%) of the SDSS main galaxy (MG) sample has not been targeted with spectroscopy due to the effect of fiber collisions. These galaxies have been compiled into the input catalog of the LAMOST ExtraGAlactic Surveys and named the complementary galaxy sample. In this paper, we introduce this project and status of the spectroscopies associated with the complementary galaxies in the first two years of the LAMOST spectral survey (till Sep. of 2014). Moreover, we present a sample of 1102 galaxy pairs identified from the LAMOST complementary galaxies and SDSS MGs, which are defined as two members that have a projected distance smaller than 100 h_70^-1 kpc and a recessional velocity difference smaller than 500 km s^-1. Compared with galaxy pairs that are only selected from SDSS, the LAMOST-SDSS pairs have the advantages of not being biased toward large separations and therefore act as a useful supplement in statistical studies of galaxy interaction and galaxy merging.
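
    The two pair-selection criteria translate directly into code. The sketch below assumes an angular separation and an externally supplied angular-diameter distance for the projected separation, and uses the standard Δv = cΔz/(1 + z̄) approximation for the velocity difference; the actual catalogue matching of course ran over the full survey tables.

```python
import numpy as np

C_KM_S = 299_792.458  # speed of light [km/s]

def velocity_difference(z1: float, z2: float) -> float:
    """Line-of-sight velocity difference in km/s for two redshifts."""
    return C_KM_S * abs(z1 - z2) / (1 + 0.5 * (z1 + z2))

def projected_separation_kpc(theta_arcsec: float, d_a_mpc: float) -> float:
    """Projected physical separation for an angular separation theta at an
    assumed angular-diameter distance d_a (in Mpc)."""
    theta_rad = np.deg2rad(theta_arcsec / 3600.0)
    return theta_rad * d_a_mpc * 1000.0        # kpc

def is_pair(theta_arcsec, d_a_mpc, z1, z2, r_max_kpc=100.0, dv_max_kms=500.0):
    return (projected_separation_kpc(theta_arcsec, d_a_mpc) < r_max_kpc
            and velocity_difference(z1, z2) < dv_max_kms)

# Example: two galaxies 25" apart at z ~ 0.08, assuming D_A ~ 310 Mpc for that redshift.
print(is_pair(theta_arcsec=25.0, d_a_mpc=310.0, z1=0.080, z2=0.081))
```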

  19. Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

    Science.gov (United States)

    Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

    2018-05-12

    Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
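
    A sketch of the train/holdout comparison of a decision tree against logistic regression by ROC AUC is given below on synthetic data; the feature set, split and hyperparameters are placeholders rather than those of the Iranian treatment sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the risk-factor data: 678 subjects, 70/30 split as in the study.
X, y = make_classification(n_samples=678, n_features=8, n_informative=4,
                           weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:>19}: holdout AUC = {auc:.2f}")
```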

  20. Transcriptomic meta-analysis identifies gene expression characteristics in various samples of HIV-infected patients with nonprogressive disease.

    Science.gov (United States)

    Zhang, Le-Le; Zhang, Zi-Ning; Wu, Xian; Jiang, Yong-Jun; Fu, Ya-Jing; Shang, Hong

    2017-09-12

    A small proportion of HIV-infected patients remain clinically and/or immunologically stable for years, including elite controllers (ECs) who have undetectable viremia (10 years). However, the mechanism of nonprogression needs to be further resolved. In this study, a transcriptome meta-analysis was performed on nonprogressor and progressor microarray data to identify differential transcriptome pathways and potential biomarkers. Using the INMEX (integrative meta-analysis of expression data) program, we performed the meta-analysis to identify consistently differentially expressed genes (DEGs) in nonprogressors and further performed functional interpretation (gene ontology analysis and pathway analysis) of the DEGs identified in the meta-analysis. Five microarray datasets (81 cases and 98 controls in total), including whole blood, CD4 + and CD8 + T cells, were collected for meta-analysis. We determined that nonprogressors have reduced expression of important interferon-stimulated genes (ISGs), CD38, lymphocyte activation gene 3 (LAG-3) in whole blood, CD4 + and CD8 + T cells. Gene ontology (GO) analysis showed a significant enrichment in DEGs that function in the type I interferon signaling pathway. Upregulated pathways, including the PI3K-Akt signaling pathway in whole blood, cytokine-cytokine receptor interaction in CD4 + T cells and the MAPK signaling pathway in CD8 + T cells, were identified in nonprogressors compared with progressors. In each metabolic functional category, the number of downregulated DEGs was more than the upregulated DEGs, and almost all genes were downregulated DEGs in the oxidative phosphorylation (OXPHOS) and tricarboxylic acid (TCA) cycle in the three types of samples. Our transcriptomic meta-analysis provides a comprehensive evaluation of the gene expression profiles in major blood types of nonprogressors, providing new insights in the understanding of HIV pathogenesis and developing strategies to delay HIV disease progression.

  1. Consistency Analysis of Nearest Subspace Classifier

    OpenAIRE

    Wang, Yi

    2015-01-01

    The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also ...
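
    A minimal NumPy sketch of the NSS rule is given below: fit a low-rank subspace per class (via the SVD of the centered class data) and assign each point to the class whose subspace leaves the smallest residual. The subspace dimension and the toy data are illustrative.

```python
import numpy as np

class NearestSubspace:
    """Nearest subspace classifier: one rank-d subspace (through the class mean) per class."""

    def __init__(self, dim: int = 2):
        self.dim = dim
        self.bases = {}   # class label -> (mean, orthonormal basis)

    def fit(self, X, y):
        for label in np.unique(y):
            Xc = X[y == label]
            mean = Xc.mean(axis=0)
            # Leading right-singular vectors span the best rank-d subspace of the class.
            _, _, vt = np.linalg.svd(Xc - mean, full_matrices=False)
            self.bases[label] = (mean, vt[: self.dim].T)
        return self

    def predict(self, X):
        labels = list(self.bases)
        residuals = []
        for label in labels:
            mean, B = self.bases[label]
            Z = X - mean
            residuals.append(np.linalg.norm(Z - (Z @ B) @ B.T, axis=1))
        return np.asarray(labels)[np.argmin(np.vstack(residuals), axis=0)]

# Toy check on two Gaussian classes with different principal directions.
rng = np.random.default_rng(3)
X0 = rng.normal(size=(200, 5)) * np.array([3, 1, 0.1, 0.1, 0.1])
X1 = rng.normal(size=(200, 5)) * np.array([0.1, 0.1, 3, 1, 0.1]) + 1.0
X = np.vstack([X0, X1]); y = np.r_[np.zeros(200), np.ones(200)]
clf = NearestSubspace(dim=2).fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```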

  2. Sampling

    CERN Document Server

    Thompson, Steven K

    2012-01-01

    Praise for the Second Edition "This book has never had a competitor. It is the only book that takes a broad approach to sampling . . . any good personal statistics library should include a copy of this book." —Technometrics "Well-written . . . an excellent book on an important subject. Highly recommended." —Choice "An ideal reference for scientific researchers and other professionals who use sampling." —Zentralblatt Math Features new developments in the field combined with all aspects of obtaining, interpreting, and using sample data Sampling provides an up-to-date treat

  3. DNA typing of ancient parasite eggs from environmental samples identifies human and animal worm infections in Viking-age settlement.

    Science.gov (United States)

    Søe, Martin Jensen; Nejsum, Peter; Fredensborg, Brian Lund; Kapel, Christian Moliin Outzen

    2015-02-01

    Ancient parasite eggs were recovered from environmental samples collected at a Viking-age settlement in Viborg, Denmark, dated 1018-1030 A.D. Morphological examination identified Ascaris sp., Trichuris sp., and Fasciola sp. eggs, but size and shape did not allow species identification. By carefully selecting genetic markers, PCR amplification and sequencing of ancient DNA (aDNA) isolates resulted in identification of: the human whipworm, Trichuris trichiura , using SSUrRNA sequence homology; Ascaris sp. with 100% homology to cox1 haplotype 07; and Fasciola hepatica using ITS1 sequence homology. The identification of T. trichiura eggs indicates that human fecal material is present and, hence, that the Ascaris sp. haplotype 07 was most likely a human variant in Viking-age Denmark. The location of the F. hepatica finding suggests that sheep or cattle are the most likely hosts. Further, we sequenced the Ascaris sp. 18S rRNA gene in recent isolates from humans and pigs of global distribution and show that this is not a suited marker for species-specific identification. Finally, we discuss ancient parasitism in Denmark and the implementation of aDNA analysis methods in paleoparasitological studies. We argue that when employing species-specific identification, soil samples offer excellent opportunities for studies of human parasite infections and of human and animal interactions of the past.

  4. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream.

    Science.gov (United States)

    Byrne, Patrick; Runkel, Robert L; Walton-Day, Katherine

    2017-07-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.
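
    The PCA step on a synoptic chemistry table can be sketched as follows: log-transform and standardize the constituent concentrations, then inspect the principal-component scores to group inflows by likely source. The constituent list and the two synthetic source signatures below are assumptions for illustration, not the Lion Creek data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
constituents = ["Al", "Cd", "Cu", "Fe", "Mn", "Zn", "SO4", "Ca"]

# Synthetic synoptic table: 20 inflow samples drawn from two underlying source signatures.
source = rng.integers(0, 2, size=20)
signatures = np.array([[5, 0.2, 1, 8, 3, 4, 60, 20],          # "mine-affected" pattern
                       [0.5, 0.01, 0.1, 1, 0.5, 0.3, 15, 40]])  # "background" pattern
conc = signatures[source] * rng.lognormal(sigma=0.3, size=(20, len(constituents)))

# Inflows sharing a source signature should cluster together along PC1.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(np.log10(conc)))
for pc, s in zip(scores, source):
    print(f"source {s}: PC1 = {pc[0]:+.2f}, PC2 = {pc[1]:+.2f}")
```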

  5. Maternal red blood cell alloantibodies identified in blood samples obtained from Iranian pregnant women: the first population study in Iran.

    Science.gov (United States)

    Shahverdi, Ehsan; Moghaddam, Mostafa; Gorzin, Fateme

    2017-01-01

    The objective was to determine the frequency of occurrence of alloantibodies among pregnant women in Iran. This was a prospective cross-sectional study, which was carried out in the immunohematology reference laboratory of the Iranian Blood Transfusion Organization in Tehran, Iran, in 2008 to 2015. Screening and identification of red blood cell (RBC) alloantibodies was done on the sera of 7340 pregnant females using the standard tube method and gel column agglutination technique. Alloantibodies were identified in the serum of 332 of the 7340 (4.5%) pregnant women. A total of 410 antibodies were detected in 332 positive maternal serum samples with no previous history of blood transfusion. Anti-D was the most common antibody accounting for 70.5% of all the antibodies formed in D- women. The incidence of specific alloimmunization other than Rh group was 14.4%. We concluded that the alloimmunization rate was high in comparison with wide pattern in previous studies. In Iran, like other developing countries, alloimmunization screening tests are performed only to detect anti-D in pregnant D- women. This high rate of alloimmunization, quite possibly, is due to the fact that the majority of blood samples came from pregnant women known to have previous obstetric problems. However, we suggest that RBC antibody screening tests should be extended to all D+ women. © 2016 AABB.

  6. Multi-iPPseEvo: A Multi-label Classifier for Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into Chou's General PseAAC via Grey System Theory.

    Science.gov (United States)

    Qiu, Wang-Ren; Zheng, Quan-Shu; Sun, Bi-Qian; Xiao, Xuan

    2017-03-01

    Predicting protein phosphorylation is a challenging problem, particularly when query proteins have multi-label features, meaning that they may be phosphorylated at two or more different types of amino acid residues. In fact, human proteins are usually phosphorylated at serine, threonine and tyrosine. By introducing the "multi-label learning" approach, a novel predictor has been developed that can be used to deal with systems containing both single- and multi-label phosphorylation proteins. Here we proposed a predictor called Multi-iPPseEvo by (1) incorporating the protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via the grey system theory, (2) balancing out the skewed training datasets by the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous cross-validations via a set of multi-label metrics indicate that the multi-label phosphorylation predictor is very promising and encouraging. The current approach represents a new strategy to deal with multi-label biological problems, and the software is freely available for academic use at http://www.jci-bioinfo.cn/Multi-iPPseEvo. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
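
    The ensemble construction, training each member on a class-balanced resample and fusing members by probability voting, can be sketched per label as below; the feature matrix, label names and ensemble size are illustrative stand-ins, not the PseAAC features or tuning of Multi-iPPseEvo.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def balanced_rf_ensemble(X, y, n_members=5, seed=0):
    """Train several random forests, each on a class-balanced resample of the data."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if pos.size < neg.size else (neg, pos)
    members = []
    for _ in range(n_members):
        idx = np.r_[minority, rng.choice(majority, size=minority.size, replace=False)]
        members.append(RandomForestClassifier(n_estimators=100,
                                              random_state=seed).fit(X[idx], y[idx]))
    return members

def vote(members, X):
    """Fuse the members by averaging their positive-class probabilities."""
    return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)

# Multi-label use: one balanced ensemble per phospho-site type (labels are illustrative).
rng = np.random.default_rng(5)
X = rng.normal(size=(600, 20))
Y = {"serine": (rng.random(600) < 0.3).astype(int),
     "threonine": (rng.random(600) < 0.1).astype(int),
     "tyrosine": (rng.random(600) < 0.05).astype(int)}
ensembles = {label: balanced_rf_ensemble(X, y) for label, y in Y.items()}
probs = {label: vote(members, X[:5]) for label, members in ensembles.items()}
print({label: np.round(p, 2) for label, p in probs.items()})
```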

  7. Identifying sources of methane sampled in the Arctic using δ13C in CH4 and Lagrangian particle dispersion modelling.

    Science.gov (United States)

    Cain, Michelle; France, James; Pyle, John; Warwick, Nicola; Fisher, Rebecca; Lowry, Dave; Allen, Grant; O'Shea, Sebastian; Illingworth, Samuel; Jones, Ben; Gallagher, Martin; Welpott, Axel; Muller, Jennifer; Bauguitte, Stephane; George, Charles; Hayman, Garry; Manning, Alistair; Myhre, Catherine Lund; Lanoisellé, Mathias; Nisbet, Euan

    2016-04-01

    An airmass of enhanced methane was sampled during a research flight at ~600 m to ~2000 m altitude between the North coast of Norway and Svalbard on 21 July 2012. The largest source of methane in the summertime Arctic is wetland emissions. Did this enhancement in methane come from wetland emissions? The airmass was identified through continuous methane measurements using a Los Gatos fast greenhouse gas analyser on board the UK's BAe-146 Atmospheric Research Aircraft (ARA) as part of the MAMM (Methane in the Arctic: Measurements and Modelling) campaign. A Lagrangian particle dispersion model (the UK Met Office's NAME model) was run backwards to identify potential methane source regions. This was combined with a methane emission inventory to create "pseudo observations" to compare with the aircraft observations. This modelling was used to constrain the δ13C CH4 wetland source signature (where δ13C CH4 is the ratio of 13C to 12C in methane), resulting in a most likely signature of -73‰ (±4‰7‰). The NAME back trajectories suggest a methane source region of north-western Russian wetlands, and -73‰ is consistent with in situ measurements of wetland methane at similar latitudes in Scandinavia. This analysis has allowed us to study emissions from remote regions for which we do not have in situ observations, giving us an extra tool in the determination of the isotopic source variation of global methane emissions.

  8. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    Science.gov (United States)

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
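
    The "tool intersection" strategy reduces to simple set logic: keep only taxa reported by at least k of the classifiers, trading some recall for precision. The sketch below uses mocked tool outputs rather than real classifier reports.

```python
from collections import Counter

# Mocked species calls from three hypothetical classifiers on the same sample.
calls = {
    "tool_kmer":   {"Escherichia coli", "Bacillus subtilis", "Salmonella enterica"},
    "tool_align":  {"Escherichia coli", "Bacillus subtilis", "Listeria monocytogenes"},
    "tool_marker": {"Escherichia coli", "Bacillus subtilis"},
}

def consensus(calls: dict, min_tools: int = 2) -> set:
    """Keep only taxa reported by at least `min_tools` classifiers."""
    counts = Counter(taxon for taxa in calls.values() for taxon in taxa)
    return {taxon for taxon, n in counts.items() if n >= min_tools}

print(sorted(consensus(calls, min_tools=2)))   # drops the single-tool calls
print(sorted(consensus(calls, min_tools=3)))   # strict intersection of all three tools
```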

  9. Defining and Classifying Interest Groups

    DEFF Research Database (Denmark)

    Baroni, Laura; Carroll, Brendan; Chalmers, Adam

    2014-01-01

    The interest group concept is defined in many different ways in the existing literature and a range of different classification schemes are employed. This complicates comparisons between different studies and their findings. One of the important tasks faced by interest group scholars engaged...... in large-N studies is therefore to define the concept of an interest group and to determine which classification scheme to use for different group types. After reviewing the existing literature, this article sets out to compare different approaches to defining and classifying interest groups with a sample...... in the organizational attributes of specific interest group types. As expected, our comparison of coding schemes reveals a closer link between group attributes and group type in narrower classification schemes based on group organizational characteristics than those based on a behavioral definition of lobbying....

  10. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream

    Science.gov (United States)

    Byrne, Patrick; Runkel, Robert L.; Walton-Day, Katie

    2017-01-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on stream water quality; (ii) quantify the spatial pattern of constituent loading; and (iii) identify inflow sources most responsible for observed changes in stream chemistry and constituent loading. Several of the constituents investigated (Al, Cd, Cu, Fe, Mn, Zn) fail to meet chronic aquatic life standards along most of the study reach. The spatial pattern of constituent loading suggests four primary sources of contamination under low flow conditions. Three of these sources are associated with acidic (pH metal and major ion) chemistry using PCA suggests a hydraulic connection between many of the left bank inflows and mine water in the Minnesota Mine shaft located to the north-east of the river channel. In addition, water chemistry data during a rainfall-runoff event suggests the spatial pattern of constituent loading may be modified during rainfall due to dissolution of efflorescent salts or erosion of streamside tailings. These data point to the complexity of contaminant mobilisation processes and constituent loading in mining-affected watersheds but the combined synoptic sampling and PCA approach enables a conceptual model of contaminant dynamics to be developed to inform remediation.

  11. Development and validation of a multi-locus DNA metabarcoding method to identify endangered species in complex samples.

    Science.gov (United States)

    Arulandhu, Alfred J; Staats, Martijn; Hagelaar, Rico; Voorhuijzen, Marleen M; Prins, Theo W; Scholtens, Ingrid; Costessi, Adalberto; Duijsings, Danny; Rechenmann, François; Gaspar, Frédéric B; Barreto Crespo, Maria Teresa; Holst-Jensen, Arne; Birck, Matthew; Burns, Malcolm; Haynes, Edward; Hochegger, Rupert; Klingl, Alexander; Lundberg, Lisa; Natale, Chiara; Niekamp, Hauke; Perri, Elena; Barbante, Alessandra; Rosec, Jean-Philippe; Seyfarth, Ralf; Sovová, Tereza; Van Moorleghem, Christoff; van Ruth, Saskia; Peelen, Tamara; Kok, Esther

    2017-10-01

    DNA metabarcoding provides great potential for species identification in complex samples such as food supplements and traditional medicines. Such a method would aid Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) enforcement officers to combat wildlife crime by preventing illegal trade of endangered plant and animal species. The objective of this research was to develop a multi-locus DNA metabarcoding method for forensic wildlife species identification and to evaluate the applicability and reproducibility of this approach across different laboratories. A DNA metabarcoding method was developed that makes use of 12 DNA barcode markers that have demonstrated universal applicability across a wide range of plant and animal taxa and that facilitate the identification of species in samples containing degraded DNA. The DNA metabarcoding method was developed based on Illumina MiSeq amplicon sequencing of well-defined experimental mixtures, for which a bioinformatics pipeline with user-friendly web-interface was developed. The performance of the DNA metabarcoding method was assessed in an international validation trial by 16 laboratories, in which the method was found to be highly reproducible and sensitive enough to identify species present in a mixture at 1% dry weight content. The advanced multi-locus DNA metabarcoding method assessed in this study provides reliable and detailed data on the composition of complex food products, including information on the presence of CITES-listed species. The method can provide improved resolution for species identification, while verifying species with multiple DNA barcodes contributes to an enhanced quality assurance. © The Authors 2017. Published by Oxford University Press.

  12. Pediatric malignant hyperthermia: risk factors, morbidity, and mortality identified from the Nationwide Inpatient Sample and Kids' Inpatient Database.

    Science.gov (United States)

    Salazar, Jose H; Yang, Jingyan; Shen, Liang; Abdullah, Fizan; Kim, Tae W

    2014-12-01

    Malignant Hyperthermia (MH) is a potentially fatal metabolic disorder. Due to its rarity, limited evidence exists about risk factors, morbidity, and mortality, especially in children. Using the Nationwide Inpatient Sample and the Kids' Inpatient Database (KID), admissions with the ICD-9 code for MH (995.86) were extracted for patients 0-17 years of age. Demographic characteristics were analyzed. Logistic regression was performed to identify patient and hospital characteristics associated with mortality. A subset of patients with a surgical ICD-9 code in the KID was studied to calculate the prevalence of MH in the dataset. A total of 310 pediatric admissions were seen in 13 nonoverlapping years of data. Patients had a mortality of 2.9%. Male sex was predominant (64.8%), and 40.5% of the admissions were treated at centers not identified as children's hospitals. The most common associated diagnosis was rhabdomyolysis, which was present in 26 cases. Regression with the outcome of mortality did not yield significant differences between demographic factors: age, sex, race, or hospital type (pediatric vs nonpediatric). Within a surgical subset of 530,449 admissions, MH was coded in 55, giving a rate of 1.04 cases per 10,000 cases. This study is the first to combine two large databases to study MH in the pediatric population. The analysis provides an insight into the risk factors, comorbidities, mortality, and prevalence of MH in the United States population. Until more methodologically rigorous, large-scale studies are done, the use of databases will continue to be the optimal method to study rare diseases. © 2014 John Wiley & Sons Ltd.

  13. Three data partitioning strategies for building local classifiers (Chapter 14)

    NARCIS (Netherlands)

    Zliobaite, I.; Okun, O.; Valentini, G.; Re, M.

    2011-01-01

    Divide-and-conquer approach has been recognized in multiple classifier systems aiming to utilize local expertise of individual classifiers. In this study we experimentally investigate three strategies for building local classifiers that are based on different routines of sampling data for training.

  14. Fingerprint prediction using classifier ensembles

    CSIR Research Space (South Africa)

    Molale, P

    2011-11-01

    Full Text Available ); logistic discrimination (LgD), k-nearest neighbour (k-NN), artificial neural network (ANN), association rules (AR) decision tree (DT), naive Bayes classifier (NBC) and the support vector machine (SVM). The performance of several multiple classifier systems...

  15. Classified

    CERN Multimedia

    Computer Security Team

    2011-01-01

    In the last issue of the Bulletin, we have discussed recent implications for privacy on the Internet. But privacy of personal data is just one facet of data protection. Confidentiality is another one. However, confidentiality and data protection are often perceived as not relevant in the academic environment of CERN.   But think twice! At CERN, your personal data, e-mails, medical records, financial and contractual documents, MARS forms, group meeting minutes (and of course your password!) are all considered to be sensitive, restricted or even confidential. And this is not all. Physics results, in particular when being preliminary and pending scrutiny, are sensitive, too. Just recently, an ATLAS collaborator copy/pasted the abstract of an ATLAS note onto an external public blog, despite the fact that this document was clearly marked as an "Internal Note". Such an act was not only embarrassing to the ATLAS collaboration, and had negative impact on CERN’s reputation --- i...

  16. Classifying Sluice Occurrences in Dialogue

    DEFF Research Database (Denmark)

    Baird, Austin; Hamza, Anissa; Hardt, Daniel

    2018-01-01

    perform manual annotation with acceptable inter-coder agreement. We build classifier models with Decision Trees and Naive Bayes, with an accuracy of 67%. We deploy a classifier to automatically classify sluice occurrences in OpenSubtitles, resulting in a corpus with 1.7 million occurrences. This will support....... Despite this, the corpus can be of great use in research on sluicing and development of systems, and we are making the corpus freely available on request. Furthermore, we are in the process of improving the accuracy of sluice identification and annotation for the purpose of creating a subsequent version...

  17. Evaluation of common genetic variants identified by GWAS for early onset and morbid obesity in population-based samples

    DEFF Research Database (Denmark)

    den Hoed, M; Luan, J; Langenberg, C

    2013-01-01

    BACKGROUND: Meta-analysis of case-control genome-wide association studies (GWAS) for early onset and morbid obesity identified four variants in/near the PRL, PTER, MAF and NPC1 genes. OBJECTIVE: We aimed to validate association of these variants with obesity-related traits in population-based samples. ... These variants, which were identified in a GWAS for early onset and morbid obesity, do not seem to influence obesity-related traits in the general population....

  18. Diagnostic Value of Endometrial Sampling with Pipelle Suction Curettage for Identifying Endometrial Lesions in Patients with Abnormal Uterine Bleeding

    Directory of Open Access Journals (Sweden)

    F Behnamfar

    2004-06-01

    Full Text Available Background: While determining the cause of abnormal uterine bleeding, sampling from the endometrium is necessary. We performed this study because pipelle suction curettage can be performed on an outpatient basis and does not require hospitalization, anesthesia or cervical dilatation. The aim of this study was to compare the diagnostic value of dilatation and curettage (D&C) with pipelle suction curettage. Methods: This quasi-experimental study included 200 pre- and postmenopausal patients with abnormal uterine bleeding who were referred to Shabihkhani hospital in Kashan, Iran. Endometrial sampling was performed in all patients with two methods, namely pipelle and D&C. A pathologist examined the samples, each having a predetermined code. Results: The mean age of subjects was 46.2 ± 6.2 years; the minimum age was 35 years and the maximum was 70 years. The pathological findings were proliferative endometrium, secretory endometrium, atrophic endometrium, decidua, and cystic and adenomatous hyperplasia. The reports were the same for the two methods except in 2 cases where they differed: secretory endometrium with D&C but cystic hyperplasia with pipelle. Conclusions: The results of our study show that endometrial samples obtained by pipelle are comparable with those obtained by D&C. Given the comfort and convenience of the pipelle method for patients, especially in the office setting where anesthesia is not needed, it can easily be employed instead of D&C. Keywords: Pipelle Suction Curette, Dilatation and Curettage, Premenopause, Postmenopause.

  19. Filter-aided sample preparation with dimethyl labeling to identify and quantify milk fat globule membrane proteins.

    NARCIS (Netherlands)

    Lu, J.; Boeren, J.A.; Vries, de S.C.; Valenberg, van H.J.F.; Vervoort, J.J.M.; Hettinga, K.A.

    2011-01-01

    Bovine milk is a major nutrient source in many countries and it is produced at an industrial scale. Milk is a complex mixture of proteins, fats, carbohydrates, vitamins and minerals. The composition of the bovine milk samples can vary depending on the genetic makeup of the bovine species as well as

  20. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    Energy Technology Data Exchange (ETDEWEB)

    Kyle, Jennifer E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Casey, Cameron P. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Stratton, Kelly G. [National Security Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zink, Erika M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Kim, Young-Mo [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Zheng, Xueyun [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Monroe, Matthew E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Weitz, Karl K. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Bloodsworth, Kent J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Orton, Daniel J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Ibrahim, Yehia M. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Moore, Ronald J. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Lee, Christine G. [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Research Service, Portland Veterans Affairs Medical Center, Portland OR USA; Pedersen, Catherine [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Orwoll, Eric [Department of Medicine, Bone and Mineral Unit, Oregon Health and Science University, Portland OR USA; Smith, Richard D. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Burnum-Johnson, Kristin E. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA; Baker, Erin S. [Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland WA USA

    2017-02-05

    The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating that degradation occurring in the DBS samples affects quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types, indicating that lipid quantitation was more stable across the sample types.

  1. Potential bacterial core species associated with digital dermatitis in cattle herds identified by molecular profiling of interdigital skin samples

    DEFF Research Database (Denmark)

    Weiss Nielsen, Martin; Strube, Mikael Lenz; Isbrand, Anastasia

    2016-01-01

    of different molecular methods. Deep sequencing of the 16S rRNA gene variable regions V1–V2 showed that Treponema, Mycoplasma, Fusobacterium and Porphyromonas were the genera best differentiating the DD samples from the controls. Additional deep sequencing analysis of the most abundant genus, Treponema...... in the epidermal lesions and were present in only a subset of samples. RT-qPCR analysis showed that treponemes were also actively expressing a panel of virulence factors at the site of infection. Our results further support the hypothesis that species belonging to the genus Treponema are major pathogens of DD...... and also provide sufficient clues to motivate additional research into the role of M. fermentans, F. necrophorum and P. levii in the etiology of DD....

  2. Development of standardized methodology for identifying toxins in clinical samples and fish species associated with tetrodotoxin-borne poisoning incidents

    Directory of Open Access Journals (Sweden)

    Tai-Yuan Chen

    2016-01-01

    Full Text Available Tetrodotoxin (TTX is a naturally occurring toxin in food, especially in puffer fish. TTX poisoning is observed frequently in South East Asian regions. In TTX-derived food poisoning outbreaks, the amount of TTX recovered from suspicious fish samples or leftovers, and residual levels from biological fluids of victims are typically trace. However, liquid chromatography–mass spectrometry and liquid chromatography–tandem mass spectrometry methods have been demonstrated to qualitatively and quantitatively determine TTX in clinical samples from victims. Identification and validation of the TTX-originating seafood species responsible for a food poisoning incident is needed. A polymerase chain reaction-based method on mitochondrial DNA analysis is useful for identification of fish species. This review aims to collect pertinent information available on TTX-borne food poisoning incidents with a special emphasis on the analytical methods employed for TTX detection in clinical laboratories as well as for the identification of TTX-bearing species.

  3. Quantum ensembles of quantum classifiers.

    Science.gov (United States)

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing amount of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which - similar to Bayesian learning - the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighed according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.

  4. IAEA safeguards and classified materials

    International Nuclear Information System (INIS)

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-01-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raises the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials

  5. Synoptic sampling and principal components analysis to identify sources of water and metals to an acid mine drainage stream

    OpenAIRE

    Byrne, Patrick; Runkel, Robert L.; Walton-Day, Katherine

    2017-01-01

    Combining the synoptic mass balance approach with principal components analysis (PCA) can be an effective method for discretising the chemistry of inflows and source areas in watersheds where contamination is diffuse in nature and/or complicated by groundwater interactions. This paper presents a field-scale study in which synoptic sampling and PCA are employed in a mineralized watershed (Lion Creek, Colorado, USA) under low flow conditions to (i) quantify the impacts of mining activity on str...

  6. Gender, Sexual Orientation, and Rape Myth Acceptance: Preliminary Findings From a Sample of Primarily LGBQ-Identified Survey Respondents.

    Science.gov (United States)

    Schulze, Corina; Koon-Magnin, Sarah

    2017-02-01

    This study is among the first to examine the relationship between sexual orientation and rape myth adherence using a nationwide survey of primarily lesbian, gay, bisexual, and queer (LGBQ) respondents (n = 184). The more established Illinois Rape Myth Acceptance Scale and a modified Male Rape Survey serve as the primary instruments to test both rape myth adherence and instrument-appropriateness. Results suggest that respondents were most likely to support myths that discredit sexual assault allegations or excuse rape as a biological imperative and least likely to support myths related to physical resistance. Consistent with previous work, men exhibited higher levels of rape myth adherence than women. Regarding sexual orientation, respondents who identified as queer consistently exhibited lower levels of rape myth adherence than respondents who identified as gay.

  7. Optical Whole-Genome Restriction Mapping as a Tool for Rapidly Distinguishing and Identifying Bacterial Contaminants in Clinical Samples

    Science.gov (United States)

    2015-08-01

    Report documentation excerpt (dates covered: Oct 2011 – Aug 2012). ... multiple bacteria could be uniquely identified within mixtures. In the first set of experiments, three unique organisms (Bacillus subtilis subsp. globigii, ...) ... be useful in monitoring nosocomial outbreaks in neonatal and intensive care wards, or even as an initial screen for antibiotic-resistant strains.

  8. DNA typing of ancient parasite eggs from environmental samples identifies human and animal worm infections in Viking-age settlement

    DEFF Research Database (Denmark)

    Søe, Martin Jensen; Fredensborg, Brian Lund; Nejsum, Peter

    Human worm infections have, to a large extent, been eradicated in countries with high sanitary standards by preventing the fecal-oral transmission of infective eggs. It is possible to study parasite infections among past populations by retrieving and analyzing parasite eggs using paleoparasitolog......-age. Further, eggs of the Liver Fluke (Fasciola hepatica), whose primary hosts are cows and sheep, are identified indicating that grazing animals were kept in close proximity of the settlement....

  9. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    Directory of Open Access Journals (Sweden)

    Shehzad Khalid

    2014-01-01

    Full Text Available We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of various classes whilst identifying and filtering noisy training data. This noise-free data is further used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector with which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalanced classes.
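    As a rough, hedged sketch of the weight-learning step described above (not the authors' code), the snippet below uses SciPy's differential evolution as a stand-in for their genetic algorithm: it learns one weight per classifier/class pair, chosen to maximise weighted soft-vote accuracy on a validation split of simulated data.

```python
# Hedged sketch: differential evolution (stand-in for the paper's GA) learns
# per-classifier, per-class weights for a soft-voting ensemble.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6, random_state=0)
Xtr, Xva, ytr, yva = train_test_split(X, y, test_size=0.4, random_state=0)
clfs = [GaussianNB(), SVC(probability=True, random_state=0), LogisticRegression(max_iter=1000)]
probas = [c.fit(Xtr, ytr).predict_proba(Xva) for c in clfs]      # one (n_va, 3) array each

def neg_accuracy(w):
    w = w.reshape(len(clfs), 3)                                  # weight per classifier/class
    combined = sum(wi * p for wi, p in zip(w, probas))           # weighted soft vote
    return -accuracy_score(yva, combined.argmax(axis=1))

result = differential_evolution(neg_accuracy, bounds=[(0, 1)] * (len(clfs) * 3),
                                seed=0, maxiter=30, tol=1e-3)
print("best weighted-vote accuracy: %.3f" % -result.fun)
```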

  10. Hybrid classifiers methods of data, knowledge, and classifier combination

    CERN Document Server

    Wozniak, Michal

    2014-01-01

    This book delivers definite and compact knowledge on how hybridization can help improve the quality of computer classification systems. To help readers clearly grasp hybridization, the book primarily focuses on introducing the different levels of hybridization and illuminating the problems faced when dealing with such projects. The data and knowledge incorporated in hybridization are addressed first, followed by the still-growing area of classifier systems known as combined classifiers. The book comprises the aforementioned state-of-the-art topics and the latest research results of the author and his team from the Department of Systems and Computer Networks, Wroclaw University of Technology, including classifiers based on feature space splitting, one-class classification, imbalanced data, and data stream classification.

  11. Using Respondent Driven Sampling to Identify Malaria Risks and Occupational Networks among Migrant Workers in Ranong, Thailand.

    Directory of Open Access Journals (Sweden)

    Piyaporn Wangroongsarb

    Full Text Available Ranong Province in southern Thailand is one of the primary entry points for migrants entering Thailand from Myanmar, and borders Kawthaung Township in Myanmar where artemisinin resistance in malaria parasites has been detected. Areas of high population movement could increase the risk of spread of artemisinin resistance in this region and beyond. A respondent-driven sampling (RDS) methodology was used to compare migrant populations coming from Myanmar in urban (Site 1) vs. rural (Site 2) settings in Ranong, Thailand. The RDS methodology collected information on knowledge, attitudes, and practices for malaria, travel and occupational histories, as well as social network size and structure. Individuals enrolled were screened for malaria by microscopy, Real Time-PCR, and serology. A total of 619 participants were recruited in Ranong City and 623 participants in Kraburi, a rural sub-district. By PCR, a total of 14 (1.1%) samples were positive (2 P. falciparum in Site 1; 10 P. vivax, 1 Pf, and 1 P. malariae in Site 2). PCR analysis demonstrated an overall weighted prevalence of 0.5% (95% CI, 0-1.3%) in the urban site and 1.0% (95% CI, 0.5-1.7%) in the rural site for all parasite species. PCR positivity did not correlate with serological positivity; however, as expected there was a strong association between antibody prevalence and both age and exposure. Access to long-lasting insecticide-treated nets remains low despite relatively high reported traditional net use among these populations. The low malaria prevalence, relatively smaller networks among migrants in rural settings, and limited frequency of travel to and from other areas of malaria transmission in Myanmar suggest that the risk for the spread of artemisinin resistance from this area may be limited in these networks currently, but may have implications for regional malaria elimination efforts.

  12. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples.

    Directory of Open Access Journals (Sweden)

    Jonathan A Scolnick

    Full Text Available Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.

  13. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); a CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
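    The sketch below illustrates, on simulated aCGH-like log-ratios, the two core ideas named in the abstract: shared breakpoints across samples followed by k-means clustering of per-segment intensities. It is a minimal Python illustration under those assumptions, not the published C++ MGVD implementation.

```python
# Illustrative sketch only (not MGVD itself): find breakpoints shared across
# samples from summed absolute first differences, then k-means-cluster the
# per-segment mean intensities into loss / neutral / gain groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_samples, n_probes = 5, 300
data = rng.normal(0.0, 0.1, size=(n_samples, n_probes))
data[:, 100:150] += 0.6          # a shared gain region (hypothetical CNV zone)

# 1) Shared breakpoints: probes where the summed jump across samples is large.
jumps = np.abs(np.diff(data, axis=1)).sum(axis=0)
threshold = jumps.mean() + 3 * jumps.std()
breakpoints = [0] + [i + 1 for i in np.where(jumps > threshold)[0]] + [n_probes]

# 2) Per-sample, per-segment mean log-ratios.
seg_means = np.array([[data[s, a:b].mean()
                       for a, b in zip(breakpoints[:-1], breakpoints[1:])]
                      for s in range(n_samples)])

# 3) k-means on segment means (k = 3: loss / neutral / gain).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    seg_means.reshape(-1, 1)).reshape(seg_means.shape)
print("segment boundaries:", breakpoints)
print("cluster labels per sample/segment:\n", labels)
```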

  14. 3D Bayesian contextual classifiers

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    2000-01-01

    We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.

  15. Ecological context, concentrated disadvantage, and youth reoffending: identifying the social mechanisms in a sample of serious adolescent offenders.

    Science.gov (United States)

    Wright, Kevin A; Kim, Byungbae; Chassin, Laurie; Losoya, Sandra H; Piquero, Alex R

    2014-10-01

    Serious youthful offenders are presented with a number of significant challenges when trying to make a successful transition from adolescence to adulthood. One of the biggest obstacles for these youth to overcome concerns their ability to desist from further antisocial behavior, and although an emerging body of research has documented important risk and protective factors associated with desistance, the importance of the neighborhoods within which these youth reside has been understudied. Guided by the larger neighborhood effects on crime literature, the current study examines the direct and indirect effects of concentrated disadvantage on youth reoffending among a sample of highly mobile, serious youthful offenders. We use data from Pathways to Desistance, a longitudinal study of serious youthful offenders (N = 1,354; 13.6% female; 41.4% African American, 33.5% Hispanic, 20.2% White), matched up with 2000 Census data on neighborhood conditions for youth's main residence location during waves 7 and 8 of the study. These waves represent the time period in which youth are navigating the transition to adulthood (aged 18-22; average age = 20). We estimate structural equation models to determine direct effects of concentrated disadvantage on youth reoffending and also to examine the possible indirect effects working through individual-level mechanisms as specified by theoretical perspectives including social control (e.g., unsupervised peer activities), strain (e.g., exposure to violence), and learning (e.g., exposure to antisocial peers). Additionally, we estimate models that take into account the impact that a change in neighborhood conditions may have on the behavior of youth who move to new residences during the study period. Our results show that concentrated disadvantage is indirectly associated with youth reoffending primarily through its association with exposure to deviant peers. Taking into account youth mobility during the study period produced an additional

  16. Knowledge Uncertainty and Composed Classifier

    Czech Academy of Sciences Publication Activity Database

    Klimešová, Dana; Ocelíková, E.

    2007-01-01

    Roč. 1, č. 2 (2007), s. 101-105 ISSN 1998-0140 Institutional research plan: CEZ:AV0Z10750506 Keywords : Boosting architecture * contextual modelling * composed classifier * knowledge management * knowledge * uncertainty Subject RIV: IN - Informatics, Computer Science

  17. Correlation Dimension-Based Classifier

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2014-01-01

    Roč. 44, č. 12 (2014), s. 2253-2263 ISSN 2168-2267 R&D Projects: GA MŠk(CZ) LG12020 Institutional support: RVO:67985807 Keywords : classifier * multidimensional data * correlation dimension * scaling exponent * polynomial expansion Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 3.469, year: 2014

  18. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations.

    Directory of Open Access Journals (Sweden)

    Emma Lightfoot

    Full Text Available Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or the median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific
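    For readers unfamiliar with the two screening rules mentioned (the IQR rule and the median absolute deviation from the median), the following minimal sketch applies both to a small set of hypothetical δ18O values; the numbers are illustrative only.

```python
# Minimal sketch of the two outlier screens named above, on made-up values.
import numpy as np

d18o = np.array([25.1, 25.4, 24.9, 25.6, 25.2, 25.0, 27.8, 25.3, 24.7, 22.9])

q1, q3 = np.percentile(d18o, [25, 75])
iqr = q3 - q1
iqr_outliers = d18o[(d18o < q1 - 1.5 * iqr) | (d18o > q3 + 1.5 * iqr)]

med = np.median(d18o)
mad = np.median(np.abs(d18o - med))
# 1.4826 scales the MAD to the standard deviation under normality.
mad_outliers = d18o[np.abs(d18o - med) / (1.4826 * mad) > 3]

print("IQR outliers:", iqr_outliers)
print("MAD outliers:", mad_outliers)
```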

  19. Classified facilities for environmental protection

    International Nuclear Information System (INIS)

    Anon.

    1993-02-01

    The legislation of the classified facilities governs most of the dangerous or polluting industries or fixed activities. It rests on the law of 9 July 1976 concerning facilities classified for environmental protection and its application decree of 21 September 1977. This legislation, the general texts of which appear in this volume 1, aims to prevent all the risks and the harmful effects coming from an installation (air, water or soil pollutions, wastes, even aesthetic breaches). The polluting or dangerous activities are defined in a list called nomenclature which subjects the facilities to a declaration or an authorization procedure. The authorization is delivered by the prefect at the end of an open and contradictory procedure after a public survey. In addition, the facilities can be subjected to technical regulations fixed by the Environment Minister (volume 2) or by the prefect for facilities subjected to declaration (volume 3). (A.B.)

  20. Identifying the major bacteria causing intramammary infections in individual milk samples of sheep and goats using traditional bacteria culturing and real-time polymerase chain reaction.

    Science.gov (United States)

    Rovai, M; Caja, G; Salama, A A K; Jubert, A; Lázaro, B; Lázaro, M; Leitner, G

    2014-09-01

    Use of DNA-based methods, such as real-time PCR, has increased the sensitivity and shortened the time for bacterial identification, compared with traditional bacteriology; however, results should be interpreted carefully because a positive PCR result does not necessarily mean that an infection exists. One hundred eight lactating dairy ewes (56 Manchega and 52 Lacaune) and 24 Murciano-Granadina dairy goats were used for identifying the main bacteria causing intramammary infections (IMI) using traditional bacterial culturing and real-time PCR and their effects on milk performance. Udder-half milk samples were taken for bacterial culturing and somatic cell count (SCC) 3 times throughout lactation. Intramammary infections were assessed based on bacteria isolated in ≥2 samplings accompanied by increased SCC. Prevalence of subclinical IMI was 42.9% in Manchega and 50.0% in Lacaune ewes and 41.7% in goats, with the estimated milk yield loss being 13.1, 17.9, and 18.0%, respectively. According to bacteriology results, 87% of the identified single bacteria species (with more than 3 colonies/plate) or culture-negative growth were identical throughout samplings, which agreed 98.9% with the PCR results. Nevertheless, the study emphasized that 1 sampling may not be sufficient to determine IMI and, therefore, other inflammatory responses such as increased SCC should be monitored to identify true infections. Moreover, when PCR methodology is used, aseptic and precise milk sampling procedures are key for avoiding false-positive amplifications. In conclusion, both PCR and bacterial culture methods proved to have similar accuracy for identifying infective bacteria in sheep and goats. The final choice will depend on their response time and cost analysis, according to the requirements and farm management strategy. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  1. Energy-Efficient Neuromorphic Classifiers.

    Science.gov (United States)

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

  2. 76 FR 34761 - Classified National Security Information

    Science.gov (United States)

    2011-06-14

    ... MARINE MAMMAL COMMISSION Classified National Security Information [Directive 11-01] AGENCY: Marine... Commission's (MMC) policy on classified information, as directed by Information Security Oversight Office... of Executive Order 13526, ``Classified National Security Information,'' and 32 CFR part 2001...

  3. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequence classification. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets - each one containing samples labeled as reacting or not with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with the best performance measure values.
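    A hedged sketch of the two-phase pipeline is given below, with stand-ins: a scikit-learn decision tree plays the role of CART for position selection, a toy hydrophobicity-like scale (illustrative numbers, not a real AAindex entry) replaces the AAindex lookup, and a grid-searched RBF SVM replaces the LS-SVM tuned by SPDE. The sequences themselves are simulated.

```python
# Hedged sketch with stand-ins: tree-based position selection, a toy numeric
# substitution scale, and a grid-searched RBF SVM instead of LS-SVM + SPDE.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(1)
alphabet = list("ACDEFGHIKLMNPQRSTVWY")
toy_scale = {aa: i / 10.0 - 1.0 for i, aa in enumerate(alphabet)}  # illustrative only

# Simulate 9-symbol sequences; the class depends on positions 2 and 6.
def make_seq(label):
    seq = rng.choice(alphabet, size=9)
    if label == 1:
        seq[2], seq[6] = "K", "D"
    return "".join(seq)

y = rng.integers(0, 2, size=200)
seqs = [make_seq(lbl) for lbl in y]

# Phase 1: one-hot encode and let a CART-style tree rank positions.
onehot = np.array([[1.0 if s[p] == aa else 0.0
                    for p in range(9) for aa in alphabet] for s in seqs])
tree = DecisionTreeClassifier(random_state=0).fit(onehot, y)
pos_importance = tree.feature_importances_.reshape(9, len(alphabet)).sum(axis=1)
selected = np.argsort(pos_importance)[-3:]          # keep 3 most informative positions

# Phase 2: numeric substitution at the selected positions, then a tuned SVM.
X = np.array([[toy_scale[s[p]] for p in selected] for s in seqs])
svm = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10, 100], "gamma": [0.1, 1, 10]}, cv=3)
print("selected positions:", sorted(selected.tolist()))
print("cv accuracy: %.2f" % cross_val_score(svm, X, y, cv=5).mean())
```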

  4. STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS

    International Nuclear Information System (INIS)

    Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.

    2009-01-01

    The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ² tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ² test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ² and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
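    The snippet below is a minimal illustration of the central test choice, applying SciPy's Anderson-Darling normality test to simulated line-of-sight velocities for a relaxed (single-Gaussian) and a merging (bimodal) group; the velocity values are made up for demonstration.

```python
# Minimal sketch: flag a non-Gaussian velocity distribution with the A-D test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
relaxed = rng.normal(0, 300, size=25)                       # single Gaussian group
merging = np.concatenate([rng.normal(-400, 150, 15),        # two-component group
                          rng.normal(400, 150, 15)])

for name, v in [("relaxed", relaxed), ("merging", merging)]:
    result = stats.anderson(v, dist="norm")
    # critical_values correspond to significance_level (15, 10, 5, 2.5, 1 %).
    reject_5pct = result.statistic > result.critical_values[2]
    print(f"{name}: A2 = {result.statistic:.2f}, non-Gaussian at 5%? {reject_5pct}")
```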

  5. Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder.

    Science.gov (United States)

    Chen, D T; Jiang, X; Akula, N; Shugart, Y Y; Wendland, J R; Steele, C J M; Kassem, L; Park, J-H; Chatterjee, N; Jamain, S; Cheng, A; Leboyer, M; Muglia, P; Schulze, T G; Cichon, S; Nöthen, M M; Rietschel, M; McMahon, F J; Farmer, A; McGuffin, P; Craig, I; Lewis, C; Hosang, G; Cohen-Woods, S; Vincent, J B; Kennedy, J L; Strauss, J

    2013-02-01

    Meta-analyses of bipolar disorder (BD) genome-wide association studies (GWAS) have identified several genome-wide significant signals in European-ancestry samples, but so far account for little of the inherited risk. We performed a meta-analysis of ∼750,000 high-quality genetic markers on a combined sample of ∼14,000 subjects of European and Asian-ancestry (phase I). The most significant findings were further tested in an extended sample of ∼17,700 cases and controls (phase II). The results suggest novel association findings near the genes TRANK1 (LBA1), LMAN2L and PTGFR. In phase I, the most significant single nucleotide polymorphism (SNP), rs9834970 near TRANK1, was significant at the P = 2.4 × 10⁻¹¹ level, with no heterogeneity. Supportive evidence for prior association findings near ANK3 and a locus on chromosome 3p21.1 was also observed. The phase II results were similar, although the heterogeneity test became significant for several SNPs. On the basis of these results and other established risk loci, we used the method developed by Park et al. to estimate the number, and the effect size distribution, of BD risk loci that could still be found by GWAS methods. We estimate that >63,000 case-control samples would be needed to identify the ∼105 BD risk loci discoverable by GWAS, and that these will together explain <6% of the inherited risk. These results support previous GWAS findings and identify three new candidate genes for BD. Further studies are needed to replicate these findings and may potentially lead to identification of functional variants. Sample size will remain a limiting factor in the discovery of common alleles associated with BD.

  6. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data

    Directory of Open Access Journals (Sweden)

    Datta Susmita

    2010-08-01

    Full Text Available Abstract Background Generally speaking, different classifiers tend to work well for certain types of data and, conversely, it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best performing classification algorithm amongst a number of competing algorithms is a difficult task for various reasons. For example, the order of performance may depend on the performance measure employed for such a comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data that is being classified. The attractive feature of the proposed classifier is its multi-objective nature where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance as judged on test samples than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers. Results We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier comprising the ensemble or better. Conclusions For complex high-dimensional datasets resulting from present-day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.
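    The following sketch conveys the general bagging-plus-rank-aggregation idea (it is not the authors' implementation): candidate classifiers are scored on bootstrap out-of-bag samples under several measures, ranked per measure, and the ranks summed Borda-style to select a model.

```python
# Hedged sketch: rank candidate classifiers on bootstrap out-of-bag samples
# under several measures and aggregate the ranks with a simple Borda sum.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, recall_score, precision_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
models = {"logreg": LogisticRegression(max_iter=1000),
          "tree": DecisionTreeClassifier(random_state=0),
          "knn": KNeighborsClassifier()}
metrics = {"acc": accuracy_score, "sens": recall_score, "prec": precision_score}

rng = np.random.default_rng(0)
rank_sums = {name: 0.0 for name in models}
for _ in range(20):                                   # bootstrap rounds
    idx = rng.integers(0, len(y), len(y))
    oob = np.setdiff1d(np.arange(len(y)), idx)        # out-of-bag test set
    scores = {name: [] for name in models}
    for name, model in models.items():
        pred = model.fit(X[idx], y[idx]).predict(X[oob])
        scores[name] = [m(y[oob], pred) for m in metrics.values()]
    for j in range(len(metrics)):                     # rank models per metric
        order = sorted(models, key=lambda n: scores[n][j], reverse=True)
        for rank, name in enumerate(order, start=1):
            rank_sums[name] += rank
best = min(rank_sums, key=rank_sums.get)
print("aggregated ranks:", rank_sums, "-> selected:", best)
```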

  7. An Investigation to Improve Classifier Accuracy for Myo Collected Data

    Science.gov (United States)

    2017-02-01

    Excerpt from the report's table of contents and figure list: the report examines the effect of bad samples on classification accuracy for the Naïve Bayes (NB), Logistic Model Tree (LMT), and k-Nearest Neighbor classifiers; figures show the "come" gesture pitch feature for users 06 and 14, in which all samples exhibit reversed movement.

  8. Cancer associated epigenetic transitions identified by genome-wide histone methylation binding profiles in human colorectal cancer samples and paired normal mucosa

    International Nuclear Information System (INIS)

    Enroth, Stefan; Rada-Iglesisas, Alvaro; Andersson, Robin; Wallerman, Ola; Wanders, Alkwin; Påhlman, Lars; Komorowski, Jan; Wadelius, Claes

    2011-01-01

    Despite their well-established functional roles, histone modifications have received less attention than DNA methylation in the cancer field. In order to evaluate their importance in colorectal cancer (CRC), we generated the first genome-wide histone modification profiles in paired normal colon mucosa and tumor samples. Chromatin immunoprecipitation and microarray hybridization (ChIP-chip) was used to identify promoters enriched for histone H3 trimethylated on lysine 4 (H3K4me3) and lysine 27 (H3K27me3) in paired normal colon mucosa and tumor samples from two CRC patients and for the CRC cell line HT29. By comparing histone modification patterns in normal mucosa and tumors, we found that alterations predicted to have major functional consequences were quite rare. Furthermore, when normal or tumor tissue samples were compared to HT29, high similarities were observed for H3K4me3. However, the differences found for H3K27me3, which is important in determining cellular identity, indicates that cell lines do not represent optimal tissue models. Finally, using public expression data, we uncovered previously unknown changes in CRC expression patterns. Genes positive for H3K4me3 in normal and/or tumor samples, which are typically already active in normal mucosa, became hyperactivated in tumors, while genes with H3K27me3 in normal and/or tumor samples and which are expressed at low levels in normal mucosa, became hypersilenced in tumors. Genome wide histone modification profiles can be used to find epigenetic aberrations in genes associated with cancer. This strategy gives further insights into the epigenetic contribution to the oncogenic process and may identify new biomarkers

  9. Validity of the Male Depression Risk Scale in a representative Canadian sample: sensitivity and specificity in identifying men with recent suicide attempt.

    Science.gov (United States)

    Rice, Simon M; Ogrodniczuk, John S; Kealy, David; Seidler, Zac E; Dhillon, Haryana M; Oliffe, John L

    2017-12-22

    Clinical practice and literature has supported the existence of a phenotypic sub-type of depression in men. While a number of self-report rating scales have been developed in order to empirically test the male depression construct, psychometric validation of these scales is limited. To confirm the psychometric properties of the multidimensional Male Depression Risk Scale (MDRS-22) and to develop clinical cut-off scores for the MDRS-22. Data were obtained from an online sample of 1000 Canadian men (median age (M) = 49.63, standard deviation (SD) = 14.60). Confirmatory factor analysis (CFA) was used to replicate the established six-factor model of the MDRS-22. Psychometric values of the MDRS subscales were comparable to the widely used Patient Health Questionnaire-9. CFA model fit indices indicated adequate model fit for the six-factor MDRS-22 model. ROC curve analysis indicated the MDRS-22 was effective for identifying those with a recent (previous four-weeks) suicide attempt (area under curve (AUC) values = 0.837). The MDRS-22 cut-off identified proportionally more (84.62%) cases of recent suicide attempt relative to the PHQ-9 moderate range (53.85%). The MDRS-22 is the first male-sensitive depression scale to be psychometrically validated using CFA techniques in independent and cross-nation samples. Additional studies should identify differential item functioning and evaluate cross-cultural effects.
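    As a hedged illustration of the cut-off derivation step (with simulated scores, not MDRS-22 data), the snippet below computes the AUC and picks a screening threshold by maximising Youden's J along the ROC curve.

```python
# Sketch with simulated screening scores: AUC plus a Youden's J cut-off.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(3)
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])   # 5% cases
scores = np.concatenate([rng.normal(20, 8, 950), rng.normal(35, 8, 50)]) # scale scores

fpr, tpr, thresholds = roc_curve(y, scores)
auc = roc_auc_score(y, scores)
cutoff = thresholds[np.argmax(tpr - fpr)]   # Youden's J = sensitivity + specificity - 1
print(f"AUC = {auc:.3f}, suggested cut-off = {cutoff:.1f}")
```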

  10. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    International Nuclear Information System (INIS)

    Blanco, A; Rodriguez, R; Martinez-Maranon, I

    2014-01-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on the extracted gonads of Mackerel females and males (fresh and defrozen) to obtain differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity

  11. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    Science.gov (United States)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on the extracted gonads of Mackerel females and males (fresh and defrozen) to obtain differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
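    A minimal sketch of the general idea appears below, using the Iris data as a stand-in for the gonad colour measurements: k-NN base classifiers are built on complementary dissimilarities (Manhattan, Chebyshev, cosine, Euclidean) and combined by majority vote.

```python
# Illustrative sketch: combine k-NN classifiers built on different
# dissimilarities with a hard (majority) vote.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

base = [(m, KNeighborsClassifier(n_neighbors=5, metric=m))
        for m in ("manhattan", "chebyshev", "cosine", "euclidean")]
ensemble = VotingClassifier(estimators=base, voting="hard").fit(Xtr, ytr)

for name, clf in base:
    print(name, "%.3f" % accuracy_score(yte, clf.fit(Xtr, ytr).predict(Xte)))
print("ensemble", "%.3f" % accuracy_score(yte, ensemble.predict(Xte)))
```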

  12. Classifying machinery condition using oil samples and binary logistic regression

    Science.gov (United States)

    Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.

    2015-08-01

    The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
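    The sketch below illustrates the modelling choice on made-up oil-analysis features (Fe, Cu, Si wear-metal ppm and viscosity; the names and the underlying health rule are hypothetical, not the paper's data): a cross-validated binary logistic regression whose standardized coefficients remain interpretable as odds ratios.

```python
# Hedged sketch: interpretable logistic regression on simulated oil-analysis
# features; the feature names and label-generating rule are made up.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 300
X = pd.DataFrame({"fe_ppm": rng.gamma(2.0, 20.0, n),
                  "cu_ppm": rng.gamma(2.0, 5.0, n),
                  "si_ppm": rng.gamma(2.0, 3.0, n),
                  "viscosity": rng.normal(100.0, 10.0, n)})
logit = 0.03 * X["fe_ppm"] + 0.1 * X["cu_ppm"] - 4.0         # hypothetical rule
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # 1 = "not healthy"

model = make_pipeline(StandardScaler(), LogisticRegression())
print("cv accuracy: %.2f" % cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in zip(X.columns, coefs):
    print(f"{name}: odds ratio per SD = {np.exp(c):.2f}")
```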

  13. Waste classifying and separation device

    International Nuclear Information System (INIS)

    Kakiuchi, Hiroki.

    1997-01-01

    Flexible plastic bags containing solid wastes of indefinite shape are broken and the wastes are classified. The bag-cutting portion of the device has an ultrasonic-type or a heater-type cutting means, and the cutting means moves in parallel with the transfer direction of the plastic bags. A classification portion separates the plastic bag from its contents and conducts classification while rotating a classification table. Accordingly, plastic bags containing solids of indefinite shape can be broken and classification can be conducted efficiently and reliably. The device of the present invention has a simple structure which requires little installation space and enables easy maintenance. (T.M.)

  14. Identifying the tundra-forest border in the stomate record: an analysis of lake surface samples from the Yellowknife area, Northwest Territories, Canada

    Energy Technology Data Exchange (ETDEWEB)

    Hansen, B.C.S. [Minnesota Univ., Minneapolis, MN (United States). Limnological Research Center; MacDonald, G.M. [California Univ., Los Angeles, CA (United States). Dept. of Botanical Sciences; Moser, K.A. [McMaster Univ., Hamilton, ON (Canada)

    1996-05-01

    The relationship between conifer stomata and existing vegetation across tundra, forest-tundra, and closed forest zones in the Yellowknife area of the Northwest Territories was studied. Conifer stomata were identified in surface samples from lakes in the treeline zone, but were absent in samples from tundra lakes. Stomate analysis was recorded and the results were presented in a concentration diagram plotting stomate concentrations according to vegetation zone. Conifer stomate analysis was not able to resolve differences between forest-tundra and closed forest. Nevertheless, it was suggested that stomate analysis will become an important technique supplementing pollen analysis for reconstructing past tree-line changes, since the presence of stomata in lakes makes it possible to separate the tundra from forest-tundra and closed forest. The limited dispersal of conifer stomata permitted a better resolution of tree-line boundaries than did pollen. 13 refs., 3 figs.

  15. DNA-based hair sampling to identify road crossings and estimate population size of black bears in Great Dismal Swamp National Wildlife Refuge, Virginia

    OpenAIRE

    Wills, Johnny

    2008-01-01

    The planned widening of U.S. Highway 17 along the east boundary of Great Dismal Swamp National Wildlife Refuge (GDSNWR) and a lack of knowledge about the refuge's bear population created the need to identify potential sites for wildlife crossings and estimate the size of the refuge's bear population. I collected black bear hair in order to obtain DNA samples to estimate population size, density, and sex ratio, and determine road crossing locations for black bears (Ursus americanus) in GDSNWR.

  16. Frog sound identification using extended k-nearest neighbor classifier

    Science.gov (United States)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization is becoming important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with an Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest-neighbor and mutual neighborhood-sharing concepts, with the aim of improving classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with the competing classifiers k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysian forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation, and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficients (MFCC). The experimental results show that the EKNN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers KNN, FKNN, KGNN and MKNN for all cases.
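    The snippet below is a simplified reading of the "extended" neighborhood idea (a sketch, not the authors' exact EKNN): a test sample is classified by voting over its own k nearest training samples together with the training samples that would count it among their k nearest. The wine dataset stands in for the frog-call MFCC features.

```python
# Simplified EKNN-style classifier: vote over a test point's own k nearest
# neighbours plus the training points that would claim it as a neighbour.
import numpy as np
from collections import Counter
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

def eknn_predict(Xtr, ytr, Xte, k=5):
    # k-th nearest-neighbour radius of every training sample (within training set)
    d_tr = np.linalg.norm(Xtr[:, None, :] - Xtr[None, :, :], axis=2)
    np.fill_diagonal(d_tr, np.inf)
    radius = np.sort(d_tr, axis=1)[:, k - 1]
    preds = []
    for x in Xte:
        d = np.linalg.norm(Xtr - x, axis=1)
        nearest = set(np.argsort(d)[:k])                 # x's own neighbours
        mutual = set(np.where(d <= radius)[0])           # who would claim x
        votes = [ytr[i] for i in nearest | mutual]
        preds.append(Counter(votes).most_common(1)[0][0])
    return np.array(preds)

X, y = load_wine(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
print("accuracy: %.3f" % (eknn_predict(Xtr, ytr, Xte) == yte).mean())
```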

  17. Identifying undiagnosed HIV in men who have sex with men (MSM) by offering HIV home sampling via online gay social media: a service evaluation.

    Science.gov (United States)

    Elliot, E; Rossi, M; McCormack, S; McOwan, A

    2016-09-01

    An estimated one in eight men who have sex with men (MSM) in London lives with HIV, of which 16% are undiagnosed. It is a public health priority to minimise time spent undiagnosed and reduce morbidity, mortality and onward HIV transmission. 'Dean Street at Home' provided an online HIV risk self-assessment and postal home HIV sampling service aimed at hard-to-reach, high-risk MSM. This 2-year service evaluation aims to determine the HIV risk behaviour of users, the uptake of the offer of home sampling and the acceptability of the service. Users were invited to assess their HIV risk anonymously through messages or promotional banners on several gay social networking websites. Regardless of risk, they were offered a free postal HIV oral fluid or blood self-sampling kit. Reactive results were confirmed in clinic. A user survey was sent to first-year respondents. 17 361 respondents completed the risk self-assessment. Of these, half had an 'identifiable risk' for HIV and a third were previously untested. 5696 test kits were returned. 121 individuals had a reactive sample; 82 (1.4% of returned samples) were confirmed as new HIV diagnoses and linked to care; 14 (0.25%) already knew their diagnosis; and 14 (0.25%) were false reactives. The median age at diagnosis was 38; the median CD4 count was 505 cells/µL and 20% were recent infections. 61/82 (78%) were confirmed on treatment at the time of writing. The post-test email survey revealed a high service acceptability rate. The service was the first of its kind in the UK. This evaluation provides evidence to inform the potential roll-out of further online strategies to enhance community HIV testing. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  18. Obscenity detection using haar-like features and Gentle Adaboost classifier.

    Science.gov (United States)

    Mustafa, Rashed; Min, Yang; Zhu, Dingju

    2014-01-01

    Large exposure of skin area in an image is often considered obscene, but this fact alone may flag false positives containing skin-like objects and may miss images in which only a small skin area is exposed but erotogenic human body parts are visible. This paper presents a novel method for detecting nipples in pornographic image contents. The nipple is considered an erotogenic organ for identifying pornographic contents in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiments showed that, considering accuracy, the haar-cascade classifier performs well, but to satisfy detection time, the train-cascade classifier is suitable. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade and 0.127 seconds for the train-cascade classifier.
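
    As a rough illustration of the detection stage described above (skin filtering followed by a Haar cascade), the OpenCV snippet below loads a hypothetical cascade file; the file name, skin-colour thresholds and detection parameters are placeholders, and the cascade itself would have to be trained separately (for example with opencv_traincascade) on positive/negative samples like those used in the paper.

```python
import cv2

# Hypothetical cascade trained offline on the study's positive/negative samples;
# not a model shipped with OpenCV.
CASCADE_PATH = "gab_haar_cascade.xml"

def detect_regions(bgr_image):
    # Crude skin filter in YCrCb space to restrict the search area,
    # mirroring the "skin filter prior to detection" step.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    masked = cv2.bitwise_and(bgr_image, bgr_image, mask=skin_mask)

    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(CASCADE_PATH)
    # Standard multi-scale sliding-window detection over the masked image.
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=5, minSize=(24, 24))

if __name__ == "__main__":
    image = cv2.imread("input.jpg")
    print(detect_regions(image))
```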

  19. Obscenity Detection Using Haar-Like Features and Gentle Adaboost Classifier

    Directory of Open Access Journals (Sweden)

    Rashed Mustafa

    2014-01-01

    Full Text Available Large exposure of skin area in an image is often considered obscene, but this fact alone may flag false positives containing skin-like objects and may miss images in which only a small skin area is exposed but erotogenic human body parts are visible. This paper presents a novel method for detecting nipples in pornographic image contents. The nipple is considered an erotogenic organ for identifying pornographic contents in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiments showed that, considering accuracy, the haar-cascade classifier performs well, but to satisfy detection time, the train-cascade classifier is suitable. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade and 0.127 seconds for the train-cascade classifier.

  20. The interprocess NIR sampling as an alternative approach to multivariate statistical process control for identifying sources of product-quality variability.

    Science.gov (United States)

    Marković, Snežana; Kerč, Janez; Horvat, Matej

    2017-03-01

    We present a new approach to identifying sources of variability within a manufacturing process by NIR measurements of samples of intermediate material after each consecutive unit operation (the interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases where historical process data are not readily available. In the presented case, changes in lactose characteristics influenced the performance of the extrusion/spheronization process step. The pellet cores produced using one (less suitable) lactose source were on average larger and more fragile, leading to breakage of the cores during subsequent fluid-bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process, which leads to compromised film coating.

  1. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and disease in the developed world, and advances in modern electronics and machine learning can help us deliver real-time interventions to smokers in novel ways. In this paper, we examine different machine learning approaches that use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically naive Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches were evaluated in terms of sensitivity, specificity, accuracy and precision. The analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with a classification accuracy of up to 86%. These numbers suggest that machine learning may be a suitable approach to smoking cessation matters and to predicting smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time, by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
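
    A hedged sketch of the kind of comparison reported above: the three classifier families are evaluated with cross-validation after univariate feature selection. The synthetic feature matrix merely stands in for the situational features collected from participants; it is not the study's data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for situational features (mood, location, activity, ...).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.8, size=300) > 0).astype(int)  # 1 = high urge

models = {
    "naive Bayes": GaussianNB(),
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

for name, clf in models.items():
    # Keep only a handful of features, echoing the finding that a few
    # selected features suffice for high classification rates.
    pipe = make_pipeline(SelectKBest(f_classif, k=5), clf)
    acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:>22s}: {acc:.3f}")
```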

  2. Return of a giant: DNA from archival museum samples helps to identify a unique cutthroat trout lineage formerly thought to be extinct.

    Science.gov (United States)

    Peacock, Mary M; Hekkala, Evon R; Kirchoff, Veronica S; Heki, Lisa G

    2017-11-01

    Currently one small, native population of the culturally and ecologically important Lahontan cutthroat trout ( Oncorhynchus clarkii henshawi , LCT, Federally listed) remains in the Truckee River watershed of northwestern Nevada and northeastern California. The majority of populations in this watershed were extirpated in the 1940s due to invasive species, overharvest, anthropogenic water consumption and changing precipitation regimes. In 1977, a population of cutthroat trout discovered in the Pilot Peak Mountains in the Bonneville basin of Utah, was putatively identified as the extirpated LCT lacustrine lineage native to Pyramid Lake in the Truckee River basin based on morphological and meristic characters. Our phylogenetic and Bayesian genotype clustering analyses of museum specimens collected from the large lakes (1872-1913) and contemporary samples collected from populations throughout the extant range provide evidence in support of a genetically distinct Truckee River basin origin for this population. Analysis of museum samples alone identified three distinct genotype clusters and historical connectivity among water bodies within the Truckee River basin. Baseline data from museum collections indicate that the extant Pilot Peak strain represents a remnant of the extirpated lacustrine lineage. Given the limitations on high-quality data when working with a sparse number of preserved museum samples, we acknowledge that, in the end, this may be a more complicated story. However, the paucity of remnant populations in the Truckee River watershed, in combination with data on the distribution of morphological, meristic and genetic data for Lahontan cutthroat trout, suggests that recovery strategies, particularly in the large lacustrine habitats should consider this lineage as an important part of the genetic legacy of this species.

  3. Using Language Sample Analysis in Clinical Practice: Measures of Grammatical Accuracy for Identifying Language Impairment in Preschool and School-Aged Children.

    Science.gov (United States)

    Eisenberg, Sarita; Guo, Ling-Yu

    2016-05-01

    This article reviews the existing literature on the diagnostic accuracy of two grammatical accuracy measures for differentiating children with and without language impairment (LI) at preschool and early school age based on language samples. The first measure, the finite verb morphology composite (FVMC), is a narrow grammatical measure that computes children's overall accuracy of four verb tense morphemes. The second measure, percent grammatical utterances (PGU), is a broader grammatical measure that computes children's accuracy in producing grammatical utterances. The extant studies show that FVMC demonstrates acceptable (i.e., 80 to 89% accurate) to good (i.e., 90% accurate or higher) diagnostic accuracy for children between 4;0 (years;months) and 6;11 in conversational or narrative samples. In contrast, PGU yields acceptable to good diagnostic accuracy for children between 3;0 and 8;11 regardless of sample types. Given the diagnostic accuracy shown in the literature, we suggest that FVMC and PGU can be used as one piece of evidence for identifying children with LI in assessment when appropriate. However, FVMC or PGU should not be used as therapy goals directly. Instead, when children are low in FVMC or PGU, we suggest that follow-up analyses should be conducted to determine the verb tense morphemes or grammatical structures that children have difficulty with. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  4. Validating the Copenhagen Psychosocial Questionnaire (COPSOQ-II) Using Set-ESEM: Identifying Psychosocial Risk Factors in a Sample of School Principals.

    Science.gov (United States)

    Dicke, Theresa; Marsh, Herbert W; Riley, Philip; Parker, Philip D; Guo, Jiesi; Horwood, Marcus

    2018-01-01

    School principals world-wide report high levels of strain and attrition, resulting in a shortage of qualified principals. It is thus crucial to identify psychosocial risk factors that reflect principals' occupational wellbeing. For this purpose, we used the Copenhagen Psychosocial Questionnaire (COPSOQ-II), a widely used self-report measure covering multiple psychosocial factors identified by leading occupational stress theories. We evaluated the COPSOQ-II regarding factor structure and longitudinal, discriminant, and convergent validity using latent structural equation modeling in a large sample of Australian school principals (N = 2,049). Results reveal that confirmatory factor analysis produced marginally acceptable model fit. A novel approach we call set exploratory structural equation modeling (set-ESEM), where cross-loadings were only allowed within a priori defined sets of factors, fit well and was more parsimonious than a full ESEM. Further multitrait-multimethod models based on the set-ESEM confirm the importance of a principal's psychosocial risk factors; stressors and depression were related to demands and ill-being, while confidence and autonomy were related to wellbeing. We also show that working in the private sector was beneficial for showing a low psychosocial risk, while other demographics had little effect. Finally, we identify five latent risk profiles (high risk to no risk) of school principals based on all psychosocial factors. Overall, the research presented here closes the theory-application gap by providing a strong multi-dimensional measure of psychosocial risk factors.

  5. Composite Classifiers for Automatic Target Recognition

    National Research Council Canada - National Science Library

    Wang, Lin-Cheng

    1998-01-01

    ...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...

  6. Detecting and classifying method based on similarity matching of Android malware behavior with profile.

    Science.gov (United States)

    Jang, Jae-Wook; Yun, Jaesung; Mohaisen, Aziz; Woo, Jiyoung; Kim, Huy Kang

    2016-01-01

    Mass-market mobile security threats have increased recently due to the growth of mobile technologies and the popularity of mobile devices. Accordingly, techniques have been introduced for identifying, classifying, and defending against mobile threats utilizing static, dynamic, on-device, and off-device techniques. Static techniques are easy to evade, while dynamic techniques are expensive. On-device techniques are constrained by device resources, while off-device techniques need to be always online. To address some of those shortcomings, we introduce Andro-profiler, a hybrid behavior-based analysis and classification system for mobile malware. Andro-profiler's main goals are efficiency, scalability, and accuracy. For that, Andro-profiler classifies malware by exploiting behavior profiles extracted from integrated system logs, including system calls. Andro-profiler executes a malicious application on an emulator in order to generate the integrated system logs, and creates human-readable behavior profiles by analyzing these logs. By comparing the behavior profile of a malicious application with the representative behavior profile of each malware family using a weighted similarity matching technique, Andro-profiler detects and classifies it into a malware family. The experimental results demonstrate that Andro-profiler is scalable, performs well in detecting and classifying malware with accuracy greater than 98 %, outperforms the existing state-of-the-art work, and is capable of identifying 0-day mobile malware samples.
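
    The weighted similarity matching step can be illustrated with a small sketch: behavior profiles are represented as weighted counts of behavior categories and compared to per-family representative profiles. Cosine similarity and the feature names, weights and threshold below are illustrative assumptions, not the exact metric or profiles used by Andro-profiler.

```python
import numpy as np

# Toy behavior profiles: counts of selected system-call / behavior categories.
FEATURES = ["send_sms", "read_contacts", "net_connect", "file_write", "exec"]
family_profiles = {
    "FamilyA": np.array([40, 25, 10, 5, 2], dtype=float),
    "FamilyB": np.array([2, 3, 50, 30, 12], dtype=float),
    "benign":  np.array([0, 1, 8, 20, 1], dtype=float),
}
# Feature weights emphasizing behaviors considered more discriminative.
weights = np.array([3.0, 2.0, 1.5, 1.0, 2.5])

def weighted_cosine(a, b, w):
    aw, bw = a * w, b * w
    return float(aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-12))

def classify(profile, threshold=0.6):
    scores = {fam: weighted_cosine(profile, rep, weights)
              for fam, rep in family_profiles.items()}
    best = max(scores, key=scores.get)
    # Below the threshold, treat the sample as unknown rather than forcing a label.
    return (best if scores[best] >= threshold else "unknown"), scores

sample = np.array([35, 20, 12, 6, 3], dtype=float)
print(classify(sample))
```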

  7. Aggregation Operator Based Fuzzy Pattern Classifier Design

    DEFF Research Database (Denmark)

    Mönks, Uwe; Larsen, Henrik Legind; Lohweg, Volker

    2009-01-01

    This paper presents a novel modular fuzzy pattern classifier design framework for intelligent automation systems, developed on the basis of the established Modified Fuzzy Pattern Classifier (MFPC), which allows designing novel classifier models that are hardware-efficiently implementable. The performances of novel classifiers using substitutes of MFPC's geometric mean aggregator are benchmarked in the scope of an image processing application against the MFPC to reveal classification improvement potentials for obtaining higher classification rates.

  8. 15 CFR 4.8 - Classified Information.

    Science.gov (United States)

    2010-01-01

    Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  9. Classifying Particles By Acoustic Levitation

    Science.gov (United States)

    Barmatz, Martin B.; Stoneburner, James D.

    1983-01-01

    Separation technique well suited to material processing. Apparatus with rectangular-cross-section chamber used to measure equilibrium positions of low-density spheres in gravitational field. Vertical acoustic forces generated by two opposing compression drivers exciting fundamental plane-wave mode at 1.2 kHz. Additional horizontal drivers centered samples along vertical axis. Applications in fusion-target separation, biological separation, and manufacturing processes in liquid or gas media.

  10. A novel statistical method for classifying habitat generalists and specialists

    DEFF Research Database (Denmark)

    Chazdon, Robin L; Chao, Anne; Colwell, Robert K

    2011-01-01

    We present a novel statistical method that classifies species into one of four habitat-use groups: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest

  11. The decision tree classifier - Design and potential. [for Landsat-1 data

    Science.gov (United States)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
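
    A toy sketch of multistage decision logic with a Gaussian maximum-likelihood rule at each node is shown below. The class structure, band values and two-stage tree are invented for illustration; this is not the original Landsat-1 analysis.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

def fit_gaussian(X):
    # Frozen Gaussian with a small ridge on the covariance for stability.
    return multivariate_normal(mean=X.mean(axis=0),
                               cov=np.cov(X.T) + 1e-6 * np.eye(X.shape[1]))

# Toy 4-band "pixels" for three hypothetical classes.
water   = rng.normal([10, 12, 8, 5],   1.0, size=(200, 4))
conifer = rng.normal([30, 28, 40, 35], 2.0, size=(200, 4))
meadow  = rng.normal([32, 30, 55, 50], 2.0, size=(200, 4))

# Stage 1: maximum-likelihood decision between water and pooled "vegetation".
stage1 = {"water": fit_gaussian(water),
          "vegetation": fit_gaussian(np.vstack([conifer, meadow]))}
# Stage 2: refine vegetation into conifer vs meadow.
stage2 = {"conifer": fit_gaussian(conifer), "meadow": fit_gaussian(meadow)}

def classify(pixel):
    top = max(stage1, key=lambda c: stage1[c].logpdf(pixel))
    if top == "water":
        return "water"
    return max(stage2, key=lambda c: stage2[c].logpdf(pixel))

print(classify(np.array([11, 12, 8, 6])))    # expected: water
print(classify(np.array([31, 29, 54, 49])))  # expected: meadow
```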

  12. Two new solid solutions in calcite-magnesite system identified in a sample from coral reefs in the northern Perth basin

    International Nuclear Information System (INIS)

    Li, D.Y.; O'Connor, B.H.; Zhu, Z.R.; Collins, L.B.; Hunter, B.

    1998-01-01

    Full text: Dolomite, CaMg(CO3)2, is an economically important mineral, being of particular significance in petroleum geology. Carbonate rocks have long been a focus of investigation because these rocks contain an estimated 60 percent of the world's recoverable petroleum, and include most of the world's largest reservoirs. Correct phase identification in carbonates has concerned sedimentologists and petroleum geologists for decades. A new type of solid solution in the calcite (CaCO3) - magnesite (MgCO3) system has been identified at Curtin University by Rietveld XRD and neutron diffraction data analysis in a sample from late Pleistocene reefs in the northern Perth Basin. It is known that the structure of calcite (space group R3c) will be transformed to that of dolomite (R3), which has an ordered distribution of Ca and Mg, if 50% of its Ca atoms are substituted by Mg in terms of the Ca-Mg atomic ratio. However, the upper limit of Mg substitution for Ca in calcite under sedimentary-geological conditions without a change in structure to dolomite is still unknown. Two carbonates examined at Curtin showed Mg substitution for Ca in calcite under coral reef sedimentary conditions of 18.1% and 37.7%, whereas Bragg peak shifts for a 'dolomite' line for these samples were interpreted by geologists as indicative of dolomite with a certain extent of order-disorder distribution between Ca and Mg atoms. The observations have provided an opportunity to re-examine the origins of dolomite and aspects of dolomitization in a coral reef environment in the Quaternary.

  13. Determination of 2-alkylcyclobutanones in ultraviolet light-irradiated fatty acids, triglycerides, corn oil, and pork samples: Identifying a new source of 2-alkylcyclobutanones.

    Science.gov (United States)

    Meng, Xiangpeng; Chan, Wan

    2017-02-15

    Previous studies have established that 2-alkylcyclobutanones (2-ACBs) are unique radiolytic products in lipid-containing foods that can only be formed through exposure to ionizing radiation, and not by other physical or heat treatment methods. Therefore, 2-ACBs are currently the marker molecules required by the European Committee for Standardization for identifying foods irradiated with ionizing radiation. Using a spectrum of state-of-the-art analytical instruments, we show in this study for the first time that the generation of 2-ACBs is also possible when fatty acids and triglycerides are exposed to a non-ionizing, short-wavelength ultraviolet (UV-C) light source. An irradiation dosage-dependent formation of 2-ACBs was also observed in UV-C irradiated fatty acids, triglycerides, corn oil, and pork samples. With UV-C irradiation becoming an increasingly common food treatment procedure, it is anticipated that the results from this study will alert food scientists and regulatory officials to a potential new source of 2-ACBs. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Error minimizing algorithms for nearest neighbor classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B. [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]; Zimmer, G. Beate [Texas A&M]

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost-sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.
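
    One idea from this abstract, tuning a nearest-neighbor-type decision rule for low false-alarm operation, can be sketched by thresholding the neighbor vote instead of taking a plain majority. The snippet below illustrates that cost-sensitive flavour on synthetic data; it is not the Ordered Hypothesis Machine construction itself, and for brevity the rates are computed on the training set rather than a held-out set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(7)
# Synthetic two-class data: label 1 = "target", label 0 = "clutter".
X = np.vstack([rng.normal(0.0, 1.0, (400, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.array([0] * 400 + [1] * 100)

knn = KNeighborsClassifier(n_neighbors=15).fit(X, y)
proba = knn.predict_proba(X)[:, 1]          # fraction of "target" neighbors

# Pick the smallest vote threshold whose empirical false-alarm rate stays
# below 1%; this trades missed detections for a low false-alarm rate.
for t in np.linspace(0.5, 1.0, 11):
    pred = (proba >= t).astype(int)
    far = np.mean(pred[y == 0] == 1)        # false-alarm rate on clutter
    pd_ = np.mean(pred[y == 1] == 1)        # detection rate on targets
    if far <= 0.01:
        print(f"threshold={t:.2f}  false-alarm={far:.3f}  detection={pd_:.3f}")
        break
```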

  15. Pictorial binding: endeavor to classify

    Directory of Open Access Journals (Sweden)

    Zinchenko S.

    2015-01-01

    Full Text Available The article is devoted to the classification of bindings of the 1st-19th centuries with a unique and atypical book binding decoration technique (encaustic, tempera and oil painting). Analysis of design features, materials and techniques of artistic decoration made it possible to identify them as a separate type - pictorial bindings - and to divide them into four groups. The first group consists of Coptic bindings decorated with icon-painting images in the encaustic technique. The second group is made up of leather Western bindings of the 13th-14th centuries, whose decoration and ornamentation technique are close to iconography. The third group involves parchment bindings whose ornamentation technique is closer to the miniature. The last group comprises bindings of East Slavic origin of the 15th-19th centuries, decorated with icon-painting pictures made in the technique of tempera or oil painting. The proposed classification requires further basic research, as several specific kinds of bindings have not yet been investigated.

  16. Identification and optimization of classifier genes from multi-class earthworm microarray dataset.

    Directory of Open Access Journals (Sweden)

    Ying Li

    Full Text Available Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 genes in common being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracies of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high-dimensional datasets and generate classification models of acceptable precision for multiple classes.
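
    A miniature version of the pipeline described above can be sketched as follows: a univariate class-comparison filter, a tree-based ranking of the shortlisted genes, and then a multiclass SVM plus K-means clustering on the refined subset. The expression matrix is simulated, ExtraTrees stands in for the paper's decision-tree-based ranking, and the subset sizes are only loosely matched to the abstract.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(248, 2000))                   # stand-in expression matrix
y = rng.integers(0, 3, size=248)                   # 0=control, 1=TNT, 2=RDX
X[y == 1, :30] += 1.0                              # inject some class signal
X[y == 2, 30:60] += 1.0

# Step 1: univariate class comparison (ANOVA F-test) to shortlist genes.
shortlist = SelectKBest(f_classif, k=300).fit(X, y)
X_short = shortlist.transform(X)

# Step 2: tree-based ranking to refine the classifier gene subset.
trees = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_short, y)
top = np.argsort(trees.feature_importances_)[::-1][:58]
X_top = X_short[:, top]

# Step 3: multiclass SVM and unsupervised K-means on the refined subset.
svm_acc = cross_val_score(SVC(kernel="linear"), X_top, y, cv=5).mean()
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_top)
print(f"MC-SVM cross-validated accuracy: {svm_acc:.2f}")
```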

  17. Verification of classified fissile material using unclassified attributes

    International Nuclear Information System (INIS)

    Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

    1998-01-01

    This paper reports on the most recent efforts of US technical experts to explore verification by IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. They believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated

  18. Nonparametric, Coupled Bayesian Dictionary and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  19. Population distribution of the sagittal abdominal diameter (SAD) from a representative sample of US adults: comparison of SAD, waist circumference and body mass index for identifying dysglycemia.

    Science.gov (United States)

    Kahn, Henry S; Gu, Qiuping; Bullard, Kai McKeever; Freedman, David S; Ahluwalia, Namanjeet; Ogden, Cynthia L

    2014-01-01

    The sagittal abdominal diameter (SAD) measured in the supine position is an alternative adiposity indicator that estimates the quantity of dysfunctional adipose tissue in the visceral depot. However, supine SAD's distribution and its association with health risk at the population level are unknown. Here we describe standardized measurements of SAD, provide the first national estimates of the SAD distribution among US adults, and test associations of SAD and other adiposity indicators with prevalent dysglycemia. In the 2011-2012 National Health and Nutrition Examination Survey, supine SAD was measured ("abdominal height") between the arms of a sliding-beam caliper at the level of the iliac crests. From 4817 non-pregnant adults (age ≥ 20; response rate 88%) we used sample weights to estimate SAD's population distribution by sex and age groups. SAD's population mean was 22.5 cm [95% confidence interval 22.2-22.8]; the median was 21.9 cm [21.6-22.4]. The mean and median values of SAD were greater for men than women. For the subpopulation without diagnosed diabetes, we compared the abilities of SAD, waist circumference (WC), and body mass index (BMI, kg/m(2)) to identify prevalent dysglycemia (HbA1c ≥ 5.7%). In age-adjusted, logistic-regression models in which sex-specific quartiles of SAD were considered simultaneously with quartiles of either WC or BMI, only SAD quartiles 3 and 4 remained significantly associated with dysglycemia. The area under the receiver operating characteristic curve for SAD (age-adjusted) was 0.734 for men (greater than the AUC for WC), and SAD was associated with dysglycemia independently of WC or BMI. Standardized SAD measurements may enhance assessment of dysfunctional adiposity.
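
    The comparison of adiposity indicators can be sketched roughly as below: an indicator is quartile-coded, entered into an age-adjusted logistic model for dysglycemia, and summarized by the area under the ROC curve. The data are simulated placeholders (not NHANES), and the survey weights, sex-specific quartiles and design effects of the actual analysis are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "sad": rng.normal(22.5, 4.0, n),        # sagittal abdominal diameter, cm
    "wc":  rng.normal(98.0, 14.0, n),       # waist circumference, cm
})
# Simulated outcome loosely tied to age and SAD.
logit = -6 + 0.03 * df["age"] + 0.15 * df["sad"]
df["dysglycemia"] = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

def age_adjusted_auc(indicator):
    # Quartile-code the indicator, then fit an age-adjusted logistic model.
    q = pd.qcut(df[indicator], 4, labels=False)
    X = pd.concat([pd.get_dummies(q, prefix=indicator, drop_first=True),
                   df[["age"]]], axis=1).to_numpy(dtype=float)
    model = LogisticRegression(max_iter=1000).fit(X, df["dysglycemia"])
    return roc_auc_score(df["dysglycemia"], model.predict_proba(X)[:, 1])

for ind in ("sad", "wc"):
    print(ind, round(age_adjusted_auc(ind), 3))
```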

  20. Cascaded lexicalised classifiers for second-person reference resolution

    NARCIS (Netherlands)

    Purver, M.; Fernández, R.; Frampton, M.; Peters, S.; Healey, P.; Pieraccini, R.; Byron, D.; Young, S.; Purver, M.

    2009-01-01

    This paper examines the resolution of the second person English pronoun you in multi-party dialogue. Following previous work, we attempt to classify instances as generic or referential, and in the latter case identify the singular or plural addressee. We show that accuracy and robustness can be

  1. Classifying patients' complaints for regulatory purposes : A Pilot Study

    NARCIS (Netherlands)

    Bouwman, R.J.R.; Bomhoff, Manja; Robben, Paul; Friele, R.D.

    2018-01-01

    Objectives: It is assumed that classifying and aggregated reporting of patients' complaints by regulators helps to identify problem areas, to respond better to patients and increase public accountability. This pilot study addresses what a classification of complaints in a regulatory setting

  2. Hierarchical mixtures of naive Bayes classifiers

    NARCIS (Netherlands)

    Wiering, M.A.

    2002-01-01

    Naive Bayes classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayes classifiers by using the hierarchical

  3. Comparing classifiers for pronunciation error detection

    NARCIS (Netherlands)

    Strik, H.; Truong, K.; Wet, F. de; Cucchiarini, C.

    2007-01-01

    Providing feedback on pronunciation errors in computer assisted language learning systems requires that pronunciation errors be detected automatically. In the present study we compare four types of classifiers that can be used for this purpose: two acoustic-phonetic classifiers (one of which employs

  4. Feature extraction for dynamic integration of classifiers

    NARCIS (Netherlands)

    Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Patterson, D.W.

    2007-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique

  5. Implications of physical symmetries in adaptive image classifiers

    DEFF Research Database (Denmark)

    Sams, Thomas; Hansen, Jonas Lundbek

    2000-01-01

    It is demonstrated that rotational invariance and reflection symmetry of image classifiers lead to a reduction in the number of free parameters in the classifier. When used in adaptive detectors, e.g. neural networks, this may be used to decrease the number of training samples necessary to learn a given classification task, or to improve generalization of the neural network. Notably, the symmetrization of the detector does not compromise the ability to distinguish objects that break the symmetry. (C) 2000 Elsevier Science Ltd. All rights reserved.

  6. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
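
    A small numerical illustration of the core idea is given below: signals blurred by a known convolution kernel and contaminated by noise are passed through a regularized inverse filter before a standard classifier is applied. The kernel, the Tikhonov-style regularization and the synthetic signals are all invented for this demonstration; the paper's own carefully constructed inverse transformation is deliberately not an exact signal-recovery inverse, and this sketch does not reproduce it.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, length = 400, 128
t = np.linspace(0, 1, length)

# Two classes of clean signals, circularly blurred with a known Gaussian kernel
# and contaminated with additive noise.
y = rng.integers(0, 2, n)
clean = np.array([np.sin(2 * np.pi * (3 + cls) * t) for cls in y])
kernel = np.exp(-0.5 * ((np.arange(length) - length // 2) / 4.0) ** 2)
kernel /= kernel.sum()
H = np.fft.rfft(kernel)
blurred = np.fft.irfft(np.fft.rfft(clean, axis=1) * H, n=length, axis=1)
noisy = blurred + 0.3 * rng.normal(size=blurred.shape)

def deconvolve(X, reg=0.05):
    # Tikhonov-regularized inverse filter (not exact signal recovery).
    Xf = np.fft.rfft(X, axis=1)
    return np.fft.irfft(Xf * np.conj(H) / (np.abs(H) ** 2 + reg), n=length, axis=1)

lda = LinearDiscriminantAnalysis()
print("raw    :", cross_val_score(lda, noisy, y, cv=5).mean().round(3))
print("deconv :", cross_val_score(lda, deconvolve(noisy), y, cv=5).mean().round(3))
```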

  7. Deconvolution When Classifying Noisy Data Involving Transformations.

    Science.gov (United States)

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  8. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-01-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  9. Classifying Adverse Events in the Dental Office.

    Science.gov (United States)

    Kalenderian, Elsbeth; Obadan-Udoh, Enihomo; Maramaldi, Peter; Etolue, Jini; Yansane, Alfa; Stewart, Denice; White, Joel; Vaderhobli, Ram; Kent, Karla; Hebballi, Nutan B; Delattre, Veronique; Kahn, Maria; Tokede, Oluwabunmi; Ramoni, Rachel B; Walji, Muhammad F

    2017-06-30

    Dentists strive to provide safe and effective oral healthcare. However, some patients may encounter an adverse event (AE) defined as "unnecessary harm due to dental treatment." In this research, we propose and evaluate two systems for categorizing the type and severity of AEs encountered at the dental office. Several existing medical AE type and severity classification systems were reviewed and adapted for dentistry. Using data collected in previous work, two initial dental AE type and severity classification systems were developed. Eight independent reviewers performed focused chart reviews, and AEs identified were used to evaluate and modify these newly developed classifications. A total of 958 charts were independently reviewed. Among the reviewed charts, 118 prospective AEs were found and 101 (85.6%) were verified as AEs through a consensus process. At the end of the study, a final AE type classification comprising 12 categories, and an AE severity classification comprising 7 categories emerged. Pain and infection were the most common AE types representing 73% of the cases reviewed (56% and 17%, respectively) and 88% were found to cause temporary, moderate to severe harm to the patient. Adverse events found during the chart review process were successfully classified using the novel dental AE type and severity classifications. Understanding the type of AEs and their severity are important steps if we are to learn from and prevent patient harm in the dental office.

  10. Is it important to classify ischaemic stroke?

    LENUS (Irish Health Repository)

    Iqbal, M

    2012-02-01

    Thirty-five percent of all ischemic events remain classified as cryptogenic. This study was conducted to ascertain the accuracy of diagnosis of ischaemic stroke based on information given in the medical notes. It was tested by applying the clinical information to the TOAST criteria. One hundred and five patients presented with acute stroke between January and June 2007. Data were collected on 90 patients. The male to female ratio was 39:51, with an age range of 47-93 years. Sixty (67%) patients had total/partial anterior circulation stroke; 5 (5.6%) had a lacunar stroke and in 25 (28%) the mechanism of stroke could not be identified. Four (4.4%) patients with small vessel disease were anticoagulated; 5 (5.6%) with atrial fibrillation received antiplatelet therapy and 2 (2.2%) patients with atrial fibrillation underwent CEA. This study revealed deficiencies in the clinical assessment of patients, and treatment was not tailored to the mechanism of stroke in some patients.

  11. RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Ashish Saini

    2014-01-01

    Full Text Available Background. Breast cancer is the most common type of cancer among females with a high mortality rate. It is essential to classify the estrogen receptor based breast cancer subtypes into correct subclasses, so that the right treatments can be applied to lower the mortality rate. Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, the interactions in the gene interaction network usually contain many false-positive interactions that do not have any biological meaning. Therefore, it is a challenge to incorporate the reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches.

  12. Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

    Directory of Open Access Journals (Sweden)

    Daniel Frampton

    Full Text Available Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide, causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons with limited success. Although the metric was able to successfully classify roughly two-thirds of both CFS and healthy samples correctly, the level of misclassification was high. We conclude many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.

  13. Logarithmic learning for generalized classifier neural network.

    Science.gov (United States)

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

    The generalized classifier neural network is introduced as an efficient classifier among the others. Unless the initial smoothing parameter value is close to the optimal one, the generalized classifier neural network suffers from a convergence problem and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses a logarithmic cost function instead of the squared error. Minimization of this cost function reduces the number of iterations needed to reach the minimum. The proposed method is tested on 15 different data sets and the performance of the logarithmic learning generalized classifier neural network is compared with that of the standard one. Thanks to the operation range of the radial basis function included in the generalized classifier neural network, the proposed logarithmic approach and its derivative have continuous values. This makes it possible to exploit the advantage of fast logarithmic convergence in the proposed learning method. Due to the fast convergence ability of the logarithmic cost function, training time is decreased by up to 99.2%. In addition to the decrease in training time, classification performance may also be improved by up to 60%. According to the test results, while the proposed method provides a solution to the time requirement problem of the generalized classifier neural network, it may also improve classification accuracy. The proposed method can be considered an efficient way of reducing the time requirement problem of the generalized classifier neural network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Current Directional Protection of Series Compensated Line Using Intelligent Classifier

    Directory of Open Access Journals (Sweden)

    M. Mollanezhad Heydarabadi

    2016-12-01

    Full Text Available The current inversion condition leads to incorrect operation of current-based directional relays in power systems with series compensation devices. The application of an intelligent system for fault direction classification is suggested in this paper. A new current directional protection scheme based on an intelligent classifier is proposed for the series compensated line. The proposed scheme uses only a half cycle of pre-fault and post-fault current samples at the relay location to feed the classifier. A large number of forward and backward fault simulations under different system conditions, on a transmission line with a fixed series capacitor, were carried out using PSCAD/EMTDC software. The applicability of decision tree (DT), probabilistic neural network (PNN) and support vector machine (SVM) classifiers is investigated using the simulated data under different system conditions. The performance comparison of the classifiers indicates that the SVM is the most suitable classifier for fault direction discrimination. Backward faults can be accurately distinguished from forward faults even under current inversion, without requiring detection of the current inversion condition.
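
    The half-cycle sampling and direction classification can be sketched with simulated waveforms: each case concatenates a half cycle of pre-fault current with a half cycle of post-fault current and is labelled forward or backward. The sinusoidal surrogates below replace the PSCAD/EMTDC simulations, and the magnitude and phase assumptions are purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(11)
fs, f0 = 1000, 50                      # sampling rate (Hz), power frequency (Hz)
half_cycle = fs // (2 * f0)            # samples per half cycle

def make_case(forward):
    t = np.arange(2 * half_cycle) / fs
    pre = np.sin(2 * np.pi * f0 * t[:half_cycle])
    # Crude surrogate for post-fault behaviour: forward faults raise the
    # magnitude, backward faults (with current inversion) also flip the phase.
    post = (3.0 if forward else -2.5) * np.sin(2 * np.pi * f0 * t[half_cycle:])
    return np.concatenate([pre, post]) + 0.05 * rng.normal(size=2 * half_cycle)

X = np.array([make_case(forward) for forward in (True, False) for _ in range(200)])
y = np.array([1] * 200 + [0] * 200)    # 1 = forward, 0 = backward

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```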

  15. A CLASSIFIER SYSTEM USING SMOOTH GRAPH COLORING

    Directory of Open Access Journals (Sweden)

    JORGE FLORES CRUZ

    2017-01-01

    Full Text Available Unsupervised classifiers allow clustering methods with little or no human intervention. Therefore it is desirable to group the set of items with as little data processing as possible. This paper proposes an unsupervised classifier system using the model of soft graph coloring. This method was tested on some classic instances in the literature and the results obtained were compared with classifications made with human intervention, yielding results as good as or better than those of supervised classifiers, sometimes providing alternative classifications that consider additional information that humans did not consider.

  16. A systems biology-based classifier for hepatocellular carcinoma diagnosis.

    Directory of Open Access Journals (Sweden)

    Yanqiong Zhang

    Full Text Available AIM: The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments, which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88-92.71%) and area under the ROC curve (approximating 1.0), and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers. CONCLUSION: Our analysis suggests that the systems biology-based classifier that combines differential gene expression and topological features of the human protein interaction network may enhance the diagnostic performance of HCC classifiers.
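
    The Partial Least Squares modeling step can be sketched as a two-class PLS-DA on hub-gene expression: fit a PLS regression against the binary label and threshold the continuous score. Scikit-learn's PLSRegression is used here as a stand-in, the expression values are simulated, and the 0.5 threshold is an illustrative choice.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(8)
n_hub = 30                                    # stand-in for the hub-gene panel
X = rng.normal(size=(120, n_hub))
y = rng.integers(0, 2, size=120)              # 1 = HCC, 0 = normal tissue
X[y == 1, :10] += 0.8                         # hub genes up-regulated in HCC (toy)

aucs, accs = [], []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train, test in cv.split(X, y):
    pls = PLSRegression(n_components=3).fit(X[train], y[train])
    score = pls.predict(X[test]).ravel()      # continuous PLS-DA score
    pred = (score >= 0.5).astype(int)
    accs.append(accuracy_score(y[test], pred))
    aucs.append(roc_auc_score(y[test], score))
print(f"accuracy={np.mean(accs):.2f}  AUC={np.mean(aucs):.2f}")
```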

  17. Arabic Handwriting Recognition Using Neural Network Classifier

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... an OCR using Neural Network classifier preceded by a set of preprocessing .... Artificial Neural Networks (ANNs), which we adopt in this research, consist of ... advantages and disadvantages of each technique. In [9], Khemiri ...

  18. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha

    2013-11-25

    Based on dynamic programming approach we design algorithms for sequential optimization of exact and approximate decision rules relative to the length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification-exact or approximate; and (ii) which order of optimization gives better results of classifier work: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than the ordinary optimization (length or coverage).

  19. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha M.; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata

    2013-01-01

    Based on dynamic programming approach we design algorithms for sequential optimization of exact and approximate decision rules relative to the length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification-exact or approximate; and (ii) which order of optimization gives better results of classifier work: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than the ordinary optimization (length or coverage).

  20. Combining multiple classifiers for age classification

    CSIR Research Space (South Africa)

    Van Heerden, C

    2009-11-01

    Full Text Available The authors compare several different classifier combination methods on a single task, namely speaker age classification. This task is well suited to combination strategies, since significantly different feature classes are employed. Support vector...

  1. Neural Network Classifiers for Local Wind Prediction.

    Science.gov (United States)

    Kretzschmar, Ralf; Eckert, Pierre; Cattani, Daniel; Eggimann, Fritz

    2004-05-01

    This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved day time, yearday, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. The neural network classifiers exhibited superior quality when compared with persistence judged on a specific performance measure, hit and false-alarm rates.
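
    A toy version of the benchmarking idea, a neural network classifier against a persistence baseline for a "gust in the next hour" label, is sketched below. The features (hour, yearday, current wind speed) echo those named in the abstract, but the data are randomly generated rather than taken from the Swiss stations.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(4)
n = 5000
hour = rng.integers(0, 24, n)
yearday = rng.integers(1, 366, n)
wind_now = np.clip(rng.gamma(2.0, 3.0, n), 0, None)          # current wind speed (m/s)
gust_now = wind_now > 8
# Toy target: gusty next hour, loosely tied to current wind and time of day.
gust_next = (wind_now + 2 * np.sin(2 * np.pi * hour / 24) + rng.normal(0, 2, n)) > 8

X = np.column_stack([hour, yearday, wind_now])
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, gust_next, gust_now, test_size=0.3, random_state=0)

mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
mlp.fit(X_tr, y_tr)

print("persistence accuracy:", round(accuracy_score(y_te, g_te), 3))  # "same as now"
print("MLP accuracy        :", round(accuracy_score(y_te, mlp.predict(X_te)), 3))
```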

  2. A convolutional neural network neutrino event classifier

    International Nuclear Information System (INIS)

    Aurisano, A.; Sousa, A.; Radovic, A.; Vahle, P.; Rocco, D.; Pawloski, G.; Himmel, A.; Niner, E.; Messier, M.D.; Psihas, F.

    2016-01-01

    Convolutional neural networks (CNNs) have been widely applied in the computer vision community to solve complex problems in image recognition and analysis. We describe an application of the CNN technology to the problem of identifying particle interactions in sampling calorimeters used commonly in high energy physics and high energy neutrino physics in particular. Following a discussion of the core concepts of CNNs and recent innovations in CNN architectures related to the field of deep learning, we outline a specific application to the NOvA neutrino detector. This algorithm, CVN (Convolutional Visual Network) identifies neutrino interactions based on their topology without the need for detailed reconstruction and outperforms algorithms currently in use by the NOvA collaboration.
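
    A minimal PyTorch sketch of a convolutional classifier over a single 2-D detector view is given below for orientation only; the layer sizes, input resolution and four-way class labels are invented and bear no relation to the actual CVN architecture.

```python
import torch
import torch.nn as nn

class TinyEventCNN(nn.Module):
    """Minimal CNN over a single 2-D detector view (1 x 64 x 64 hit map)."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One training step on random stand-in data (real inputs would be hit maps).
model = TinyEventCNN()
x = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 4, (8,))
loss = nn.CrossEntropyLoss()(model(x), labels)
loss.backward()
print(float(loss))
```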

  3. A radial basis classifier for the automatic detection of aspiration in children with dysphagia

    Directory of Open Access Journals (Sweden)

    Blain Stefanie

    2006-07-01

    Full Text Available Background: Silent aspiration or the inhalation of foodstuffs without overt physiological signs presents a serious health issue for children with dysphagia. To date, there are no reliable means of detecting aspiration in the home or community. An assistive technology that performs in these environments could inform caregivers of adverse events and potentially reduce the morbidity and anxiety of the feeding experience for the child and caregiver, respectively. This paper proposes a classifier for automatic classification of aspiration and swallow vibration signals non-invasively recorded on the neck of children with dysphagia. Methods: Vibration signals associated with safe swallows and aspirations, both identified via videofluoroscopy, were collected from over 100 children with neurologically-based dysphagia using a single-axis accelerometer. Five potentially discriminatory mathematical features were extracted from the accelerometry signals. All possible combinations of the five features were investigated in the design of radial basis function classifiers. Performance of different classifiers was compared and the best feature sets were identified. Results: Optimal feature combinations for two, three and four features resulted in statistically comparable adjusted accuracies with a radial basis classifier. In particular, the feature pairing of dispersion ratio and normality achieved an adjusted accuracy of 79.8 ± 7.3%, a sensitivity of 79.4 ± 11.7% and specificity of 80.3 ± 12.8% for aspiration detection. Addition of a third feature, namely energy, increased adjusted accuracy to 81.3 ± 8.5% but the change was not statistically significant. A closer look at the normality and dispersion ratio features suggests leptokurticity and the frequency and magnitude of atypical values as distinguishing characteristics between swallows and aspirations. The achieved accuracies are 30% higher than those reported for bedside cervical auscultation. Conclusion

  4. MAMMOGRAMS ANALYSIS USING SVM CLASSIFIER IN COMBINED TRANSFORMS DOMAIN

    Directory of Open Access Journals (Sweden)

    B.N. Prathibha

    2011-02-01

    Full Text Available Breast cancer is a primary cause of mortality and morbidity in women. Reports reveal that the earlier abnormalities are detected, the better the chance of survival. Digital mammograms are one of the most effective means for detecting possible breast anomalies at early stages. Digital mammograms supported with Computer Aided Diagnostic (CAD) systems help radiologists in taking reliable decisions. The proposed CAD system extracts wavelet features and spectral features for better classification of mammograms. A Support Vector Machine classifier is used to analyze 206 mammogram images from the MIAS database, graded by severity of abnormality, i.e., benign and malignant. The proposed system gives 93.14% accuracy for discrimination between normal and malignant samples, 87.25% accuracy for normal and benign samples, and 89.22% accuracy for benign and malignant samples. The study reveals that features extracted in a hybrid transform domain with an SVM classifier prove to be a promising tool for the analysis of mammograms.

  5. Alternative polymerase chain reaction method to identify Plasmodium species in human blood samples: the semi-nested multiplex malaria PCR (SnM-PCR)

    NARCIS (Netherlands)

    Rubio, J.M.; Post, R.J.; Docters van Leeuwen, W.M.; Henry, M.C.; Lindergard, G.; Hommel, M.

    2002-01-01

    A simplified protocol for the identification of Plasmodium species by semi-nested multiplex polymerase chain reaction (SnM-PCR) in human blood samples is compared with microscopical examination of thin and thick blood films in 2 field trials in Côte d'Ivoire and Cameroon. Also, dried blood spots or

  6. A Consistent System for Coding Laboratory Samples

    Science.gov (United States)

    Sih, John C.

    1996-07-01

    A formal laboratory coding system is presented to keep track of laboratory samples. Preliminary useful information regarding the sample (origin and history) is gained without consulting a research notebook. Since this system uses and retains the same research notebook page number for each new experiment (reaction), finding and distinguishing products (samples) of the same or different reactions becomes an easy task. Using this system multiple products generated from a single reaction can be identified and classified in a uniform fashion. Samples can be stored and filed according to stage and degree of purification, e.g. crude reaction mixtures, recrystallized samples, chromatographed or distilled products.
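
    The record describes a coding scheme rather than software, but a short script makes the idea concrete. The sketch below is purely illustrative: the page-letter-stage format and the stage abbreviations are hypothetical, not the author's actual convention.

        # Hypothetical sample code: <notebook page>-<product letter>-<purification stage>
        STAGES = {"crude": "C", "recrystallized": "R", "chromatographed": "Ch", "distilled": "D"}

        def sample_code(notebook_page: int, product_index: int, stage: str) -> str:
            product_letter = chr(ord("A") + product_index)   # product A, B, C, ... of one reaction
            return f"{notebook_page}-{product_letter}-{STAGES[stage]}"

        print(sample_code(127, 0, "crude"))            # 127-A-C
        print(sample_code(127, 1, "chromatographed"))  # 127-B-Ch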

  7. A Minimally Invasive Method for Sampling Nest and Roost Cavities for Fungi: a Novel Approach to Identify the Fungi Associated with Cavity-Nesting Birds

    Science.gov (United States)

    Michelle A. Jusino; Daniel Lindner; John K. Cianchetti; Adam T. Grisé; Nicholas J. Brazee; Jeffrey R. Walters

    2014-01-01

    Relationships among cavity-nesting birds, trees, and wood decay fungi pose interesting management challenges and research questions in many systems. Ornithologists need to understand the relationships between cavity-nesting birds and fungi in order to understand the habitat requirements of these birds. Typically, researchers rely on fruiting body surveys to identify...

  8. Application of a semiautomatic classifier for modic and disk hernia changes in magnetic resonance

    Directory of Open Access Journals (Sweden)

    Eduardo López Arce Vivas

    2015-03-01

    Full Text Available OBJECTIVE: Early detection of degenerative changes in the lumbar intervertebral disc by magnetic resonance imaging using a semiautomatic classifier, for prevention of degenerative disease. METHOD: MRIs with a diagnosis of degenerative disc disease or back pain were selected from January to May 2014, giving a sample of 23 patients and a total of 170 disks evaluated on sagittal T2 MRI images. Each disk was first evaluated by a specialist physician in training and then introduced into the software, and the results were compared. RESULTS: One hundred and fifteen discs were evaluated by the programmed semiautomatic classifier to identify Modic changes and hernia, which produced the results "normal or Modic" and "normal or abnormal", respectively. Of a total of 230 readings, 141 were correct, 84 were reading errors and 10 readings were undiagnosed. The semiautomatic classifier is a useful tool for early diagnosis or established disease and is easy to apply because of its speed and ease of use; however, at this early stage of development, the software is inferior to clinical observations, with certainty of around 65% to 60% for the Modic rating and 61% to 58% for disc herniation, compared with clinical evaluations. CONCLUSION: The comparative results between the two doctors were 94 consistent results and only 21 errors, which represents 81% certainty.

  9. Classified study and clinical value of the phase imaging features

    International Nuclear Information System (INIS)

    Dang Yaping; Ma Aiqun; Zheng Xiaopu; Yang Aimin; Xiao Jiang; Gao Xinyao

    2000-01-01

    445 patients with various heart diseases were examined by gated cardiac blood pool imaging, and the phase images were classified. The relationship of the seven types to left ventricular function indices, clinical heart function, different heart diseases, and the electrocardiograph was studied. The results showed that the phase image classification matched the clinical heart function. It can visually, directly and accurately indicate clinical heart function and can be used to aid the diagnosis of heart disease.

  10. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available One of the widely used methods for classification, which is a decision-making process, is artificial immune systems. Artificial immune systems, based on the natural immune system, can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. The proposed approach offers several advantages over other methods in the literature, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, the negative selection classifier, and the resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  11. Project Engage: Snowball Sampling and Direct Recruitment to Identify and Link Hard-to-Reach HIV-Infected Persons Who Are Out of Care.

    Science.gov (United States)

    Wohl, Amy Rock; Ludwig-Barron, Natasha; Dierst-Davies, Rhodri; Kulkarni, Sonali; Bendetson, Jesse; Jordan, Wilbert; Bolan, Robert; Smith, Terry; Cunningham, William; Pérez, Mario J

    2017-06-01

    Innovative strategies are needed to identify and link hard-to-find persons living with HIV (PLWH) who are out of care (OOC). Project Engage, a health department-based project in Los Angeles County, used a mixed-methods approach to locate and provide linkage for PLWH who have limited contact with HIV medical and nonmedical services. Incentivized social network recruitment (SNR) and direct recruitment (DR) were used to identify eligible OOC alters for a linkage intervention that included HIV clinic selection, appointment and transportation support, reminder calls/texts, and clinic navigation. Between 2012 and 2015, 112 alters were identified using SNR (n = 74) and DR (n = 38). Most alters were male (80%), African American (38%), and gay (60%). Sizable percentages were homeless (78%), had engaged in sex work (32%) in the previous 6 months, had injected drugs (47%), were incarcerated in the previous 12 months (50%), and had only received HIV care during the previous 5 years while incarcerated (24%). SNR alters were more likely than DR alters to be African American, uninsured, unemployed, homeless, sex workers, injection drug users, recently incarcerated, and have unmet service needs. Alters linked to care within 3 (69%), 4-6 (5%), and 7-12 months (8%), and 72% were retained at 6-12 months. The percent virally suppressed increased (27% vs. 41%) and the median viral load decreased (P = 0.003) between linkage and follow-up at 6-12 months. The alternative approaches presented were effective at locating marginalized HIV-positive persons who are OOC for linkage and retention. The SNR approach was most successful at identifying alters with serious social challenges and gaps in needed medical/ancillary services.

  12. Classifier Fusion With Contextual Reliability Evaluation.

    Science.gov (United States)

    Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

    2018-05-01

    Classifier fusion is an efficient strategy to improve classification performance for complex pattern recognition problems. In practice, the multiple classifiers to combine can have different reliabilities, and proper reliability evaluation plays an important role in the fusion process for getting the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the k-nearest neighbors of the object. A cautious discounting rule is developed under the belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure which allows the level of conflict between the classifiers to be reduced by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of the classification reliability. The discounted classification results are combined with Dempster-Shafer's rule for the final class decision-making support. The performance of CF-CRE has been evaluated and compared with that of the main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to changes in the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for applications.
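
    The final combination step uses Dempster-Shafer's rule. As a generic illustration (not the CF-CRE discounting machinery itself), the following sketch combines two mass functions defined over the same frame of discernment:

        from itertools import product

        def dempster_combine(m1, m2):
            """Combine two mass functions given as {frozenset: mass} dicts."""
            combined, conflict = {}, 0.0
            for (a, wa), (b, wb) in product(m1.items(), m2.items()):
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + wa * wb
                else:
                    conflict += wa * wb
            # Normalize by the non-conflicting mass.
            return {s: w / (1.0 - conflict) for s, w in combined.items()}

        m1 = {frozenset({"A"}): 0.7, frozenset({"A", "B"}): 0.3}
        m2 = {frozenset({"B"}): 0.4, frozenset({"A", "B"}): 0.6}
        print(dempster_combine(m1, m2))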

  13. Classifying sows' activity types from acceleration patterns

    DEFF Research Database (Denmark)

    Cornou, Cecile; Lundbye-Christensen, Søren

    2008-01-01

    An automated method of classifying sow activity using acceleration measurements would allow the individual sow's behavior to be monitored throughout the reproductive cycle; applications for detecting behaviors characteristic of estrus and farrowing or to monitor illness and welfare can be foreseen....... This article suggests a method of classifying five types of activity exhibited by group-housed sows. The method involves the measurement of acceleration in three dimensions. The five activities are: feeding, walking, rooting, lying laterally and lying sternally. Four time series of acceleration (the three...

  14. Data characteristics that determine classifier performance

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2006-11-01

    Full Text Available available at [11]. The kNN uses a LinearNN nearest neighbour search algorithm with an Euclidean distance metric [8]. The optimal k value is determined by performing 10-fold cross-validation. An optimal k value between 1 and 10 is used for Experiments 1... classifiers. 10-fold cross-validation is used to evaluate and compare the performance of the classifiers on the different data sets. 3.1. Artificial data generation Multivariate Gaussian distributions are used to generate artificial data sets. We use d...
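
    A sketch of the k-selection step described in this excerpt, assuming scikit-learn (the original work used its own implementation): Euclidean k-NN with 10-fold cross-validation over k = 1..10 on synthetic data.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.neighbors import KNeighborsClassifier

        # Stand-in for the artificial Gaussian data sets described in the record.
        X, y = make_classification(n_samples=500, n_features=10, random_state=0)
        search = GridSearchCV(
            KNeighborsClassifier(metric="euclidean"),
            param_grid={"n_neighbors": list(range(1, 11))},   # optimal k between 1 and 10
            cv=10,                                            # 10-fold cross-validation
        )
        search.fit(X, y)
        print(search.best_params_, round(search.best_score_, 3))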

  15. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts specific to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the standard with which to choose the texts, based on demand and abundance of materials. The performance of the classifier varies with the user's choice.

  16. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods for DTC design and the various existing issues is presented. After considering the potential advantages of DTCs over single-stage classifiers, the subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
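
    For readers who want to see the "collection of simpler decisions" idea concretely, here is a minimal scikit-learn sketch (not taken from the survey itself) that trains a small tree and prints its threshold tests:

        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier, export_text

        X, y = load_iris(return_X_y=True)
        tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
        print(export_text(tree))   # each internal node is one simple feature test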

  17. A Gene Expression Classifier of Node-Positive Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Paul F. Meeh

    2009-10-01

    Full Text Available We used digital long serial analysis of gene expression to discover gene expression differences between node-negative and node-positive colorectal tumors and developed a multigene classifier able to discriminate between these two tumor types. We prepared and sequenced long serial analysis of gene expression libraries from one node-negative and one node-positive colorectal tumor, sequenced to a depth of 26,060 unique tags, and identified 262 tags significantly differentially expressed between these two tumors (P < 2 × 10⁻⁶). We confirmed the tag-to-gene assignments and differential expression of 31 genes by quantitative real-time polymerase chain reaction, 12 of which were elevated in the node-positive tumor. We analyzed the expression levels of these 12 upregulated genes in a validation panel of 23 additional tumors and developed an optimized seven-gene logistic regression classifier. The classifier discriminated between node-negative and node-positive tumors with 86% sensitivity and 80% specificity. Receiver operating characteristic analysis of the classifier revealed an area under the curve of 0.86. Experimental manipulation of the function of one classification gene, Fibronectin, caused profound effects on invasion and migration of colorectal cancer cells in vitro. These results suggest that the development of node-positive colorectal cancer occurs in part through elevated epithelial FN1 expression and suggest novel strategies for the diagnosis and treatment of advanced disease.
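
    A hedged sketch of the final modeling step (a logistic regression classifier on a small gene signature, evaluated by ROC AUC), using scikit-learn and simulated expression values in place of the study's data:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(0)
        n_tumors, n_genes = 24, 7                    # e.g. a seven-gene signature
        X = rng.normal(size=(n_tumors, n_genes))     # simulated log-expression values
        y = rng.permutation([0] * 12 + [1] * 12)     # 0 = node-negative, 1 = node-positive

        clf = LogisticRegression(max_iter=1000)
        scores = cross_val_predict(clf, X, y, cv=4, method="predict_proba")[:, 1]
        print("AUC:", roc_auc_score(y, scores))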

  18. Pixel Classification of SAR ice images using ANFIS-PSO Classifier

    Directory of Open Access Journals (Sweden)

    G. Vasumathi

    2016-12-01

    Full Text Available Synthetic Aperture Radar (SAR) plays a vital role in acquiring extremely high resolution radar images. It is widely used to monitor ice-covered ocean regions. Sea monitoring is important for various purposes, including global climate systems and ship navigation. Classification of the ice-infested area yields important features which are further useful for various monitoring processes around the ice regions. The main objective of this paper is to classify SAR ice images to help identify the regions around ice-infested areas. In this paper three stages are considered in the classification of SAR ice images. It starts with preprocessing, in which the speckled SAR ice images are denoised using various speckle-removal filters; a comparison is made of all these filters to find the best filter for speckle removal. The second stage includes segmentation, in which different regions are segmented using K-means and watershed segmentation algorithms; a comparison is made between these two algorithms to find the better one for segmenting SAR ice images. The last stage includes pixel-based classification, which identifies and classifies the segmented regions using various supervised learning classifiers. The algorithms include back-propagation neural networks (BPN), a fuzzy classifier, an Adaptive Neuro-Fuzzy Inference System (ANFIS) classifier and the proposed ANFIS with Particle Swarm Optimization (PSO) classifier; a comparison is made of all these classifiers to propose which classifier is best suited for classifying SAR ice images. Various evaluation metrics are computed separately at all three stages.

  19. Classifying Radio Galaxies with the Convolutional Neural Network

    International Nuclear Information System (INIS)

    Aniyan, A. K.; Thorat, K.

    2017-01-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  20. Classifying Radio Galaxies with the Convolutional Neural Network

    Energy Technology Data Exchange (ETDEWEB)

    Aniyan, A. K.; Thorat, K. [Department of Physics and Electronics, Rhodes University, Grahamstown (South Africa)

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  1. Classifying Radio Galaxies with the Convolutional Neural Network

    Science.gov (United States)

    Aniyan, A. K.; Thorat, K.

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ˜200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
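
    A rough sketch of two ingredients highlighted in these records, rotation-based augmentation of a small training sample and fusion of binary classifier outputs, using NumPy and SciPy; the CNN itself is omitted and the 0.5 confidence threshold below is only an assumption used to flag unusual morphologies.

        import numpy as np
        from scipy.ndimage import rotate

        def augment_with_rotations(images, labels, angles=(90, 180, 270)):
            """Grow a small training set by adding rotated copies of each image."""
            out_x, out_y = list(images), list(labels)
            for img, lab in zip(images, labels):
                for a in angles:
                    out_x.append(rotate(img, a, reshape=False))
                    out_y.append(lab)
            return np.array(out_x), np.array(out_y)

        def fuse_binary_scores(p_fr1, p_fr2, p_bent):
            """Combine per-class binary probabilities; flag ambiguous morphologies."""
            scores = np.stack([p_fr1, p_fr2, p_bent], axis=1)
            best = scores.argmax(axis=1)
            unusual = scores.max(axis=1) < 0.5      # no binary classifier is confident
            return best, unusual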

  2. 75 FR 37253 - Classified National Security Information

    Science.gov (United States)

    2010-06-28

    ... ``Secret.'' (3) Each interior page of a classified document shall be marked at the top and bottom either... ``(TS)'' for Top Secret, ``(S)'' for Secret, and ``(C)'' for Confidential will be used. (2) Portions... from the informational text. (1) Conspicuously place the overall classification at the top and bottom...

  3. 75 FR 707 - Classified National Security Information

    Science.gov (United States)

    2010-01-05

    ... classified at one of the following three levels: (1) ``Top Secret'' shall be applied to information, the... exercise this authority. (2) ``Top Secret'' original classification authority may be delegated only by the... official has been delegated ``Top Secret'' original classification authority by the agency head. (4) Each...

  4. Neural Network Classifier Based on Growing Hyperspheres

    Czech Academy of Sciences Publication Activity Database

    Jiřina Jr., Marcel; Jiřina, Marcel

    2000-01-01

    Vol. 10, No. 3 (2000), pp. 417-428. ISSN 1210-0552. [Neural Network World 2000. Prague, 09.07.2000-12.07.2000] Grant - others: MŠMT ČR(CZ) VS96047; MPO(CZ) RP-4210 Institutional research plan: AV0Z1030915 Keywords: neural network * classifier * hyperspheres * big-dimensional data Subject RIV: BA - General Mathematics

  5. Histogram deconvolution - An aid to automated classifiers

    Science.gov (United States)

    Lorre, J. J.

    1983-01-01

    It is shown that N-dimensional histograms are convolved by the addition of noise in the picture domain. Three methods are described which provide the ability to deconvolve such noise-affected histograms. The purpose of the deconvolution is to provide automated classifiers with a higher quality N-dimensional histogram from which to obtain classification statistics.
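
    One way such a deconvolution could be attempted today (not the 1983 method itself) is Richardson-Lucy restoration, here applied to a one-dimensional histogram blurred by an assumed noise kernel; the use of scikit-image is an assumption.

        import numpy as np
        from skimage.restoration import richardson_lucy

        rng = np.random.default_rng(1)
        hist = np.histogram(rng.normal(5, 0.5, 10000), bins=64, range=(0, 10))[0].astype(float)

        psf = np.exp(-0.5 * (np.linspace(-2, 2, 9) ** 2))   # assumed noise kernel
        psf /= psf.sum()

        blurred = np.convolve(hist, psf, mode="same")       # noise convolves the histogram
        restored = richardson_lucy(blurred / blurred.max(), psf)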

  6. Classifying web pages with visual features

    NARCIS (Netherlands)

    de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

    2010-01-01

    To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual

  7. Classifying seismic waveforms from scratch: a case study in the alpine environment

    Science.gov (United States)

    Hammer, C.; Ohrnberger, M.; Fäh, D.

    2013-01-01

    Nowadays, an increasing amount of seismic data is collected by daily observatory routines. The basic step for successfully analyzing those data is the correct detection of various event types. However, the visual scanning process is a time-consuming task. Applying standard techniques for detection like the STA/LTA trigger still requires manual control for classification. Here, we present a useful alternative. The incoming data stream is scanned automatically for events of interest. A stochastic classifier, called a hidden Markov model, is learned for each class of interest, enabling the recognition of highly variable waveforms. In contrast to other automatic techniques such as neural networks or support vector machines, the algorithm allows the classification to start from scratch as soon as interesting events are identified. Neither the tedious process of collecting training samples nor a time-consuming configuration of the classifier is required. An approach originally introduced for the volcanic task force action allows classifier properties to be learned from a single waveform example and some hours of background recording. Besides reducing the required workload, this also enables the detection of very rare events. Especially the latter feature provides a milestone for the use of seismic devices in alpine warning systems. Furthermore, the system offers the opportunity to flag new signal classes that have not been defined before. We demonstrate the application of the classification system using a data set from the Swiss Seismological Survey, achieving very high recognition rates. In detail, we document all refinements of the classifier, providing a step-by-step guide for the fast set-up of a well-working classification system.
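
    A compact sketch of the per-class hidden Markov model idea using the hmmlearn package (an assumption; the original system is part of an observatory pipeline): each event class gets its own Gaussian HMM, and a new waveform is assigned to the class whose model scores it highest.

        import numpy as np
        from hmmlearn.hmm import GaussianHMM

        def train_class_models(feature_seqs_by_class, n_states=4):
            """feature_seqs_by_class: {class_name: list of (T_i, n_features) arrays}."""
            models = {}
            for name, seqs in feature_seqs_by_class.items():
                X = np.vstack(seqs)
                lengths = [len(s) for s in seqs]
                models[name] = GaussianHMM(n_components=n_states).fit(X, lengths)
            return models

        def classify(models, seq):
            """Return the class whose HMM gives the highest log-likelihood."""
            return max(models, key=lambda name: models[name].score(seq))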

  8. The numerical evaluation of maximum-likelihood estimates of the parameters for a mixture of normal distributions from partially identified samples

    Science.gov (United States)

    Walker, H. F.

    1976-01-01

    Likelihood equations determined by the two types of samples, which are necessary conditions for a maximum-likelihood estimate, are considered. These equations suggest certain successive-approximations iterative procedures for obtaining maximum-likelihood estimates. These are generalized steepest ascent (deflected gradient) procedures. It is shown that, with probability 1 as N_0 approaches infinity (regardless of the relative sizes of N_0 and N_i, i = 1, ..., m), these procedures converge locally to the strongly consistent maximum-likelihood estimates whenever the step size is between 0 and 2. Furthermore, the value of the step size which yields optimal local convergence rates is bounded from below by a number which always lies between 1 and 2.
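
    The setting can be illustrated with a simplified EM-style successive-approximation scheme for a two-component univariate normal mixture in which some samples are identified (labeled) and the rest are not; this is only an illustration of the partially identified setting, not the paper's deflected-gradient procedure.

        import numpy as np
        from scipy.stats import norm

        def em_partially_identified(x_unlabeled, x1, x2, n_iter=100):
            """EM-style successive approximations for a two-component normal mixture.
            x1, x2 hold the identified samples from components 1 and 2."""
            mu = np.array([x1.mean(), x2.mean()])
            sd = np.array([x1.std() + 1e-6, x2.std() + 1e-6])
            w = np.array([0.5, 0.5])
            for _ in range(n_iter):
                # E-step: posterior membership probabilities of the unidentified samples.
                dens = np.stack([w[k] * norm.pdf(x_unlabeled, mu[k], sd[k]) for k in range(2)])
                resp = dens / dens.sum(axis=0)
                # M-step: identified samples enter with fixed (0/1) membership.
                for k, xk in enumerate((x1, x2)):
                    r = resp[k]
                    n_k = r.sum() + len(xk)
                    mu[k] = (r @ x_unlabeled + xk.sum()) / n_k
                    sd[k] = np.sqrt((r @ (x_unlabeled - mu[k]) ** 2 + ((xk - mu[k]) ** 2).sum()) / n_k)
                w = np.array([resp[k].sum() + len(xk) for k, xk in enumerate((x1, x2))])
                w /= w.sum()
            return w, mu, sd

        rng = np.random.default_rng(0)
        unlabeled = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
        print(em_partially_identified(unlabeled, rng.normal(0, 1, 30), rng.normal(3, 1, 30)))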

  9. Identifying the Basal Angiosperm Node in Chloroplast GenomePhylogenies: Sampling One's Way Out of the Felsenstein Zone

    Energy Technology Data Exchange (ETDEWEB)

    Leebens-Mack, Jim; Raubeson, Linda A.; Cui, Liying; Kuehl,Jennifer V.; Fourcade, Matthew H.; Chumley, Timothy W.; Boore, JeffreyL.; Jansen, Robert K.; dePamphilis, Claude W.

    2005-05-27

    While there has been strong support for Amborella and Nymphaeales (water lilies) as branching from the basal-most nodes in the angiosperm phylogeny, this hypothesis has recently been challenged by phylogenetic analyses of 61 protein-coding genes extracted from the chloroplast genome sequences of Amborella, Nymphaea and 12 other available land plant chloroplast genomes. These character-rich analyses placed the monocots, represented by three grasses (Poaceae), as sister to all other extant angiosperm lineages. We have extracted protein-coding regions from draft sequences for six additional chloroplast genomes to test whether this surprising result could be an artifact of long-branch attraction due to limited taxon sampling. The added taxa include three monocots (Acorus, Yucca and Typha), a water lily (Nuphar), a ranunculid (Ranunculus), and a gymnosperm (Ginkgo). Phylogenetic analyses of the expanded DNA and protein datasets together with microstructural characters (indels) provided unambiguous support for Amborella and the Nymphaeales as branching from the basal-most nodes in the angiosperm phylogeny. However, their relative positions proved to be dependent on the method of analysis, with parsimony favoring Amborella as sister to all other angiosperms, and maximum likelihood and neighbor-joining methods favoring an Amborella + Nymphaeales clade as sister. The maximum likelihood phylogeny supported the latter hypothesis, but the likelihood for the former hypothesis was not significantly different. Parametric bootstrap analysis, single gene phylogenies, estimated divergence dates and conflicting indel characters all help to illuminate the nature of the conflict in resolution of the most basal nodes in the angiosperm phylogeny. Molecular dating analyses provided median age estimates of 161 mya for the most recent common ancestor of all extant angiosperms and 145 mya for the most recent common ancestor of monocots, magnoliids and eudicots. Whereas long sequences reduce variance in

  10. A multiplex PCR-based method to identify strongylid parasite larvae recovered from ovine faecal cultures and/or pasture samples.

    Science.gov (United States)

    Bisset, S A; Knight, J S; Bouchet, C L G

    2014-02-24

    A multiplex PCR-based method was developed to overcome the limitations of microscopic examination as a means of identifying individual infective larvae from the wide range of strongylid parasite species commonly encountered in sheep in mixed sheep-cattle grazing situations in New Zealand. The strategy employed targets unique species-specific sequence markers in the second internal transcribed spacer (ITS-2) region of ribosomal DNA of the nematodes and utilises individual larval lysates as reaction templates. The basic assay involves two sets of reactions designed to target the ten strongylid species most often encountered in ovine faecal cultures under New Zealand conditions (viz. Haemonchus contortus, Teladorsagia circumcincta, Trichostrongylus axei, Trichostrongylus colubriformis, Trichostrongylus vitrinus, Cooperia curticei, Cooperia oncophora, Nematodirus spathiger, Chabertia ovina, and Oesophagostomum venulosum). Five species-specific primers, together with a pair of "generic" (conserved) primers, are used in each of the reactions. Two products are generally amplified, one by the generic primer pair regardless of species (providing a positive PCR control) and the other (whose size is indicative of the species present) by the appropriate species-specific primer in combination with one or other of the generic primers. If necessary, any larvae not identified by these reactions can subsequently be tested using primers designed specifically to detect those species less frequently encountered in ovine faecal cultures (viz. Ostertagia ostertagi, Ostertagia leptospicularis, Cooperia punctata, Nematodirus filicollis, and Bunostomum trigonocephalum). Results of assays undertaken on >5500 nematode larvae cultured from lambs on 16 different farms distributed throughout New Zealand indicated that positive identifications were initially obtained for 92.8% of them, while a further 4.4% of reactions gave a generic but no visible specific product and 2.8% gave no discernible

  11. Genome Sequencing Identifies Two Nearly Unchanged Strains of Persistent Listeria monocytogenes Isolated at Two Different Fish Processing Plants Sampled 6 Years Apart

    DEFF Research Database (Denmark)

    Holch, Anne; Webb, Kristen; Lukjancenko, Oksana

    2013-01-01

    Listeria monocytogenes is a food-borne human-pathogenic bacterium that can cause infections with a high mortality rate. It has a remarkable ability to persist in food processing facilities. Here we report the genome sequences for two L. monocytogenes strains (N53-1 and La111) that were isolated 6...... that has been isolated as a persistent subtype in several European countries. The purpose of this study was to use genome analyses to identify genes or proteins that could contribute to persistence. In a genome comparison, the two persistent strains were extremely similar and collectively differed from...... are required to determine if the absence of these genes promotes persistence. While the genome comparison did not point to a clear physiological explanation of the persistent phenotype, the remarkable similarity between the two strains indicates that subtypes with specific traits are selected for in the food...

  12. Using the Theory of Planned Behavior to identify key beliefs underlying chlamydia testing intentions in a sample of young people living in deprived areas.

    Science.gov (United States)

    Booth, Amy R; Norman, Paul; Harris, Peter R; Goyder, Elizabeth

    2015-09-01

    The Theory of Planned Behavior was used to identify the key behavioural, normative and control beliefs underlying intentions to test regularly for chlamydia among young people living in socially and economically deprived areas - a high-risk group for infection. Participants (N = 278, 53% male; mean age 17 years) were recruited from a vocational college situated in an area in the most deprived national quintile (England). Participants completed measures of behavioural, normative and control beliefs, plus intention to test regularly for chlamydia. The behavioural, normative and control beliefs most strongly correlated with intentions to test regularly for chlamydia were beliefs about stopping the spread of infection, partners' behaviour and the availability of testing. These beliefs represent potential targets for interventions to increase chlamydia testing among young people living in deprived areas. © The Author(s) 2013.

  13. How large a training set is needed to develop a classifier for microarray data?

    Science.gov (United States)

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
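
    The paper derives a model-based formula (and provides a web calculator) from standardized fold change, class prevalence, and feature count; a purely empirical complement, sketched below under the assumption of scikit-learn and simulated high-dimensional data, is to inspect a cross-validated learning curve and look for the training-set size at which accuracy plateaus.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import learning_curve
        from sklearn.svm import LinearSVC

        # Simulated "microarray-like" data: many features, few informative ones.
        X, y = make_classification(n_samples=200, n_features=2000, n_informative=20, random_state=0)
        sizes, _, test_scores = learning_curve(
            LinearSVC(dual=False), X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5
        )
        for n, s in zip(sizes, test_scores.mean(axis=1)):
            print(f"n_train={n:4d}  mean CV accuracy={s:.2f}")   # look for the plateau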

  14. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Science.gov (United States)

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...
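
    A small sketch of the feature vector described here, a local pixel neighborhood plus the distance of the target pixel to the log center; the 5 x 5 window size and the image center used below are assumptions for illustration.

        import numpy as np

        def pixel_feature_vector(image, row, col, center, half_window=2):
            """Local neighborhood values plus radial distance to the log center."""
            patch = image[row - half_window: row + half_window + 1,
                          col - half_window: col + half_window + 1]
            dist = np.hypot(row - center[0], col - center[1])
            return np.append(patch.ravel(), dist)

        ct_slice = np.random.rand(256, 256)                        # stand-in for one CT slice
        vec = pixel_feature_vector(ct_slice, 100, 120, center=(128, 128))
        print(vec.shape)                                           # (26,) = 5x5 neighborhood + distance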

  15. Applying meta-pathway analyses through metagenomics to identify the functional properties of the major bacterial communities of a single spontaneous cocoa bean fermentation process sample.

    Science.gov (United States)

    Illeghems, Koen; Weckx, Stefan; De Vuyst, Luc

    2015-09-01

    A high-resolution functional metagenomic analysis of a representative single sample of a Brazilian spontaneous cocoa bean fermentation process was carried out to gain insight into its bacterial community functioning. By reconstruction of microbial meta-pathways based on metagenomic data, the current knowledge about the metabolic capabilities of bacterial members involved in the cocoa bean fermentation ecosystem was extended. Functional meta-pathway analysis revealed the distribution of the metabolic pathways between the bacterial members involved. The metabolic capabilities of the lactic acid bacteria present were most associated with the heterolactic fermentation and citrate assimilation pathways. The role of Enterobacteriaceae in the conversion of substrates was shown through the use of the mixed-acid fermentation and methylglyoxal detoxification pathways. Furthermore, several other potential functional roles for Enterobacteriaceae were indicated, such as pectinolysis and citrate assimilation. Concerning acetic acid bacteria, metabolic pathways were partially reconstructed, in particular those related to responses toward stress, explaining their metabolic activities during cocoa bean fermentation processes. Further, the in-depth metagenomic analysis unveiled functionalities involved in bacterial competitiveness, such as the occurrence of CRISPRs and potential bacteriocin production. Finally, comparative analysis of the metagenomic data with bacterial genomes of cocoa bean fermentation isolates revealed the applicability of the selected strains as functional starter cultures. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Disassembly and Sanitization of Classified Matter

    International Nuclear Information System (INIS)

    Stockham, Dwight J.; Saad, Max P.

    2008-01-01

    The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition by using recycling and waste minimization measures. This process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have caused a dramatic increase in the need for information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components were required to have long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units due to classification issues, and they may also contain radiological and/or hazardous components. Since no disposal options exist for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic, requiring a secured storage area, monitoring, and auditing, and presenting the potential for loss or theft of the material. Overall recycling rates for materials sent through the DSO process have enabled 70 to 80% of these components to be recycled. These components are made of high quality materials, and once this material has been sanitized, the demand for the component metals for recycling efforts is very high. The DSO process for NGPF classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team, and the DSO is responding to these requests by expanding its scope to include Work-for-Other projects. For example

  17. Comparing cosmic web classifiers using information theory

    International Nuclear Information System (INIS)

    Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin; Jasche, Jens

    2016-01-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  18. Design of Robust Neural Network Classifiers

    DEFF Research Database (Denmark)

    Larsen, Jan; Andersen, Lars Nonboe; Hintz-Madsen, Mads

    1998-01-01

    This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present...... a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of network weights as well as outlier probability and regularization parameters. We...... suggest to adapt the outlier probability and regularisation parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential...
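
    One common way to realize the outlier-probability idea (a sketch under that assumption, not necessarily the authors' exact formulation) is to mix the network's class posterior with a uniform outlier component inside the negative log-likelihood and add the regularization term:

        import numpy as np

        def robust_nll(probs, labels, eps, weights=None, alpha=0.0):
            """Negative log-likelihood with outlier probability eps and an L2 prior.
            probs: (n, K) softmax outputs; labels: (n,) integer targets."""
            n, K = probs.shape
            p_true = probs[np.arange(n), labels]
            likelihood = (1.0 - eps) * p_true + eps / K     # outlier-contaminated likelihood
            cost = -np.log(likelihood).sum()
            if weights is not None:
                cost += alpha * np.sum(weights ** 2)        # regularization term (prior)
            return cost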

  19. Comparing cosmic web classifiers using information theory

    Energy Technology Data Exchange (ETDEWEB)

    Leclercq, Florent [Institute of Cosmology and Gravitation (ICG), University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth PO1 3FX (United Kingdom); Lavaux, Guilhem; Wandelt, Benjamin [Institut d' Astrophysique de Paris (IAP), UMR 7095, CNRS – UPMC Université Paris 6, Sorbonne Universités, 98bis boulevard Arago, F-75014 Paris (France); Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: lavaux@iap.fr, E-mail: j.jasche@tum.de, E-mail: wandelt@iap.fr [Excellence Cluster Universe, Technische Universität München, Boltzmannstrasse 2, D-85748 Garching (Germany)

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  20. Detection of Fundus Lesions Using Classifier Selection

    Science.gov (United States)

    Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

    A system for detecting fundus lesions caused by diabetic retinopathy from fundus images is being developed. The system can screen the images in advance in order to reduce the inspection workload on doctors. One of the difficulties that must be addressed in completing this system is how to remove false positives (which tend to arise near blood vessels) without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images with some lesions and 223 images without lesions. The results of the experiments confirm the effectiveness of the proposed system, namely, degrees of sensitivity and specificity of 98% and 81%, respectively.
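
    The classifier-selection idea can be sketched in a few lines: choose the model trained for candidates near blood vessels or the one trained for other areas, depending on the candidate's measured distance to the nearest vessel. The 10-pixel threshold and the two pre-trained scikit-learn-style classifiers below are assumptions, not the authors' configuration.

        def classify_candidate(features, dist_to_vessel, clf_near, clf_far, threshold=10.0):
            """Select the classifier trained for the region the candidate lesion lies in."""
            clf = clf_near if dist_to_vessel < threshold else clf_far
            return clf.predict_proba([features])[0, 1]   # probability that it is a true lesion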

  1. Classifying objects in LWIR imagery via CNNs

    Science.gov (United States)

    Rodger, Iain; Connor, Barry; Robertson, Neil M.

    2016-10-01

    The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid to long range detection system, utilising a Long Wave Infrared (LWIR) sensor. By exploiting high quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. While the highly accurate CNN struggles to recognise long range target classes, due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.

  2. Learning for VMM + WTA Embedded Classifiers

    Science.gov (United States)

    2016-03-31

    Learning for VMM + WTA Embedded Classifiers Jennifer Hasler and Sahil Shah Electrical and Computer Engineering Georgia Institute of Technology...enabling correct classification of each novel acoustic signal (generator, idle car, and idle truck ). The classification structure requires, after...measured on our SoC FPAA IC. The test input is composed of signals from urban environment for 3 objects (generator, idle car, and idle truck

  3. A Bayesian classifier for symbol recognition

    OpenAIRE

    Barrat, Sabine; Tabbone, Salvatore; Nourrissier, Patrick

    2007-01-01

    URL : http://www.buyans.com/POL/UploadedFile/134_9977.pdf; International audience; We present in this paper an original adaptation of Bayesian networks to the symbol recognition problem. More precisely, a descriptor combination method is presented which significantly improves the recognition rate compared to the recognition rates obtained by each descriptor alone. In this perspective, we use a simple Bayesian classifier, called naive Bayes. In fact, probabilistic graphical models, more spec...

  4. An Active Learning Classifier for Further Reducing Diabetic Retinopathy Screening System Cost

    Directory of Open Access Journals (Sweden)

    Yinan Zhang

    2016-01-01

    Full Text Available Diabetic retinopathy (DR) screening systems raise a financial problem. To further reduce DR screening cost, an active learning classifier is proposed in this paper. Our approach identifies retinal images based on features extracted by anatomical part recognition and lesion detection algorithms. The kernel extreme learning machine (KELM) is a rapid classifier for solving classification problems in high dimensional space. Both active learning and an ensemble technique elevate the performance of KELM when using a small training dataset. The committee only proposes necessary manual work to the doctor to save cost. On the publicly available Messidor database, our classifier is trained with 20%–35% of the labeled retinal images, while comparative classifiers are trained with 80% of the labeled retinal images. Results show that our classifier can achieve better classification accuracy than Classification and Regression Tree, radial basis function SVM, Multilayer Perceptron SVM, Linear SVM, and K Nearest Neighbor. Empirical experiments suggest that our active learning classifier is efficient for further reducing DR screening cost.
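
    The KELM committee itself is not reproduced here; the sketch below shows the generic active-learning loop the record describes (query labels only for the least confident images), using a logistic regression stand-in and uncertainty sampling.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def active_learning_loop(X_pool, y_pool, n_init=30, n_rounds=10, batch=10):
            """Uncertainty sampling: query labels only for the least confident images."""
            rng = np.random.default_rng(0)
            labeled = list(rng.choice(len(X_pool), n_init, replace=False))
            for _ in range(n_rounds):
                clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
                probs = clf.predict_proba(X_pool)
                uncertainty = 1.0 - probs.max(axis=1)
                uncertainty[labeled] = -1.0                   # never re-query labeled images
                queries = np.argsort(uncertainty)[-batch:]    # "manual work" sent to the doctor
                labeled.extend(queries.tolist())
            return clf, labeled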

  5. SVM classifier on chip for melanoma detection.

    Science.gov (United States)

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

    Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost medical handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented on a recent FPGA platform using the latest design methodology, to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 watts of power consumption. Consequently, the implemented system meets the crucial embedded systems constraints of high performance and low cost, resource utilization and power consumption, while achieving high classification accuracy.

  6. A naïve Bayes classifier for planning transfusion requirements in heart surgery.

    Science.gov (United States)

    Cevenini, Gabriele; Barbini, Emanuela; Massai, Maria R; Barbini, Paolo

    2013-02-01

    Transfusion of allogeneic blood products is a key issue in cardiac surgery. Although blood conservation and standard transfusion guidelines have been published by different medical groups, actual transfusion practices after cardiac surgery vary widely among institutions. Models can be a useful support for decision making and may reduce the total cost of care. The objective of this study was to propose and evaluate a procedure to develop a simple locally customized decision-support system. We analysed 3182 consecutive patients undergoing cardiac surgery at the University Hospital of Siena, Italy. Univariate statistical tests were performed to identify a set of preoperative and intraoperative variables as likely independent features for planning transfusion quantities. These features were utilized to design a naïve Bayes classifier. Model performance was evaluated using the leave-one-out cross-validation approach. All computations were done using SPSS and MATLAB code. The overall correct classification percentage was not particularly high if several classes of patients were to be identified. Model performance improved appreciably when the patient sample was divided into two classes (transfused and non-transfused patients). In this case the naïve Bayes model correctly classified about three quarters of patients with 71.2% sensitivity and 78.4% specificity, thus providing useful information for recognizing patients with transfusion requirements in the specific scenario considered. Although the classifier is customized to a particular setting and cannot be generalized to other scenarios, the simplicity of its development and the results obtained make it a promising approach for designing a simple model for different heart surgery centres needing a customized decision-support system for planning transfusion requirements in the intensive care unit. © 2011 Blackwell Publishing Ltd.
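
    A sketch of the evaluation scheme (a naïve Bayes classifier with leave-one-out cross-validation, reported as sensitivity and specificity), using scikit-learn on simulated patient features rather than the Siena data:

        import numpy as np
        from sklearn.metrics import confusion_matrix
        from sklearn.model_selection import LeaveOneOut, cross_val_predict
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 6))                             # simulated pre/intra-operative features
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)   # 1 = transfused

        pred = cross_val_predict(GaussianNB(), X, y, cv=LeaveOneOut())
        tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
        print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))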

  7. Classifying magnetic resonance image modalities with convolutional neural networks

    Science.gov (United States)

    Remedios, Samuel; Pham, Dzung L.; Butman, John A.; Roy, Snehashis

    2018-02-01

    Magnetic Resonance (MR) imaging allows the acquisition of images with different contrast properties depending on the acquisition protocol and the magnetic properties of tissues. Many MR brain image processing techniques, such as tissue segmentation, require multiple MR contrasts as inputs, and each contrast is treated differently. Thus it is advantageous to automate the identification of image contrasts for various purposes, such as facilitating image processing pipelines, and managing and maintaining large databases via content-based image retrieval (CBIR). Most automated CBIR techniques focus on a two-step process: extracting features from data and classifying the image based on these features. We present a novel 3D deep convolutional neural network (CNN)- based method for MR image contrast classification. The proposed CNN automatically identifies the MR contrast of an input brain image volume. Specifically, we explored three classification problems: (1) identify T1-weighted (T1-w), T2-weighted (T2-w), and fluid-attenuated inversion recovery (FLAIR) contrasts, (2) identify pre vs postcontrast T1, (3) identify pre vs post-contrast FLAIR. A total of 3418 image volumes acquired from multiple sites and multiple scanners were used. To evaluate each task, the proposed model was trained on 2137 images and tested on the remaining 1281 images. Results showed that image volumes were correctly classified with 97.57% accuracy.

  8. A support vector machine (SVM) based voltage stability classifier

    Energy Technology Data Exchange (ETDEWEB)

    Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of); Lee, B. [Korea Univ., Seoul (Korea, Republic of)

    2007-07-01

    Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to fully employ existing transmission infrastructure. The economic pressure on electricity markets forces the operation of power systems and components to their limits of capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day-to-day system operations and an increase in the number of potential components for system disturbances, potentially resulting in voltage instability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; the moving window for the feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer excellent performance in classification with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.

  9. The Protection of Classified Information: The Legal Framework

    National Research Council Canada - National Science Library

    Elsea, Jennifer K

    2006-01-01

    Recent incidents involving leaks of classified information have heightened interest in the legal framework that governs security classification, access to classified information, and penalties for improper disclosure...

  10. General and Local: Averaged k-Dependence Bayesian Classifiers

    Directory of Open Access Journals (Sweden)

    Limin Wang

    2015-06-01

    The inference of a general Bayesian network has been shown to be an NP-hard problem, even for approximate solutions. Although the k-dependence Bayesian (KDB) classifier can be constructed at arbitrary points (values of k) along the attribute dependence spectrum, it cannot identify the changes of interdependencies when attributes take different values. Local KDB, which learns in the framework of KDB, is proposed in this study to describe the local dependencies implicated in each test instance. Based on the analysis of functional dependencies, substitution-elimination resolution, a new type of semi-naive Bayesian operation, is proposed to substitute or eliminate generalization to achieve accurate estimation of conditional probability distribution while reducing computational complexity. The final classifier, averaged k-dependence Bayesian (AKDB) classifiers, will average the output of KDB and local KDB. Experimental results on the repository of machine learning databases from the University of California Irvine (UCI) showed that AKDB has significant advantages in zero-one loss and bias relative to naive Bayes (NB), tree augmented naive Bayes (TAN), averaged one-dependence estimators (AODE), and KDB. Moreover, KDB and local KDB show mutually complementary characteristics with respect to variance.

  11. Patients on weaning trials classified with support vector machines

    International Nuclear Information System (INIS)

    Garde, Ainara; Caminal, Pere; Giraldo, Beatriz F; Schroeder, Rico; Voss, Andreas; Benito, Salvador

    2010-01-01

    The process of discontinuing mechanical ventilation is called weaning and is one of the most challenging problems in intensive care. An unnecessary delay in the discontinuation process and an early weaning trial are undesirable. This study aims to characterize the respiratory pattern through features that permit the identification of patients' conditions in weaning trials. Three groups of patients have been considered: 94 patients with successful weaning trials, who could maintain spontaneous breathing after 48 h (GSucc); 39 patients who failed the weaning trial (GFail) and 21 patients who had successful weaning trials, but required reintubation in less than 48 h (GRein). Patients are characterized by their cardiorespiratory interactions, which are described by joint symbolic dynamics (JSD) applied to the cardiac interbeat and breath durations. The most discriminating features in the classification of the different groups of patients (GSucc, GFail and GRein) are identified by support vector machines (SVMs). The SVM-based feature selection algorithm has an accuracy of 81% in classifying GSucc versus the rest of the patients, 83% in classifying GRein versus GSucc patients and 81% in classifying GRein versus the rest of the patients. Moreover, a good balance between sensitivity and specificity is achieved in all classifications

  12. Classifying spaces of degenerating polarized Hodge structures

    CERN Document Server

    Kato, Kazuya

    2009-01-01

    In 1970, Phillip Griffiths envisioned that points at infinity could be added to the classifying space D of polarized Hodge structures. In this book, Kazuya Kato and Sampei Usui realize this dream by creating a logarithmic Hodge theory. They use the logarithmic structures begun by Fontaine-Illusie to revive nilpotent orbits as a logarithmic Hodge structure. The book focuses on two principal topics. First, Kato and Usui construct the fine moduli space of polarized logarithmic Hodge structures with additional structures. Even for a Hermitian symmetric domain D, the present theory is a refinem

  13. Gearbox Condition Monitoring Using Advanced Classifiers

    Directory of Open Access Journals (Sweden)

    P. Večeř

    2010-01-01

    New efficient and reliable methods for gearbox diagnostics are needed in the automotive industry because of the growing demand for production quality. This paper presents the application of two different classifiers for gearbox diagnostics – Kohonen Neural Networks and the Adaptive-Network-based Fuzzy Inference System (ANFIS). Two different practical applications are presented. In the first application, the tested gearboxes are separated into two classes according to their condition indicators. In the second example, ANFIS is applied to label the tested gearboxes with a Quality Index according to the condition indicators. In both applications, the condition indicators were computed from the vibration of the gearbox housing.

  14. Cubical sets as a classifying topos

    DEFF Research Database (Denmark)

    Spitters, Bas

    Coquand’s cubical set model for homotopy type theory provides the basis for a computational interpretation of the univalence axiom and some higher inductive types, as implemented in the cubical proof assistant. We show that the underlying cube category is the opposite of the Lawvere theory of De Morgan algebras. The topos of cubical sets itself classifies the theory of ‘free De Morgan algebras’. This provides us with a topos with an internal ‘interval’. Using this interval we construct a model of type theory following van den Berg and Garner. We are currently investigating the precise relation...

  15. Double Ramp Loss Based Reject Option Classifier

    Science.gov (United States)

    2015-05-22

    The double ramp loss is expressed as a difference of convex (DC) functions and is minimized using a DC programming approach [1]. The proposed method has the following advantages: (1) the proposed loss L_DR ... We see that L_DR does not put any restriction on ρ for it to be an upper bound of L_{0-d-1}. Risk is formulated using L_DR over a training set S = {(x_n, ...}. A classifier learnt using the L_DR-based approach (C = 100, μ = 1, d = 0.2) is illustrated, with filled circles and triangles representing the support vectors, followed by experimental results.

  16. Reducing variability in the output of pattern classifiers using histogram shaping

    International Nuclear Information System (INIS)

    Gupta, Shalini; Kan, Chih-Wen; Markey, Mia K.

    2010-01-01

    Purpose: The authors present a novel technique based on histogram shaping to reduce the variability in the output and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs. Methods: The authors identify different sources of variability in the output of linear pattern classifiers with identical ROC curves, which also result in classifiers with differently distributed outputs. They theoretically develop a novel technique based on the matching of the histograms of these differently distributed pattern classifier outputs to reduce the variability in their (sensitivity, specificity) pairs at fixed decision thresholds, and to reduce the variability in their actual output values. They empirically demonstrate the efficacy of the proposed technique by means of analyses on the simulated data and real world mammography data. Results: For the simulated data, with three different known sources of variability, and for the real world mammography data with unknown sources of variability, the proposed classifier output calibration technique significantly reduced the variability in the classifiers' (sensitivity, specificity) pairs at fixed decision thresholds. Furthermore, for classifiers with monotonically or approximately monotonically related output variables, the histogram shaping technique also significantly reduced the variability in their actual output values. Conclusions: Classifier output calibration based on histogram shaping can be successfully employed to reduce the variability in the output values and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs.
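    A minimal sketch of histogram matching applied to classifier outputs, assuming NumPy; mapping one classifier's scores onto the empirical distribution of a reference classifier's scores is an illustrative reading of the calibration step, not the authors' exact procedure. Because the mapping is rank-preserving, the ROC curve of the matched classifier is unchanged.

```python
# Sketch: shape the histogram of classifier B's outputs to match classifier A's.
# Quantile mapping: each score of B is replaced by the A-score at the same rank.
import numpy as np

def histogram_match(scores_b, scores_ref):
    """Map scores_b onto the empirical distribution of scores_ref."""
    ranks = np.argsort(np.argsort(scores_b)) / (len(scores_b) - 1)   # quantiles in [0, 1]
    return np.quantile(np.sort(scores_ref), ranks)

rng = np.random.default_rng(2)
scores_a = rng.beta(2, 5, size=1000)          # reference classifier outputs
scores_b = rng.normal(0.0, 1.0, size=1000)    # differently distributed outputs

matched = histogram_match(scores_b, scores_a)
print(np.round([scores_a.mean(), matched.mean()], 3))   # similar after matching
```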

  17. Classifying Coding DNA with Nucleotide Statistics

    Directory of Open Access Journals (Sweden)

    Nicolas Carels

    2009-10-01

    In this report, we compared the success rate of classification of coding sequences (CDS) vs. introns by Codon Structure Factor (CSF) and by a method that we called Universal Feature Method (UFM). UFM is based on the scoring of purine bias (Rrr) and stop codon frequency. We show that the success rate of CDS/intron classification by UFM is higher than by CSF. UFM classifies ORFs as coding or non-coding through a score based on (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine (C), Guanine (G), and Adenine (A) probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, (iv) the probabilities of G in the 1st and 2nd position of triplets and (v) the distance of their GC3 vs. GC2 levels to the regression line of the universal correlation. More than 80% of CDSs (true positives) of Homo sapiens (>250 bp), Drosophila melanogaster (>250 bp) and Arabidopsis thaliana (>200 bp) are successfully classified with a false positive rate lower or equal to 5%. The method releases coding sequences in their coding strand and coding frame, which allows their automatic translation into protein sequences with 95% confidence. The method is a natural consequence of the compositional bias of nucleotides in coding sequences.
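    A minimal sketch of two of the ingredients listed above, assuming plain Python; the example sequence, the per-frame statistics and any combination into a score or threshold are illustrative and not the published UFM score.

```python
# Sketch: per reading frame, compute stop-codon frequency and the product of
# purine probabilities at the three codon positions - two UFM-style ingredients.
STOPS = {"TAA", "TAG", "TGA"}
PURINES = {"A", "G"}

def frame_stats(seq, frame=0):
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    stop_freq = sum(c in STOPS for c in codons) / len(codons)
    purine_prob = 1.0
    for pos in range(3):                      # purine probability at codon positions 1-3
        purine_prob *= sum(c[pos] in PURINES for c in codons) / len(codons)
    return stop_freq, purine_prob

seq = "ATGGCGAAAGGGTTTCCCAAAGGGATGAAAGGGTAA"   # toy ORF
for f in range(3):
    print("frame", f, frame_stats(seq, f))
```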

  18. A systematic comparison of supervised classifiers.

    Directory of Open Access Journals (Sweden)

    Diego Raphael Amancio

    Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. In many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without an in-depth knowledge about the underlying parameters. Actually, the adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near optimal performance for most cases, not including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently allowed the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to their default parameter configuration.
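    A minimal sketch of the kind of comparison described, assuming scikit-learn rather than Weka; the dataset, the classifier list and the use of default parameters are illustrative stand-ins for the study's setup.

```python
# Sketch: compare several classifiers with their default parameters via
# 10-fold cross-validation, in the spirit of the study (which used Weka).
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
models = {
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name:12s} mean accuracy = {scores.mean():.3f}")
```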

  19. Mercury⊕: An evidential reasoning image classifier

    Science.gov (United States)

    Peddle, Derek R.

    1995-12-01

    MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package is described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under the Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
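    A minimal sketch of the Dempster-Shafer combination step that underlies this kind of evidential reasoning classifier, in Python; the frame of discernment and the mass values are illustrative and not taken from MERCURY⊕.

```python
# Sketch: Dempster's rule of combination for two basic probability assignments
# over a small frame of discernment. The hypotheses and masses are made up.
from itertools import product

def combine(m1, m2):
    """Combine two mass functions; keys are frozensets of hypotheses."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

FOREST, TUNDRA = frozenset({"forest"}), frozenset({"tundra"})
EITHER = FOREST | TUNDRA
m_spectral = {FOREST: 0.6, TUNDRA: 0.1, EITHER: 0.3}   # evidence from one source
m_terrain = {FOREST: 0.4, TUNDRA: 0.4, EITHER: 0.2}    # evidence from another

print(combine(m_spectral, m_terrain))
```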

  20. 36 CFR 1256.46 - National security-classified information.

    Science.gov (United States)

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...

  1. Proton NMR-based metabolite analyses of archived serial paired serum and urine samples from myeloma patients at different stages of disease activity identifies acetylcarnitine as a novel marker of active disease.

    Directory of Open Access Journals (Sweden)

    Alessia Lodi

    BACKGROUND: Biomarker identification is becoming increasingly important for the development of personalized or stratified therapies. Metabolomics yields biomarkers indicative of phenotype that can be used to characterize transitions between health and disease, disease progression and therapeutic responses. The desire to reproducibly detect ever greater numbers of metabolites at ever diminishing levels has naturally nurtured advances in best practice for sample procurement, storage and analysis. Reciprocally, since many of the available extensive clinical archives were established prior to the metabolomics era and were not processed in such an 'ideal' fashion, considerable scepticism has arisen as to their value for metabolomic analysis. Here we have challenged that paradigm. METHODS: We performed proton nuclear magnetic resonance spectroscopy-based metabolomics on blood serum and urine samples from 32 patients representative of a total cohort of 1970 multiple myeloma patients entered into the United Kingdom Medical Research Council Myeloma IX trial. FINDINGS: Using serial paired blood and urine samples we detected metabolite profiles that associated with diagnosis, post-treatment remission and disease progression. These studies identified carnitine and acetylcarnitine as novel potential biomarkers of active disease both at diagnosis and relapse and as a mediator of disease associated pathologies. CONCLUSIONS: These findings show that samples conventionally processed and archived can provide useful metabolomic information that has important implications for understanding the biology of myeloma, discovering new therapies and identifying biomarkers potentially useful in deciding the choice and application of therapy.

  2. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    Science.gov (United States)

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using

  3. Two channel EEG thought pattern classifier.

    Science.gov (United States)

    Craig, D A; Nguyen, H T; Burchey, H A

    2006-01-01

    This paper presents a real-time electro-encephalogram (EEG) identification system with the goal of achieving hands-free control. With two EEG electrodes placed on the scalp of the user, EEG signals are amplified and digitised directly using a ProComp+ encoder and transferred to the host computer through the RS232 interface. Using a real-time multilayer neural network, the actual classification for the control of a powered wheelchair has a very fast response. It can detect changes in the user's thought pattern in 1 second. Using only two EEG electrodes at positions O1 and C4, the system can classify three mental commands (forward, left and right) with an accuracy of more than 79%.

  4. Classifying Drivers' Cognitive Load Using EEG Signals.

    Science.gov (United States)

    Barua, Shaibal; Ahmed, Mobyen Uddin; Begum, Shahina

    2017-01-01

    A growing traffic safety issue is the effect of cognitive loading activities on traffic safety and driving performance. To monitor drivers' mental state, understanding cognitive load is important since, while driving, performing cognitively loading secondary tasks, for example talking on the phone, can affect performance in the primary task, i.e. driving. Electroencephalography (EEG) is one of the reliable measures of cognitive load that can detect changes in instantaneous load and the effect of a cognitively loading secondary task. In this driving simulator study, a 1-back task is carried out while the driver performs three different simulated driving scenarios. This paper presents an EEG-based approach to classify a driver's level of cognitive load using Case-Based Reasoning (CBR). The results show that for each individual scenario, as well as using data combined from the different scenarios, the CBR-based system achieved over 70% classification accuracy.

  5. Classifying prion and prion-like phenomena.

    Science.gov (United States)

    Harbi, Djamel; Harrison, Paul M

    2014-01-01

    The universe of prion and prion-like phenomena has expanded significantly in the past several years. Here, we overview the challenges in classifying this data informatically, given that terms such as "prion-like", "prion-related" or "prion-forming" do not have a stable meaning in the scientific literature. We examine the spectrum of proteins that have been described in the literature as forming prions, and discuss how "prion" can have a range of meaning, with a strict definition being for demonstration of infection with in vitro-derived recombinant prions. We suggest that although prion/prion-like phenomena can largely be apportioned into a small number of broad groups dependent on the type of transmissibility evidence for them, as new phenomena are discovered in the coming years, a detailed ontological approach might be necessary that allows for subtle definition of different "flavors" of prion / prion-like phenomena.

  6. Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

    Directory of Open Access Journals (Sweden)

    Bogdan Gliwa

    2011-01-01

    The paper presents a hybrid neuro-fuzzy classifier, based on the NEFCLASS model, which was modified. The presented classifier was compared to popular classifiers – neural networks and k-nearest neighbours. Efficiency of the modifications in the classifier was compared with the methods used in the original NEFCLASS model (learning methods). Accuracy of the classifier was tested using 3 datasets from the UCI Machine Learning Repository: iris, wine and breast cancer Wisconsin. Moreover, the influence of ensemble classification methods on classification accuracy was presented.

  7. Classifying Transition Behaviour in Postural Activity Monitoring

    Directory of Open Access Journals (Sweden)

    James BRUSEY

    2009-10-01

    A few accelerometers positioned on different parts of the body can be used to accurately classify steady state behaviour, such as walking, running, or sitting. Such systems are usually built using supervised learning approaches. Transitions between postures are, however, difficult to deal with using posture classification systems proposed to date, since there is no label set for intermediary postures and also the exact point at which the transition occurs can sometimes be hard to pinpoint. The usual bypass when using supervised learning to train such systems is to discard a section of the dataset around each transition. This leads to poorer classification performance when the systems are deployed out of the laboratory and used on-line, particularly if the regimes monitored involve fast paced activity changes. Time-based filtering that takes advantage of sequential patterns is a potential mechanism to improve posture classification accuracy in such real-life applications. Also, such filtering should reduce the number of event messages needed to be sent across a wireless network to track posture remotely, hence extending the system’s life. To support time-based filtering, understanding transitions, which are the major event generators in a classification system, is a key. This work examines three approaches to post-process the output of a posture classifier using time-based filtering: a naïve voting scheme, an exponentially weighted voting scheme, and a Bayes filter. Best performance is obtained from the exponentially weighted voting scheme although it is suspected that a more sophisticated treatment of the Bayes filter might yield better results.
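    A minimal sketch of the exponentially weighted voting idea, in plain Python; the decay factor, label set and input stream are assumptions rather than the study's configuration.

```python
# Sketch: exponentially weighted voting over a stream of frame-by-frame posture
# labels. Older votes decay by a factor alpha; the smoothed label is the class
# with the largest accumulated weight. alpha and the label set are assumptions.
def smooth_labels(labels, classes, alpha=0.8):
    weights = {c: 0.0 for c in classes}
    out = []
    for lab in labels:
        for c in classes:
            weights[c] *= alpha          # decay all previous evidence
        weights[lab] += 1.0              # add the current vote
        out.append(max(weights, key=weights.get))
    return out

raw = ["sit", "sit", "stand", "sit", "stand", "stand", "walk", "stand", "walk", "walk"]
print(smooth_labels(raw, classes={"sit", "stand", "walk"}))
```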

  8. Just-in-time adaptive classifiers-part II: designing the classifier.

    Science.gov (United States)

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

    Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with process evolution, adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationary hypothesis for the process generating the data, with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected for their computation-free training phase, the possibility to easily estimate the model complexity k, and the ability to keep the computational complexity of the classifier under control through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (aspect tackled in a companion paper) followed by an adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking) and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change targets the process generating the data in a new stationary state, as is the case in many real applications.
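    A minimal sketch of the knowledge-base management idea, assuming scikit-learn; the windowing policy and the boolean change signal are placeholders for the detection test of the companion paper.

```python
# Sketch: a k-NN classifier whose knowledge base (KB) is updated just-in-time.
# New labelled samples extend the KB in stationary conditions; when a change is
# signalled (here a plain boolean placeholder for the detection test), the KB is
# cut back to recent samples so the classifier can track the new process state.
from collections import deque
from sklearn.neighbors import KNeighborsClassifier

class JITKNN:
    def __init__(self, k=3, max_kb=500, keep_after_change=100):
        self.kb = deque(maxlen=max_kb)
        self.k, self.keep = k, keep_after_change

    def update(self, x, y, change_detected=False):
        self.kb.append((x, y))
        if change_detected:                       # discard obsolete knowledge
            recent = list(self.kb)[-self.keep:]
            self.kb = deque(recent, maxlen=self.kb.maxlen)

    def predict(self, x):
        X, y = zip(*self.kb)
        model = KNeighborsClassifier(n_neighbors=min(self.k, len(X)))
        return model.fit(X, y).predict([x])[0]

clf = JITKNN()
for i in range(50):
    clf.update([i % 5, (i * 2) % 7], i % 2)
print(clf.predict([1, 3]))
```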

  9. Novelty Detection Classifiers in Weed Mapping: Silybum marianum Detection on UAV Multispectral Images.

    Science.gov (United States)

    Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios

    2017-09-01

    In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.
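    A minimal sketch of a one-class novelty detector trained only on target-class pixels (three spectral bands plus a texture measure), assuming scikit-learn; the pixel values, nu and gamma are illustrative, not the study's settings.

```python
# Sketch: one-class SVM novelty detector trained only on S. marianum pixels.
# Feature values (3 bands + texture), nu and gamma are illustrative assumptions.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
target_pixels = rng.normal([0.30, 0.25, 0.60, 0.40], 0.05, size=(300, 4))  # target class
other_pixels = rng.normal([0.20, 0.30, 0.45, 0.55], 0.05, size=(300, 4))   # other vegetation

model = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"))
model.fit(target_pixels)                      # trained on the target class only

pred = model.predict(np.vstack([target_pixels, other_pixels]))  # +1 target, -1 novelty
print("flagged as target:", (pred == 1).sum(), "of", len(pred))
```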

  10. Interface Prostheses With Classifier-Feedback-Based User Training.

    Science.gov (United States)

    Fang, Yinfeng; Zhou, Dalin; Li, Kairu; Liu, Honghai

    2017-11-01

    It is evident that user training significantly affects performance of pattern-recognition-based myoelectric prosthetic device control. Despite plausible classification accuracy on offline datasets, online accuracy usually suffers from the changes in physiological conditions and electrode displacement. The user ability in generating consistent electromyographic (EMG) patterns can be enhanced via proper user training strategies in order to improve online performance. This study proposes a clustering-feedback strategy that provides real-time feedback to users by means of a visualized online EMG signal input as well as the centroids of the training samples, whose dimensionality is reduced to minimal number by dimension reduction. Clustering feedback provides a criterion that guides users to adjust motion gestures and muscle contraction forces intentionally. The experiment results have demonstrated that hand motion recognition accuracy increases steadily along the progress of the clustering-feedback-based user training, while conventional classifier-feedback methods, i.e., label feedback, hardly achieve any improvement. The result concludes that the use of proper classifier feedback can accelerate the process of user training, and implies prosperous future for the amputees with limited or no experience in pattern-recognition-based prosthetic device manipulation.

  11. Identification of flooded area from satellite images using Hybrid Kohonen Fuzzy C-Means sigma classifier

    Directory of Open Access Journals (Sweden)

    Krishna Kant Singh

    2017-06-01

    A novel neuro-fuzzy classifier, Hybrid Kohonen Fuzzy C-Means-σ (HKFCM-σ), is proposed in this paper. The proposed classifier is a hybridization of the Kohonen Clustering Network (KCN) with the FCM-σ clustering algorithm. The network architecture of HKFCM-σ is similar to a simple KCN network, having only two layers, i.e., input and output layers. However, the selection of the winner neuron is based on the FCM-σ algorithm, thus embedding the features of both a neural network and a fuzzy clustering algorithm in the classifier. This hybridization results in a more efficient, less complex and faster classifier for classifying satellite images. HKFCM-σ is used to identify the flooding that occurred in the Kashmir area in September 2014. The HKFCM-σ classifier is applied on pre- and post-flooding Landsat 8 OLI images of Kashmir to detect the areas that were flooded due to the heavy rainfalls of September 2014. The classifier is trained using the mean values of various spectral indices like NDVI, NDWI, NDBI and the first component of Principal Component Analysis. The error matrix was computed to test the performance of the method. The method yields high producer’s accuracy, consumer’s accuracy and kappa coefficient values, indicating that the proposed classifier is highly effective and efficient.

  12. Early Diagnosis of Breast Cancer by Identifying Malignant Cells Within Neoplasias Histologically Classified as Benign

    National Research Council Canada - National Science Library

    Lelievre, Sophie A

    2005-01-01

    Current diagnostic tools permit the classification of breast neoplasias into categories that represent different relative risks of developing cancer, but they do not indicate which particular lesion...

  13. A Machine Learning Approach for Identifying and Classifying Faults in Wireless Sensor Networks

    NARCIS (Netherlands)

    Warriach, Ehsan Ullah; Aiello, Marco; Tei, Kenji

    2012-01-01

    Wireless Sensor Network (WSN) deployment experiences show that collected data is prone to be faulty. Faults are due to internal and external influences, such as calibration, low battery, environmental interference and sensor aging. However, only a few solutions exist to deal with faulty sensory data.

  14. Identifying and classifying water hyacinth (Eichhornia crassipes) using the HyMap sensor

    Science.gov (United States)

    Rajapakse, Sepalika S.; Khanna, Shruti; Andrew, Margaret E.; Ustin, Susan L.; Lay, Mui

    2006-08-01

    In recent years, the impact of aquatic invasive species on biodiversity has become a major global concern. In the Sacramento-San Joaquin Delta region in the Central Valley of California, USA, dense infestations of the invasive aquatic emergent weed, water hyacinth (Eichhornia crassipes) interfere with ecosystem functioning. This silent invader constantly encroaches into waterways, eventually making them unusable by people and uninhabitable to aquatic fauna. Quantifying and mapping invasive plant species in aquatic ecosystems is important for efficient management and implementation of mitigation measures. This paper evaluates the ability of hyperspectral imagery, acquired using the HyMap sensor, for mapping water hyacinth in the Sacramento-San Joaquin Delta region. Classification was performed on sixty-four flightlines acquired over the study site using a decision tree which incorporated the Spectral Angle Mapper (SAM) algorithm, absorption feature parameters in the spectral region between 0.4 and 2.5 μm, and spectral endmembers. The total image dataset was 130 GB. Spectral signatures of other emergent aquatic species like pennywort (Hydrocotyle ranunculoides) and water primrose (Ludwigia peploides) showed close similarity with the water hyacinth spectrum, however, the decision tree successfully discriminated water hyacinth from other emergent aquatic vegetation species. The classification algorithm showed high accuracy (κ value = 0.8) in discriminating water hyacinth.
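    A minimal sketch of the Spectral Angle Mapper similarity used inside such a decision tree, assuming NumPy; the reference (endmember) spectra and any decision threshold are made up for illustration.

```python
# Sketch: Spectral Angle Mapper (SAM) - the angle between a pixel spectrum and a
# reference (endmember) spectrum; smaller angles indicate more similar materials.
import numpy as np

def spectral_angle(pixel, reference):
    cos = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))        # angle in radians

hyacinth_ref = np.array([0.05, 0.08, 0.45, 0.30])    # toy endmember spectrum
pennywort_ref = np.array([0.06, 0.10, 0.40, 0.35])
pixel = np.array([0.05, 0.09, 0.44, 0.31])

angles = {"water hyacinth": spectral_angle(pixel, hyacinth_ref),
          "pennywort": spectral_angle(pixel, pennywort_ref)}
label = min(angles, key=angles.get)                  # assign to smallest angle
print(angles, "->", label)
```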

  15. Stress fracture development classified by bone scintigraphy

    International Nuclear Information System (INIS)

    Zwas, S.T.; Elkanovich, R.; Frank, G.; Aharonson, Z.

    1985-01-01

    There is no consensus on classifying stress fractures (SF) appearing on bone scans. The authors present a system of classification based on grading the severity and development of bone lesions by visual inspection, according to three main scintigraphic criteria: focality and size, intensity of uptake compared to adjacent bone, and local medullary extension. Four grades of development (I-IV) were ranked, ranging from ill-defined, slightly increased cortical uptake to well-defined regions with markedly increased uptake extending transversely bicortically. 310 male subjects aged 19-2, suffering for several weeks from leg pains occurring during intensive physical training, underwent bone scans of the pelvis and lower extremities using Tc-99m-MDP. 76% of the scans were positive with 354 lesions, of which 88% were in the mild (I-II) grades and 12% in the moderate (III) and severe (IV) grades. Post-treatment scans were obtained in 65 cases having 78 lesions during 1- to 6-month intervals. Complete resolution was found after 1-2 months in 36% of the mild lesions but in only 12% of the moderate and severe ones, and after 3-6 months in 55% of the mild lesions and 15% of the severe ones. 75% of the moderate and severe lesions showed residual uptake in various stages throughout the follow-up period. Early recognition and treatment of mild SF lesions in this study prevented protracted disability and progression of the lesions and facilitated complete healing.

  16. Classifying MCI Subtypes in Community-Dwelling Elderly Using Cross-Sectional and Longitudinal MRI-Based Biomarkers

    Directory of Open Access Journals (Sweden)

    Hao Guan

    2017-09-01

    Amnestic MCI (aMCI) and non-amnestic MCI (naMCI) are considered to differ in etiology and outcome. Accurately classifying MCI into meaningful subtypes would enable early intervention with targeted treatment. In this study, we employed structural magnetic resonance imaging (MRI) for MCI subtype classification. This was carried out in a sample of 184 community-dwelling individuals (aged 73–85 years). Cortical surface-based measurements were computed from longitudinal and cross-sectional scans. By introducing a feature selection algorithm, we identified a set of discriminative features, and further investigated the temporal patterns of these features. A voting classifier was trained and evaluated via 10 iterations of cross-validation. The best classification accuracies achieved were: 77% (naMCI vs. aMCI), 81% (aMCI vs. cognitively normal (CN)) and 70% (naMCI vs. CN). The best results for differentiating aMCI from naMCI were achieved with baseline features. Hippocampus, amygdala and frontal pole were found to be most discriminative for classifying MCI subtypes. Additionally, we observed the dynamics of classification of several MRI biomarkers. Learning the dynamics of atrophy may aid in the development of better biomarkers, as it may track the progression of cognitive impairment.

  17. A predictive toxicogenomics signature to classify genotoxic versus non-genotoxic chemicals in human TK6 cells

    Directory of Open Access Journals (Sweden)

    Andrew Williams

    2015-12-01

    Genotoxicity testing is a critical component of chemical assessment. The use of integrated approaches in genetic toxicology, including the incorporation of gene expression data to determine the DNA damage response pathways involved in response, is becoming more common. In companion papers previously published in Environmental and Molecular Mutagenesis, Li et al. (2015) [6] developed a dose optimization protocol that was based on evaluating expression changes in several well-characterized stress-response genes using quantitative real-time PCR in human lymphoblastoid TK6 cells in culture. This optimization approach was applied to the analysis of TK6 cells exposed to one of 14 genotoxic or 14 non-genotoxic agents, with sampling 4 h post-exposure. Microarray-based transcriptomic analyses were then used to develop a classifier for genotoxicity using the nearest shrunken centroids method. A panel of 65 genes was identified that could accurately classify toxicants as genotoxic or non-genotoxic. In Buick et al. (2015) [1], the utility of the biomarker for chemicals that require metabolic activation was evaluated. In this study, TK6 cells were exposed to increasing doses of four chemicals (two genotoxic that require metabolic activation and two non-genotoxic chemicals) in the presence of rat liver S9 to demonstrate that S9 does not impair the ability to classify genotoxicity using this genomic biomarker in TK6 cells.
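    A minimal sketch of the nearest shrunken centroids idea, assuming scikit-learn's NearestCentroid with a shrink_threshold; the simulated expression matrix, labels and threshold value are placeholders, not the published 65-gene panel.

```python
# Sketch: nearest shrunken centroids on gene-expression-like data. Shrinking the
# class centroids towards the overall centroid drops uninformative genes.
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_samples, n_genes = 60, 200
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)                  # 0 = non-genotoxic, 1 = genotoxic
X[y == 1, :10] += 1.5                                  # a few informative "genes"

clf = NearestCentroid(shrink_threshold=0.5)            # shrinks uninformative genes away
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```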

  18. Rapid Land Cover Map Updates Using Change Detection and Robust Random Forest Classifiers

    Directory of Open Access Journals (Sweden)

    Konrad J. Wessels

    2016-10-01

    The paper evaluated the Landsat Automated Land Cover Update Mapping (LALCUM) system designed to rapidly update a land cover map to a desired nominal year using a pre-existing reference land cover map. The system uses the Iteratively Reweighted Multivariate Alteration Detection (IRMAD) to identify areas of change and no change. The system then automatically generates large amounts of training samples (n > 1 million) in the no-change areas as input to an optimized Random Forest classifier. Experiments were conducted in the KwaZulu-Natal Province of South Africa using a reference land cover map from 2008, a change mask between 2008 and 2011 and Landsat ETM+ data for 2011. The entire system took 9.5 h to process. We expected that the use of the change mask would improve classification accuracy by reducing the number of mislabeled training data caused by land cover change between 2008 and 2011. However, this was not the case due to the exceptional robustness of the Random Forest classifier to mislabeled training samples. The system achieved an overall accuracy of 65%–67% using 22 detailed classes and 72%–74% using 12 aggregated national classes. “Water”, “Plantations”, “Plantations—clearfelled”, “Orchards—trees”, “Sugarcane”, “Built-up/dense settlement”, “Cultivation—Irrigated” and “Forest (indigenous)” had user’s accuracies above 70%. Other detailed classes (e.g., “Low density settlements”, “Mines and Quarries”, and “Cultivation, subsistence, drylands”), which are required for operational, provincial-scale land use planning and are usually mapped using manual image interpretation, could not be mapped using Landsat spectral data alone. However, the system was able to map the 12 national classes, at a sufficiently high level of accuracy for national scale land cover monitoring. This update approach and the highly automated, scalable LALCUM system can improve the efficiency and update rate of regional land
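    A minimal sketch of the update step described above, assuming scikit-learn; the image array, reference labels, change mask and sample counts are placeholders, not the LALCUM data volumes.

```python
# Sketch: train a Random Forest on training pixels sampled automatically from
# "no-change" areas, labelled by the reference land cover map, then classify the
# new image. Arrays, sample sizes and the number of trees are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n_pixels, n_bands = 100_000, 6
image_2011 = rng.random((n_pixels, n_bands))           # Landsat band values per pixel
reference_2008 = rng.integers(0, 12, size=n_pixels)    # reference class labels
no_change = rng.random(n_pixels) > 0.2                 # placeholder change mask (IRMAD)

# Sample training pixels only where no change was detected.
train_idx = rng.choice(np.flatnonzero(no_change), size=50_000, replace=False)
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(image_2011[train_idx], reference_2008[train_idx])

updated_map = rf.predict(image_2011)                   # updated land cover labels
print(updated_map[:10])
```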

  19. Utility of functional status for classifying community versus institutional discharges after inpatient rehabilitation for stroke.

    Science.gov (United States)

    Reistetter, Timothy A; Graham, James E; Deutsch, Anne; Granger, Carl V; Markello, Samuel; Ottenbacher, Kenneth J

    2010-03-01

    To evaluate the ability of patient functional status to differentiate between community and institutional discharges after rehabilitation for stroke. Retrospective cross-sectional design. Inpatient rehabilitation facilities contributing to the Uniform Data System for Medical Rehabilitation. Patients (N=157,066) receiving inpatient rehabilitation for stroke in 2006 and 2007. Not applicable. Discharge FIM rating and discharge setting (community vs institutional). Approximately 71% of the sample was discharged to the community. Receiver operating characteristic curve analyses revealed that FIM total performed as well as or better than FIM motor and FIM cognition subscales in differentiating discharge settings. Area under the curve for FIM total was .85, indicating very good ability to identify persons discharged to the community. A FIM total rating of 78 was identified as the optimal cut point for distinguishing between positive (community) and negative (institution) tests. This cut point yielded balanced sensitivity and specificity (both=.77). Discharge planning is complex, involving many factors. Identifying a functional threshold for classifying discharge settings can provide important information to assist in this process. Additional research is needed to determine if the risks and benefits of classification errors justify shifting the cut point to weight either sensitivity or specificity of FIM ratings. Copyright 2010 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
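    A minimal sketch of choosing a cut point on a continuous score from a ROC curve by balancing sensitivity and specificity (Youden's J), assuming scikit-learn; the simulated scores stand in for FIM ratings and are not the study data.

```python
# Sketch: pick an optimal cut point on a continuous score from a ROC curve by
# maximizing Youden's J = sensitivity + specificity - 1.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(6)
y = rng.integers(0, 2, size=2000)                       # 1 = community discharge
score = np.where(y == 1, rng.normal(85, 15, 2000), rng.normal(65, 15, 2000))

fpr, tpr, thresholds = roc_curve(y, score)
j = tpr - fpr                                           # Youden's J at each threshold
best = np.argmax(j)
print("cut point:", round(thresholds[best], 1),
      "sensitivity:", round(tpr[best], 2),
      "specificity:", round(1 - fpr[best], 2))
```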

  20. 41 CFR 105-62.102 - Authority to originally classify.

    Science.gov (United States)

    2010-07-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is delegable...

  1. Naive Bayesian classifiers for multinomial features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2007-11-01

    The authors investigate the use of naive Bayesian classifiers for multinomial feature spaces and derive error estimates for these classifiers. The error analysis is done by developing a mathematical model to estimate the probability density...
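    A minimal sketch of a naive Bayes classifier on multinomial features with a simple empirical error estimate, assuming scikit-learn; the synthetic count data and the hold-out split are illustrative, not the paper's theoretical error model.

```python
# Sketch: naive Bayes with multinomial features plus a hold-out estimate of the
# classification error. The count data stand in for any multinomial feature
# space (e.g., word or event counts).
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
class0 = rng.multinomial(50, [0.4, 0.3, 0.2, 0.1], size=500)   # counts over 4 features
class1 = rng.multinomial(50, [0.1, 0.2, 0.3, 0.4], size=500)
X = np.vstack([class0, class1])
y = np.repeat([0, 1], 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MultinomialNB().fit(X_tr, y_tr)
print("estimated error:", 1 - clf.score(X_te, y_te))
```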

  2. Ensemble of classifiers based network intrusion detection system performance bound

    CSIR Research Space (South Africa)

    Mkuzangwe, Nenekazi NP

    2017-11-01

    This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...

  3. Sex Bias in Classifying Borderline and Narcissistic Personality Disorder.

    Science.gov (United States)

    Braamhorst, Wouter; Lobbestael, Jill; Emons, Wilco H M; Arntz, Arnoud; Witteman, Cilia L M; Bekker, Marrie H J

    2015-10-01

    This study investigated sex bias in the classification of borderline and narcissistic personality disorders. A sample of psychologists in training for a post-master's degree (N = 180) read brief case histories (male or female version) and made DSM classifications. To differentiate sex bias due to sex stereotyping or to base rate variation, we used different case histories, respectively: (1) non-ambiguous case histories with enough criteria of either borderline or narcissistic personality disorder to meet the threshold for classification, and (2) an ambiguous case with subthreshold features of both borderline and narcissistic personality disorder. Results showed significant differences due to sex of the patient in the ambiguous condition. Thus, when the diagnosis is not straightforward, as in the case of mixed subthreshold features, sex bias is present and is influenced by base-rate variation. These findings emphasize the need for caution in classifying personality disorders, especially borderline or narcissistic traits.

  4. Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

    OpenAIRE

    Hernández Rodríguez, Selene

    2010-01-01

    The k nearest neighbor (k-NN) classifier has been extensively used in Pattern Recognition because of its simplicity and its good performance. However, in large datasets applications, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison f...

  5. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.

  6. 32 CFR 2400.28 - Dissemination of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Dissemination of classified information. 2400.28... SECURITY PROGRAM Safeguarding § 2400.28 Dissemination of classified information. Heads of OSTP offices... originating official may prescribe specific restrictions on dissemination of classified information when...

  7. Classifier utility modeling and analysis of hypersonic inlet start/unstart considering training data costs

    Science.gov (United States)

    Chang, Juntao; Hu, Qinghua; Yu, Daren; Bao, Wen

    2011-11-01

    Start/unstart detection is one of the most important issues of hypersonic inlets and is also the foundation of protection control of the scramjet. The inlet start/unstart detection can be treated as a standard pattern classification problem, and the training sample costs have to be considered for the classifier modeling, as the CFD numerical simulations and wind tunnel experiments of hypersonic inlets both cost time and money. To solve this problem, the CFD simulation of the inlet is studied as a first step, and the simulation results could provide the training data for pattern classification of hypersonic inlet start/unstart. Then the classifier modeling technology and maximum classifier utility theories are introduced to analyze the effect of training data cost on classifier utility. In conclusion, it is useful to introduce support vector machine algorithms to acquire the classifier model of hypersonic inlet start/unstart, and the minimum total cost of the hypersonic inlet start/unstart classifier can be obtained by the maximum classifier utility theories.

  8. Application of chemometric techniques to classify the quality of surface water in the watershed of the river Bermudez in Heredia, Costa Rica

    International Nuclear Information System (INIS)

    Herrera Murillo, Jorge; Rodriguez Roman, Susana; Solis Torres, Ligia Dina; Castro Delgado, Francisco

    2009-01-01

    The application of selected chemometric techniques has been investigated: cluster analysis, principal component analysis and factor analysis, to classify the quality of river water and evaluate pollution data. Fourteen physicochemical parameters were monitored at 10 stations located in the watershed of the river Bermudez, from August 2005 to February 2007. The results identified the existence of two natural clusters of monitoring sites with similar contamination characteristics and identified DQO, DBO, NO₃⁻, SO₄²⁻ and SST as the main variables that discriminate between sampling sites. (author)
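    A minimal sketch of the chemometric workflow described, assuming scikit-learn and SciPy; the station-by-parameter matrix is a random placeholder, not the Bermudez measurements, and the number of components and clusters are assumptions.

```python
# Sketch: standardize a station x parameter matrix, reduce it with PCA, and
# group stations with hierarchical (Ward) clustering.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(8)
stations = rng.normal(size=(10, 14))                  # 10 stations x 14 parameters

Z = StandardScaler().fit_transform(stations)
scores = PCA(n_components=3).fit_transform(Z)         # principal component scores
clusters = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
print("cluster assignment per station:", clusters)
```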

  9. Multivariate analysis of quantitative traits can effectively classify rapeseed germplasm

    Directory of Open Access Journals (Sweden)

    Jankulovska Mirjana

    2014-01-01

    In this study, the use of different multivariate approaches to classify rapeseed genotypes based on quantitative traits has been presented. Tree regression analysis, PCA analysis and two-way cluster analysis were applied in order to describe and understand the extent of genetic variability in spring rapeseed genotype by trait data. The traits which highly influenced seed and oil yield in rapeseed were successfully identified by the tree regression analysis. Principal predictor for both response variables was number of pods per plant (NP). NP and 1000 seed weight could help in the selection of high yielding genotypes. High values for both traits and oil content could lead to high oil yielding genotypes. These traits may serve as indirect selection criteria and can lead to improvement of seed and oil yield in rapeseed. Quantitative traits that explained most of the variability in the studied germplasm were classified using principal component analysis. In this data set, five PCs were identified, out of which the first three PCs explained 63% of the total variance. It helped in facilitating the choice of variables based on which the genotypes’ clustering could be performed. The two-way cluster analysis simultaneously clustered genotypes and quantitative traits. The final number of clusters was determined using bootstrapping technique. This approach provided clear overview on the variability of the analyzed genotypes. The genotypes that have similar performance regarding the traits included in this study can be easily detected on the heatmap. Genotypes grouped in the clusters 1 and 8 had high values for seed and oil yield, and relatively short vegetative growth duration period and those in cluster 9, combined moderate to low values for vegetative growth duration and moderate to high seed and oil yield. These genotypes should be further exploited and implemented in the rapeseed breeding program. The combined application of these multivariate methods

  10. Identifying Effectiveness Criteria for Internet Payment Systems.

    Science.gov (United States)

    Shon, Tae-Hwan; Swatman, Paula M. C.

    1998-01-01

    Examines Internet payment systems (IPS): third-party, card, secure Web server, electronic token, financial electronic data interchange (EDI), and micropayment based. Reports the results of a Delphi survey of experts identifying and classifying IPS effectiveness criteria and classifying types of IPS providers. Includes the survey invitation letter…

  11. Peat classified as slowly renewable biomass fuel

    International Nuclear Information System (INIS)

    2001-01-01

    thousands of years. The report also states that peat should be classified as a biomass fuel rather than grouped with biofuels, such as wood, or fossil fuels, such as coal. According to the report, peat is a renewable biomass fuel like biofuels, but due to its slow accumulation it should be considered a slowly renewable fuel. The report estimates that the binding of carbon in both virgin and forest-drained peatlands is high enough to compensate for the emissions formed in the combustion of energy peat

  12. Classifying the Optical Morphology of Shocked POststarburst Galaxies

    Science.gov (United States)

    Stewart, Tess; SPOGs Team

    2018-01-01

    The Shocked POststarburst Galaxy Survey (SPOGS) is a sample of galaxies in transition from blue, star forming spirals to red, inactive ellipticals. These galaxies are earlier in the transition than classical poststarburst samples. We have classified the physical characteristics of the full sample of 1067 SPOGs in 7 categories, covering (1) their shape; (2) the relative prominence of their nuclei; (3) the uniformity of their optical color; (4) whether the outskirts of the galaxy were indicative of on-going star formation; (5) whether they are engaged in interactions with other galaxies, and if so, (6) the kinds of galaxies with which they are interacting; and (7) the presence of asymmetrical features, possibly indicative of recent interactions. We determined that a plurality of SPOGs are in elliptical galaxies, indicating morphological transformations may tend to conclude before other indicators of transitions have faded. Further, early-type SPOGs also tend to have the brightest optical nuclei. Most galaxies do not show signs of current or recent interactions. We used these classifications to search for correlations between qualitative and quantitative characteristics of SPOGs using Sloan Digital Sky Survey and Wide-field Infrared Survey Explorer magnitudes. We find that relative optical nuclear brightness is not a good indicator of the presence of an active galactic nucleus and that galaxies with visible indications of active star formation also cluster in optical color and diagnostic line ratios.

  13. Addressing the Challenge of Defining Valid Proteomic Biomarkers and Classifiers

    LENUS (Irish Health Repository)

    Dakna, Mohammed

    2010-12-10

    Background: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example we define the measurable proteomic differences between apparently healthy adult males and females. We choose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology, allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. Results: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test set is essential. Conclusions: Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
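    A minimal sketch of per-feature Wilcoxon rank-sum testing with adjustment for multiple testing, assuming SciPy and statsmodels; the simulated intensity matrix, group sizes and the Benjamini-Hochberg procedure are illustrative assumptions, not the study's exact pipeline.

```python
# Sketch: per-peptide Wilcoxon rank-sum tests (male vs. female groups) followed
# by Benjamini-Hochberg adjustment for multiple testing. The intensity matrix is
# simulated; in practice it would hold CE-MS peptide intensities.
import numpy as np
from scipy.stats import mannwhitneyu                      # Wilcoxon rank-sum test
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(9)
n_peptides = 300
males = rng.lognormal(0.0, 1.0, size=(20, n_peptides))
females = rng.lognormal(0.0, 1.0, size=(20, n_peptides))
females[:, :15] *= 2.0                                     # a few true differences

pvals = np.array([mannwhitneyu(males[:, j], females[:, j]).pvalue
                  for j in range(n_peptides)])
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("candidate biomarkers after adjustment:", int(reject.sum()))
```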

  14. Comparison of several chemometric methods of libraries and classifiers for the analysis of expired drugs based on Raman spectra.

    Science.gov (United States)

    Gao, Qun; Liu, Yan; Li, Hao; Chen, Hui; Chai, Yifeng; Lu, Feng

    2014-06-01

    Some expired drugs are difficult to detect by conventional means. If they are repackaged and sold back into the market, they will constitute a new public health challenge. For the detection of repackaged expired drugs within specification, a paracetamol tablet from one manufacturer was used as a model drug in this study for comparison of Raman spectra-based library verification and classification methods. Raman spectra of different batches of paracetamol tablets were collected and a library including standard spectra of unexpired batches of tablets was established. The Raman spectrum of each sample was compared with the standard spectrum using cosine and correlation measures. The average hit quality index (HQI) between the suspicious samples and the standard spectrum was calculated. The optimum threshold values were 0.997 and 0.998 respectively, as a result of ROC and four evaluations, for which the accuracy was up to 97%. Three supervised classifiers, PLS-DA, SVM and k-NN, were chosen to establish two-class classification models and were compared subsequently. They were used to establish a classification of expired batches and an unexpired batch, and to predict the suspect samples. The average accuracy was 90.12%, 96.80% and 89.37%, respectively. Different pre-processing techniques were tried; the first derivative was optimal for the library methods and max–min normalization was optimal for the classifiers. The results obtained from these studies indicated that both library and classifier methods could detect the expired drugs effectively, and they should be used complementarily in fast screening. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Minimal methylation classifier (MIMIC): A novel method for derivation and rapid diagnostic detection of disease-associated DNA methylation signatures.

    Science.gov (United States)

    Schwalbe, E C; Hicks, D; Rafiee, G; Bashton, M; Gohlke, H; Enshaei, A; Potluri, S; Matthiesen, J; Mather, M; Taleongpong, P; Chaston, R; Silmon, A; Curtis, A; Lindsey, J C; Crosier, S; Smith, A J; Goschzik, T; Doz, F; Rutkowski, S; Lannering, B; Pietsch, T; Bailey, S; Williamson, D; Clifford, S C

    2017-10-18

    Rapid and reliable detection of disease-associated DNA methylation patterns has major potential to advance molecular diagnostics and underpin research investigations. We describe the development and validation of minimal methylation classifier (MIMIC), combining CpG signature design from genome-wide datasets, multiplex-PCR and detection by single-base extension and MALDI-TOF mass spectrometry, in a novel method to assess multi-locus DNA methylation profiles within routine clinically-applicable assays. We illustrate the application of MIMIC to successfully identify the methylation-dependent diagnostic molecular subgroups of medulloblastoma (the most common malignant childhood brain tumour), using scant/low-quality samples remaining from the most recently completed pan-European medulloblastoma clinical trial, refractory to analysis by conventional genome-wide DNA methylation analysis. Using this approach, we identify critical DNA methylation patterns from previously inaccessible cohorts, and reveal novel survival differences between the medulloblastoma disease subgroups with significant potential for clinical exploitation.

  16. Quest to identify geochemical risk factors associated with chronic kidney disease of unknown etiology (CKDu) in an endemic region of Sri Lanka-a multimedia laboratory analysis of biological, food, and environmental samples.

    Science.gov (United States)

    Levine, Keith E; Redmon, Jennifer Hoponick; Elledge, Myles F; Wanigasuriya, Kamani P; Smith, Kristin; Munoz, Breda; Waduge, Vajira A; Periris-John, Roshini J; Sathiakumar, Nalini; Harrington, James M; Womack, Donna S; Wickremasinghe, Rajitha

    2016-10-01

    fluoride, iron, manganese, sodium, and lead exceeding applicable drinking water standards in some instances. Current literature suggests that the etiology of CKDu is likely multifactorial, with no single biological or hydrogeochemical parameter directly related to disease genesis and progression. This preliminary screening identified that specific constituents may be present above levels of concern, but does not compare results against specific kidney toxicity values or cumulative risk related to a multifactorial disease process. The data collected from this limited investigation are intended to be used in the subsequent study design of a comprehensive and multifactorial etiological study of CKDu risk factors that includes sample collection, individual surveys, and laboratory analyses to more fully evaluate the potential environmental, behavioral, genetic, and lifestyle risk factors associated with CKDu.

  17. The use of hyperspectral data for tree species discrimination: Combining binary classifiers

    CSIR Research Space (South Africa)

    Dastile, X

    2010-11-01

    The classification system combines binary classifiers; in the illustrated 5-nearest-neighbour example, a new sample is assigned to class 1. The learning task is given as {(x_1,t_1),(x_2,t_2),…,(x_p,t_p)} with feature vectors x_i ∈ ℝ^n and class labels t_i ∈ {ω_1,…,ω_c}. Cited references include a review on the combination of binary classifiers in multiclass problems (Springer Science and Business Media B.V.) and Dietterich T.G. and Bakiri G. (1995), Solving Multiclass Learning Problems via Error-Correcting Output Codes, AI Access Foundation.
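
    The record concerns combining binary classifiers for a multiclass task; below is a hedged sketch of one cited strategy, error-correcting output codes with binary SVMs, run on a stand-in dataset (the study's hyperspectral tree-species data are not reproduced here).

```python
# Illustrative sketch (not the study's pipeline): combining binary classifiers for a
# multiclass problem via error-correcting output codes, the strategy cited in the record.
from sklearn.datasets import load_iris                 # stand-in for hyperspectral tree-species data
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Roughly code_size * n_classes binary SVMs each contribute one codeword bit;
# the predicted class is the one whose codeword is closest to the vector of outputs.
ecoc = OutputCodeClassifier(SVC(kernel="rbf"), code_size=2, random_state=0)
ecoc.fit(X_tr, y_tr)
print("held-out accuracy:", ecoc.score(X_te, y_te))
```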

  18. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    Science.gov (United States)

    2017-01-01

    Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Conclusions Machine learning algorithms can classify open-text feedback

  19. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high

  20. Fuzzy set classifier for waste classification tracking

    International Nuclear Information System (INIS)

    Gavel, D.T.

    1992-01-01

    We have developed an expert system based on fuzzy logic theory to fuse the data from multiple sensors and make classification decisions for objects in a waste reprocessing stream. Fuzzy set theory has been applied in decision and control applications with some success, particularly by the Japanese. We have found that the fuzzy logic system is rather easy to design and train, a feature that can cut development costs considerably. With proper training, the classification accuracy is quite high. We performed several tests sorting radioactive test samples using a gamma spectrometer to compare fuzzy logic to more conventional sorting schemes

  1. Re-interpreting Prominences Classified as Tornadoes

    Science.gov (United States)

    Martin, Sara F.; Venkataramanasastry, Aparna

    2015-04-01

    Some papers in the recent literature identify tornado prominences with barbs of quiescent prominences while papers in the much older historic literature include a second category of tornado prominence that does not correspond to a barb of a quiescent prominence. The latter are described as prominence mass rotating around a nearly vertical axis prior to its eruption and the rotation was verified by spectral measurements. From H alpha Doppler-shifted mass motions recorded at Helio Research or the Dutch Open Telescope, we illustrate how the apparent tornado-like motions, identified with barbs, are illusions in our mind’s eye resulting from poorly resolved counterstreaming threads of mass in the barbs of quiescent prominences. In contrast, we confirm the second category of rotational motion in prominences shortly before and during eruption. In addition, we identify this second category as part of the late phase of a phenomenon called the roll effect in erupting prominences. In these cases, the eruption begins with the sideways rolling of the top of a prominence. As the eruption proceeds the rolling motion propagates down one leg or both legs of the prominence depending on whether the eruption is asymmetric or symmetric respectively. As an asymmetric eruption continues, the longer lasting leg becomes nearly vertical and its rotational motion also continues. If only this phase of the eruption was observed, as in some historic cases, it was called a tornado prominence. However, when we now observe entire eruptions in time-lapse sequences, the similarity to terrestrial tornadoes is lost. We conclude that neither prominence barbs, that give the illusion of rotation, nor the cases of true rotational motion, in the legs of erupting prominences, are usefully described as tornado prominences when the complete prominence structure or complete erupting event is observed.

  2. A deep learning method for classifying mammographic breast density categories.

    Science.gov (United States)

    Mohamed, Aly A; Berg, Wendie A; Peng, Hong; Luo, Yahong; Jankowitz, Rachel C; Wu, Shandong

    2018-01-01

    Mammographic breast density is an established risk marker for breast cancer and is visually assessed by radiologists in routine mammogram image reading, using four qualitative Breast Imaging and Reporting Data System (BI-RADS) breast density categories. It is particularly difficult for radiologists to consistently distinguish the two most common and most variably assigned BI-RADS categories, i.e., "scattered density" and "heterogeneously dense". The aim of this work was to investigate a deep learning-based breast density classifier to consistently distinguish these two categories, aiming at providing a potential computerized tool to assist radiologists in assigning a BI-RADS category in current clinical workflow. In this study, we constructed a convolutional neural network (CNN)-based model coupled with a large (i.e., 22,000 images) digital mammogram imaging dataset to evaluate the classification performance between the two aforementioned breast density categories. All images were collected from a cohort of 1,427 women who underwent standard digital mammography screening from 2005 to 2016 at our institution. The truths of the density categories were based on standard clinical assessment made by board-certified breast imaging radiologists. Effects of direct training from scratch solely using digital mammogram images and transfer learning of a pretrained model on a large nonmedical imaging dataset were evaluated for the specific task of breast density classification. In order to measure the classification performance, the CNN classifier was also tested on a refined version of the mammogram image dataset by removing some potentially inaccurately labeled images. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to measure the accuracy of the classifier. The AUC was 0.9421 when the CNN-model was trained from scratch on our own mammogram images, and the accuracy increased gradually along with an increased size of training samples

  3. 48 CFR 952.223-76 - Conditional payment of fee or profit-safeguarding restricted data and other classified...

    Science.gov (United States)

    2010-10-01

    ... disclosure of Top Secret Restricted Data or other information classified as Top Secret, any classification... result in the loss, compromise, or unauthorized disclosure of Top Secret Restricted Data, or other information classified as Top Secret, any classification level of information in a SAP, information identified...

  4. 48 CFR 952.204-76 - Conditional payment of fee or profit-safeguarding restricted data and other classified information.

    Science.gov (United States)

    2010-10-01

    ... disclosure of Top Secret Restricted Data or other information classified as Top Secret, any classification... result in the loss, compromise, or unauthorized disclosure of Top Secret Restricted Data, or other information classified as Top Secret, any classification level of information in a SAP, information identified...

  5. Classifying indicators of quality: a collaboration between Dutch and English regulators.

    Science.gov (United States)

    Mears, Alex; Vesseur, Jan; Hamblin, Richard; Long, Paul; Den Ouden, Lya

    2011-12-01

    Many approaches to measuring quality in healthcare exist, generally employing indicators or metrics. While there are important differences, most of these approaches share three key areas of measurement: safety, effectiveness and patient experience. The European Partnership for Supervisory Organisations in Health Services and Social Care (EPSO) exists as a working group and discussion forum for European regulators. This group undertook to identify a common framework within which European approaches to indicators could be compared. A framework was developed to classify indicators, using four sets of criteria: conceptualization of quality, Donabedian definition (structure, process, outcome), data type (derivable, collectable from routine sources, special collections, samples) and data use (judgement (singular or as part of a framework), benchmarking, risk assessment). Indicators from English and Dutch hospital measurement programmes were put into the framework, showing areas of agreement and levels of comparability. In the first instance, results are only illustrative. The EPSO has been a powerful driver for undertaking cross-European research, and this project is the first of many to take advantage of the access to international expertise. It has shown that through development of a framework that deconstructs national indicators, commonalities can be identified. Future work will attempt to incorporate other nations' indicators, and attempt cross-national comparison.

  6. Machine Learning Based Classifier for Falsehood Detection

    Science.gov (United States)

    Mallikarjun, H. M.; Manimegalai, P., Dr.; Suresh, H. N., Dr.

    2017-08-01

    The investigation of physiological techniques for falsehood (lie) detection tests based on emotional disturbance began in the early 1900s. The need for falsehood detection has been part of our society for centuries, and various demands across society have raised the need to develop tamper-proof methodologies for it. The classical comparative questioning tests have tended to give inconclusive results, and new, more robust strategies are being explored to obtain more efficient falsehood detection. Electroencephalography (EEG) is a non-invasive technique for measuring the activity of the brain through electrodes attached to the scalp of a subject; an electroencephalogram is a record of the electric signals generated by the synchronous activity of brain cells over a period of time. The main goal is to gather and identify the relevant information in this activity, which can then be used to draw inferences for falsehood detection in future analysis. This work proposes a strategy for falsehood detection using an EEG database recorded from random individuals of different age groups and social backgrounds. The statistical analysis was conducted using MATLAB v-14, a high-level language for technical computing that saves a considerable amount of time through streamlined analysis routines. The focus of this work is falsehood classification by Support Vector Machine (SVM). 72 samples were prepared by asking questions from a standard questionnaire, with right and wrong answers given at different times by the individual wearing a head-mounted unit; 52 samples were used for training and 20 for testing. Brain waves were recorded using the Bluetooth-based NeuroSky MindWave kit and the values were labelled accordingly. The confusion matrix was derived using MATLAB programs and an accuracy of 56.25% was achieved.
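
    A minimal sketch of the reported protocol, with synthetic stand-ins for the EEG features: 72 labelled samples, 52 used to train an SVM and 20 held out for testing. The feature dimensionality and labels are hypothetical.

```python
# Minimal sketch of the reported 52-train / 20-test SVM setup on synthetic EEG-like features.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(42)
X = rng.normal(size=(72, 8))          # hypothetical per-question EEG band-power features
y = rng.integers(0, 2, size=72)       # 1 = deceptive answer, 0 = truthful (toy labels)

X_train, y_train = X[:52], y[:52]
X_test, y_test = X[52:], y[52:]

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("confusion matrix:\n", confusion_matrix(y_test, pred))
```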

  7. Classifying quantum entanglement through topological links

    Science.gov (United States)

    Quinta, Gonçalo M.; André, Rui

    2018-04-01

    We propose an alternative classification scheme for quantum entanglement based on topological links. This is done by identifying a nonrigid ring to a particle, attributing the act of cutting and removing a ring to the operation of tracing out the particle, and associating linked rings to entangled particles. This analogy naturally leads us to a classification of multipartite quantum entanglement based on all possible distinct links for a given number of rings. To determine all different possibilities, we develop a formalism that associates any link to a polynomial, with each polynomial thereby defining a distinct equivalence class. To demonstrate the use of this classification scheme, we choose qubit quantum states as our example of physical system. A possible procedure to obtain qubit states from the polynomials is also introduced, providing an example state for each link class. We apply the formalism for the quantum systems of three and four qubits and demonstrate the potential of these tools in a context of qubit networks.

  8. Classifying and Visualising Roman Pottery using Computer-scanned Typologies

    Directory of Open Access Journals (Sweden)

    Jacqueline Christmas

    2018-05-01

    For many archaeological assemblages and type-series, accurate drawings of standardised pottery vessels have been recorded in consistent styles. This provides the opportunity to extract individual pot drawings and derive from them data that can be used for analysis and visualisation. Starting from PDF scans of the original pages of pot drawings, we have automated much of the process for locating, defining the boundaries, extracting and orientating each individual pot drawing. From these processed images, basic features such as width and height, the volume of the interior, the edges, and the shape of the cross-section outline are extracted and are then used to construct more complex features such as a measure of a pot's 'circularity'. Capturing these traits opens up new possibilities for (a) classifying vessel form in a way that is sensitive to the physical characteristics of pots relative to other vessels in an assemblage, and (b) visualising the results of quantifying assemblages using standard typologies. A frequently encountered problem when trying to compare pottery from different archaeological sites is that the pottery is classified into forms and labels using different standards. With a set of data from early Roman urban centres and related sites that has been labelled both with forms (e.g. 'platter' and 'bowl') and shape identifiers (based on the Camulodunum type-series), we use the extracted features from images to look both at how the pottery forms cluster for a given set of features, and at how the features may be used to compare finds from different sites.

  9. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    Science.gov (United States)

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it has been found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples, which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
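
    A hedged illustration of the general idea rather than the authors' exact formulation: a Gaussian maximum-likelihood model and a probabilistic SVM are trained on the same data and their class posteriors averaged to support soft decision making; the dataset and the equal weighting are assumptions.

```python
# Sketch: combine a Gaussian maximum-likelihood (MLC-like) model with a probabilistic SVM
# by averaging their class posteriors, yielding a soft combined decision.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

mlc = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)       # Gaussian class-conditional model
svm = SVC(probability=True, gamma="scale").fit(X_tr, y_tr)  # Platt-scaled probabilistic SVM

p_combined = 0.5 * (mlc.predict_proba(X_te) + svm.predict_proba(X_te))
pred = p_combined.argmax(axis=1)
print("combined accuracy:", (pred == y_te).mean())
```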

  10. Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

    Science.gov (United States)

    Ashkani, Jahanshah; Naidoo, Kevin J.

    2016-05-01

    Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlates to clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.

  11. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    Directory of Open Access Journals (Sweden)

    Yi Zhang

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it has been found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples, which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  12. Early esophageal cancer detection using RF classifiers

    Science.gov (United States)

    Janse, Markus H. A.; van der Sommen, Fons; Zinger, Svitlana; Schoon, Erik J.; de With, Peter H. N.

    2016-03-01

    Esophageal cancer is one of the fastest rising forms of cancer in the Western world. Using High-Definition (HD) endoscopy, gastroenterology experts can identify esophageal cancer at an early stage. Recent research shows that early cancer can be found using a state-of-the-art computer-aided detection (CADe) system based on analyzing static HD endoscopic images. Our research aims at extending this system by applying Random Forest (RF) classification, which introduces a confidence measure for detected cancer regions. To visualize this data, we propose a novel automated annotation system, employing the unique characteristics of the previous confidence measure. This approach allows reliable modeling of multi-expert knowledge and provides essential data for real-time video processing, to enable future use of the system in a clinical setting. The performance of the CADe system is evaluated on a 39-patient dataset, containing 100 images annotated by 5 expert gastroenterologists. The proposed system reaches a precision of 75% and recall of 90%, thereby improving the state-of-the-art results by 11 and 6 percentage points, respectively.

  13. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
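
    A rough sketch of the sequential idea under simplifying assumptions (linear classifiers, synthetic drifting data, linear extrapolation of the parameter trend): the incoming image's classifier is initialised from previous classifiers and then fine-tuned with only a few current training samples.

```python
# Sketch of sequential classifier training with warm-started linear SVMs on synthetic data.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_image_data(shift, n=200, d=5):
    """Toy two-class data whose separating boundary drifts slowly over time."""
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + shift * 0.1 + rng.normal(scale=0.3, size=n) > 0).astype(int)
    return X, y

prev_coefs, prev_intercepts = [], []
for t in range(3):                              # three earlier images, fully labelled
    X, y = make_image_data(shift=t)
    clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
    prev_coefs.append(clf.coef_.copy())
    prev_intercepts.append(clf.intercept_.copy())

# Predict the incoming image's classifier by linear extrapolation of the parameter trend,
# then fine-tune with only a handful of new labelled samples.
coef0 = prev_coefs[-1] + (prev_coefs[-1] - prev_coefs[-2])
int0 = prev_intercepts[-1] + (prev_intercepts[-1] - prev_intercepts[-2])

X_new, y_new = make_image_data(shift=3)
X_few, y_few = X_new[:20], y_new[:20]           # scarce training data for the new image
clf_new = SGDClassifier(loss="hinge", random_state=0)
clf_new.fit(X_few, y_few, coef_init=coef0, intercept_init=int0)
print("accuracy on the rest of the new image:", clf_new.score(X_new[20:], y_new[20:]))
```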

  14. A Supervised Multiclass Classifier for an Autocoding System

    Directory of Open Access Journals (Sweden)

    Yukako Toko

    2017-11-01

    Classification is often required in various contexts, including in the field of official statistics. In the previous study, we have developed a multiclass classifier that can classify short text descriptions with high accuracy. The algorithm borrows the concept of the naïve Bayes classifier and is so simple that its structure is easily understandable. The proposed classifier has the following two advantages. First, the processing times for both learning and classifying are extremely practical. Second, the proposed classifier yields high-accuracy results for a large portion of a dataset. We have previously developed an autocoding system for the Family Income and Expenditure Survey in Japan that has a better performing classifier. While the original system was developed in Perl in order to improve the efficiency of the coding process of short Japanese texts, the proposed system is implemented in the R programming language in order to explore versatility and is modified to make the system easily applicable to English text descriptions, in consideration of the increasing number of R users in the field of official statistics. We are planning to publish the proposed classifier as an R-package. The proposed classifier would be generally applicable to other classification tasks including coding activities in the field of official statistics, and it would contribute greatly to improving their efficiency.

  15. An improved early detection method of type-2 diabetes mellitus using multiple classifier system

    KAUST Repository

    Zhu, Jia

    2015-01-01

    The specific causes of complex diseases such as Type-2 Diabetes Mellitus (T2DM) have not yet been identified. Nevertheless, many medical science researchers believe that complex diseases are caused by a combination of genetic, environmental, and lifestyle factors. Detection of such diseases becomes an issue because it is not free from false presumptions and is accompanied by unpredictable effects. Given the greatly increased amount of data gathered in medical databases, data mining has been used widely in recent years to detect and improve the diagnosis of complex diseases. However, past research showed that no single classifier can be considered optimal for all problems. Therefore, in this paper, we focus on employing multiple classifier systems to improve the accuracy of detection for complex diseases, such as T2DM. We proposed a dynamic weighted voting scheme called multiple factors weighted combination for classifiers' decision combination. This method considers not only the local and global accuracy but also the diversity among classifiers and localized generalization error of each classifier. We evaluated our method on two real T2DM data sets and other medical data sets. The favorable results indicated that our proposed method significantly outperforms individual classifiers and other fusion methods.
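
    A deliberately simplified sketch of weighted decision combination, using global validation accuracy alone as the weight; the paper's scheme additionally accounts for local accuracy, diversity among classifiers, and localized generalization error, none of which is reproduced here.

```python
# Simplified sketch of accuracy-weighted soft voting across heterogeneous classifiers.
import numpy as np
from sklearn.datasets import load_breast_cancer       # stand-in for a T2DM data set
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

members = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    DecisionTreeClassifier(random_state=1),
    GaussianNB(),
]
weights = []
for m in members:
    m.fit(X_tr, y_tr)
    weights.append(m.score(X_val, y_val))              # global accuracy as the weight
weights = np.array(weights) / np.sum(weights)           # (a separate tuning set would be cleaner)

# Weighted soft vote over class probabilities
proba = sum(w * m.predict_proba(X_val) for w, m in zip(weights, members))
print("ensemble accuracy:", (proba.argmax(axis=1) == y_val).mean())
```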

  16. Ensembles of novelty detection classifiers for structural health monitoring using guided waves

    Science.gov (United States)

    Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita

    2018-01-01

    Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
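
    A hedged sketch of the two-level voting idea: one one-class SVM is trained per signal segment on baseline observations only, and a recording is flagged as damaged when a majority of the segment-level novelty votes agree. The synthetic guided-wave signal, the segmentation, and the thresholds are illustrative assumptions.

```python
# Ensemble of one-class SVM novelty detectors, one per guided-wave signal segment,
# combined by majority vote (illustrative, synthetic data).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
n_segments, seg_len = 5, 40

def guided_wave_signal(damaged=False):
    t = np.linspace(0, 1, n_segments * seg_len)
    s = np.sin(40 * t) * np.exp(-3 * t) + rng.normal(scale=0.02, size=t.size)
    if damaged:
        s += 0.3 * np.sin(80 * t) * np.exp(-6 * (t - 0.5) ** 2)   # extra scattered wave packet
    return s.reshape(n_segments, seg_len)

baseline = np.stack([guided_wave_signal() for _ in range(30)])     # 30 baseline observations

# One novelty detector per segment, trained on baseline segments only.
detectors = [OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(baseline[:, k, :])
             for k in range(n_segments)]

def is_damaged(signal_segments: np.ndarray) -> bool:
    votes = [detectors[k].predict(signal_segments[k:k + 1])[0] == -1   # -1 means novelty
             for k in range(n_segments)]
    return sum(votes) > n_segments // 2                               # majority vote

print("baseline flagged:", is_damaged(guided_wave_signal()))
print("damaged  flagged:", is_damaged(guided_wave_signal(damaged=True)))
```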

  17. Derivation of LDA log likelihood ratio one-to-one classifier

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan

    2014-01-01

    The common expression for the Likelihood Ratio classifier using LDA assumes that the reference class mean is available. In biometrics, this is often not the case and only a single sample of the reference class is available. In this paper expressions are derived for biometric comparison between
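
    The record is truncated, but the Gaussian setup such a likelihood-ratio classifier rests on can be stated compactly; the formulation below is a standard one assumed here, not quoted from the paper. With within-class covariance $\Sigma_w$, between-class covariance $\Sigma_b$, global mean $\mu_0$ and a known reference-class mean $\mu_r$, the log-likelihood-ratio score of a probe $x$ is

    $$
    \log\Lambda(x)=\log\frac{p(x\mid \text{same source})}{p(x\mid \text{different source})}
    =-\tfrac{1}{2}(x-\mu_r)^{\mathsf T}\Sigma_w^{-1}(x-\mu_r)
    +\tfrac{1}{2}(x-\mu_0)^{\mathsf T}(\Sigma_w+\Sigma_b)^{-1}(x-\mu_0)+c,
    $$

    where $c=\tfrac{1}{2}\log\left(|\Sigma_w+\Sigma_b|/|\Sigma_w|\right)$. The paper's contribution concerns the one-to-one setting in which $\mu_r$ is not available and must be replaced by a single reference sample.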

  18. Hybrid Radar Emitter Recognition Based on Rough k-Means Classifier and Relevance Vector Machine

    Science.gov (United States)

    Yang, Zhutian; Wu, Zhilu; Yin, Zhendong; Quan, Taifan; Sun, Hongjian

    2013-01-01

    Due to the increasing complexity of electromagnetic signals, there exists a significant challenge for recognizing radar emitter signals. In this paper, a hybrid recognition approach is presented that classifies radar emitter signals by exploiting the different separability of samples. The proposed approach comprises two steps, namely the primary signal recognition and the advanced signal recognition. In the former step, a novel rough k-means classifier, which comprises three regions, i.e., certain area, rough area and uncertain area, is proposed to cluster the samples of radar emitter signals. In the latter step, the samples within the rough boundary are used to train the relevance vector machine (RVM). Then RVM is used to recognize the samples in the uncertain area; therefore, the classification accuracy is improved. Simulation results show that, for recognizing radar emitter signals, the proposed hybrid recognition approach is more accurate, and presents lower computational complexity than traditional approaches. PMID:23344380

  19. Classifying Internet Pathological Users: Their Usage, Internet Sensation Seeking, and Perceptions.

    Science.gov (United States)

    Lin, Sunny S. J.

    A study was conducted to identify pathological Internet users and to reveal their psychological features and problematic usage patterns. One thousand and fifty Taiwanese undergraduates were selected. An Internet Addiction Scale was adopted to classify 648 students into 4 clusters. The 146 users in the 4th cluster, who reported significantly higher…

  20. Detecting Dutch political tweets : A classifier based on voting system using supervised learning

    NARCIS (Netherlands)

    de Mello Araújo, Eric Fernandes; Ebbelaar, Dave

    The task of classifying political tweets has been shown to be very difficult, with controversial results in many works and with non-replicable methods. Most of the works with this goal use rule-based methods to identify political tweets. We propose here two methods, being one rule-based approach,

  1. 18 CFR 3a.12 - Authority to classify official information.

    Science.gov (United States)

    2010-04-01

    ... efficient administration. (b) The authority to classify information or material originally as Top Secret is... classify information or material originally as Secret is exercised only by: (1) Officials who have Top... information or material originally as Confidential is exercised by officials who have Top Secret or Secret...

  2. Using Neural Networks to Classify Digitized Images of Galaxies

    Science.gov (United States)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of Galaxies into Hubble types is of paramount importance to study the large scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we will summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and Support Vector Machine Classifier. The performance of any machine classifier is dependent on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, and removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Mathews coefficients for the galaxy classification community. Mathews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.
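
    As a side note, the "Mathews coefficient" referred to here is conventionally the Matthews correlation coefficient (MCC), a single number summarising a binary confusion matrix that stays informative under unequal class priors; a small sketch of its computation follows (toy labels, not survey data).

```python
# Matthews correlation coefficient (MCC), via scikit-learn and via its closed form.
import numpy as np
from sklearn.metrics import matthews_corrcoef

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])     # toy labels, e.g. spiral vs. elliptical
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
print("MCC:", matthews_corrcoef(y_true, y_pred))

# Equivalent closed form from the confusion matrix counts:
tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print("MCC (manual):", mcc)
```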

  3. Fisher classifier and its probability of error estimation

    Science.gov (United States)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
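
    For reference, the standard two-class construction behind these results, stated under the usual Gaussian equal-covariance assumptions (a conventional formulation, not quoted from the paper): with class means $\mathbf{m}_1,\mathbf{m}_2$ and within-class scatter $S_W$, Fisher's direction and the projection are

    $$
    \mathbf{w}=S_W^{-1}(\mathbf{m}_1-\mathbf{m}_2),\qquad y(\mathbf{x})=\mathbf{w}^{\mathsf T}\mathbf{x},
    $$

    and for equal priors the optimal threshold on the projection is the midpoint of the projected means, $t=\tfrac{1}{2}\,\mathbf{w}^{\mathsf T}(\mathbf{m}_1+\mathbf{m}_2)$; unequal priors shift $t$ by $\dfrac{\sigma^2}{\mathbf{w}^{\mathsf T}(\mathbf{m}_1-\mathbf{m}_2)}\ln\dfrac{P(\omega_2)}{P(\omega_1)}$, where $\sigma^2=\mathbf{w}^{\mathsf T}\Sigma\,\mathbf{w}$ is the projected class variance.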

  4. Performance of classification confidence measures in dynamic classifier systems

    Czech Academy of Sciences Publication Activity Database

    Štefka, D.; Holeňa, Martin

    2013-01-01

    Roč. 23, č. 4 (2013), s. 299-319 ISSN 1210-0552 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords : classifier combining * dynamic classifier systems * classification confidence Subject RIV: IN - Informatics, Computer Science Impact factor: 0.412, year: 2013

  5. 32 CFR 2400.30 - Reproduction of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be strictly...

  6. Classifying spaces with virtually cyclic stabilizers for linear groups

    DEFF Research Database (Denmark)

    Degrijse, Dieter Dries; Köhl, Ralf; Petrosyan, Nansen

    2015-01-01

    We show that every discrete subgroup of GL(n, ℝ) admits a finite-dimensional classifying space with virtually cyclic stabilizers. Applying our methods to SL(3, ℤ), we obtain a four-dimensional classifying space with virtually cyclic stabilizers and a decomposition of the algebraic K-theory of its...

  7. Dynamic integration of classifiers in the space of principal components

    NARCIS (Netherlands)

    Tsymbal, A.; Pechenizkiy, M.; Puuronen, S.; Patterson, D.W.; Kalinichenko, L.A.; Manthey, R.; Thalheim, B.; Wloka, U.

    2003-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the

  8. Classifying risk status of non-clinical adolescents using psychometric indicators for psychosis spectrum disorders.

    Science.gov (United States)

    Fonseca-Pedrero, Eduardo; Gooding, Diane C; Ortuño-Sierra, Javier; Pflum, Madeline; Paino, Mercedes; Muñiz, José

    2016-09-30

    This study is an attempt to evaluate extant psychometric indicators using latent profile analysis for classifying community-derived individuals based on a set of clinical, behavioural, and personality traits considered risk markers for psychosis spectrum disorders. The present investigation included four hundred and forty-nine high-school students between the ages of 12 and 19. We used the following to assess risk: the Prodromal Questionnaire-Brief (PQ-B), Oviedo Schizotypy Assessment Questionnaire (ESQUIZO-Q), Anticipatory and Consummatory Interpersonal Pleasure Scale-Adolescent version (ACIPS-A), and General Health Questionnaire 12 (GHQ-12). Using latent profile analysis, six latent classes (LC) were identified: participants in class 1 (LC1) displayed few or no symptoms and accounted for 38.53% of the sample; class 2 (LC2), who accounted for 28.06%, also produced low mean scores across most measures though they expressed somewhat higher levels of subjective distress; LC3, a positive schizotypy group (10.24%); LC4 (13.36%), a psychosis high-risk group; LC5, a high positive and negative schizotypy group (4.45%); and LC6, a very high distress, severe clinical high-risk group, comprised 5.34% of the sample. The current research indicates that different latent classes of individuals at early risk can be empirically defined in adolescent community samples using psychometric indicators for psychosis spectrum disorders. These findings may have implications for early detection and prevention strategies in psychosis spectrum disorders. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. Classifying Taiwan Lianas with Radiating Plates of Xylem

    Directory of Open Access Journals (Sweden)

    Sheng-Zehn Yang

    2015-12-01

    Radiating plates of xylem are a cambial variant of lianas; 22 families exhibit this feature. This study investigates 15 liana species, representing nine families, with radiating plates of xylem. The features of the transverse section and epidermis in fresh liana samples are documented, including shapes and colors of xylem and phloem, ray width and numbers, and skin morphology. Experimental results indicated that the shape of the phloem fibers in Ampelopsis brevipedunculata var. hancei is gradually tapered and flame-like, in contrast with the other characteristics of this type, including those classified as rays. Both inner and outer cylinders of vascular bundles are found in Piper kwashoense, and the irregular inner cylinder persists yet gradually diminishes. Red crystals are numerous in the cortex of Celastrus kusanoi. Aristolochia shimadai and A. zollingeriana develop a combination of two cambium variants, radiating plates of xylem and a lobed xylem. The shape of the phloem in Stauntonia obovatifoliola is square or truncate and its rays are numerous, whereas that of Neoalsomitra integrifolia is blunt and its rays are fewer. As for the features of the stem surface within the same family, Cyclea ochiaiana is brownish in color and has a deep vertical depression with lenticels, while Pericampylus glaucus is greenish in color with a shallow vertical depression. Within the same genus, Aristolochia shimadai develops lenticels, which are absent in A. zollingeriana; and although the periderm developed in Clematis grata is a ring bark and tears easily, that of Clematis tamura is thick and soft.

  10. Just-in-time classifiers for recurrent concepts.

    Science.gov (United States)

    Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

    2013-04-01

    Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both inputs and classes distributions.

  11. Molecular Characteristics in MRI-classified Group 1 Glioblastoma Multiforme

    Directory of Open Access Journals (Sweden)

    William E Haskins

    2013-07-01

    Glioblastoma multiforme (GBM) is a clinically and pathologically heterogeneous brain tumor. A previous study of MRI-classified GBM has revealed a spatial relationship between Group 1 GBM (GBM1) and the subventricular zone (SVZ). The SVZ is an adult neural stem cell niche and is also suspected to be the origin of a subtype of brain tumor. The intimate contact between GBM1 and the SVZ raises the possibility that tumor cells in GBM1 may be most related to SVZ cells. In support of this notion, we found that neural stem cell and neuroblast markers are highly expressed in GBM1. Additionally, we identified molecular characteristics in this type of GBM that include up-regulation of metabolic enzymes, ribosomal proteins, heat shock proteins, and c-Myc oncoprotein. As GBM1 often recurs at great distances from the initial lesion, the rewiring of metabolism and ribosomal biogenesis may facilitate cancer cells’ growth and survival during tumor migration. Taken together, combining our findings with MRI-based classification of GBM1 would offer better prediction and treatment for this multifocal GBM.

  12. Stability of halophilic proteins: from dipeptide attributes to discrimination classifier.

    Science.gov (United States)

    Zhang, Guangya; Huihua, Ge; Yi, Lin

    2013-02-01

    To investigate the molecular features responsible for protein halophilicity is of great significance for understanding the structure basis of protein halo-stability and would help to develop a practical strategy for designing halophilic proteins. In this work, we have systematically analyzed the dipeptide composition of the halophilic and non-halophilic protein sequences. We observed the halophilic proteins contained more DA, RA, AD, RR, AP, DD, PD, EA, VG and DV at the expense of LK, IL, II, IA, KK, IS, KA, GK, RK and AI. We identified some macromolecular signatures of halo-adaptation, and thought the dipeptide composition might contain more information than amino acid composition. Based on the dipeptide composition, we have developed a machine learning method for classifying halophilic and non-halophilic proteins for the first time. The accuracy of our method for the training dataset was 100.0%, and for the 10-fold cross-validation was 93.1%. We also discussed the influence of some specific dipeptides on prediction accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
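
    The classifier's input representation is the dipeptide composition of a sequence; the sketch below computes that 400-dimensional feature vector (the classification step itself is not reproduced here, and the example sequence is arbitrary).

```python
# Illustrative feature extraction only: the 400-dimensional dipeptide composition of a
# protein sequence, the representation the study feeds into its classifier.
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]   # 400 ordered pairs

def dipeptide_composition(seq: str) -> dict:
    """Relative frequency of each ordered dipeptide (e.g. 'DA', 'RA') in the sequence."""
    seq = seq.upper()
    total = max(len(seq) - 1, 1)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:                      # skip pairs containing non-standard residues
            counts[pair] += 1
    return {dp: c / total for dp, c in counts.items()}

features = dipeptide_composition("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(sum(v > 0 for v in features.values()), "distinct dipeptides present")
```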

  13. Hyperspectral image classifier based on beach spectral feature

    International Nuclear Information System (INIS)

    Liang, Zhang; Lianru, Gao; Bing, Zhang

    2014-01-01

    The seashore, especially coral banks, is sensitive to human activities and environmental changes. A multispectral image, with its coarse spectral resolution, is not adequate for identifying subtle spectral distinctions between various beaches. In contrast, a hyperspectral image with narrow and consecutive channels increases our capability to retrieve minor spectral features, which is suitable for identification and classification of surface materials on the shore. Herein, this paper used airborne hyperspectral data, in addition to ground spectral data, to study the beaches in Qingdao. The image data first went through pretreatment to deal with the disturbance of noise, radiometric inconsistency and distortion. Subsequently, the reflection spectrum, the derivative spectrum and the spectral absorption features of the beach surface were inspected in search of diagnostic features. Hence, spectral indices specific to the unique environment of the seashore were developed. According to expert decisions based on image spectra, the beaches are ultimately classified into sand beach, rock beach, vegetation beach, mud beach, bare land and water. In situ reflection spectra surveyed with a GER1500 field spectrometer validated the classification products. In conclusion, the classification approach under expert decision based on feature spectra is proven to be feasible for beaches.

  14. Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing.

    Science.gov (United States)

    Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer; Jun, Albert S; Asnaghi, Laura; Salzberg, Steven L; Eberhart, Charles G

    2018-01-01

    We test the ability of next-generation sequencing, combined with computational analysis, to identify a range of organisms causing infectious keratitis. This retrospective study evaluated 16 cases of infectious keratitis and four control corneas in formalin-fixed tissues from the pathology laboratory. Infectious cases also were analyzed in the microbiology laboratory using culture, polymerase chain reaction, and direct staining. Classified sequence reads were analyzed with two different metagenomics classification engines, Kraken and Centrifuge, and visualized using the Pavian software tool. Sequencing generated 20 to 46 million reads per sample. On average, 96% of the reads were classified as human, 0.3% corresponded to known vectors or contaminant sequences, 1.7% represented microbial sequences, and 2.4% could not be classified. The two computational strategies successfully identified the fungal, bacterial, and amoebal pathogens in most patients, including all four bacterial and mycobacterial cases, five of six fungal cases, three of three Acanthamoeba cases, and one of three herpetic keratitis cases. In several cases, additional potential pathogens also were identified. In one case with cytomegalovirus identified by Kraken and Centrifuge, the virus was confirmed by direct testing, while two where Staphylococcus aureus or cytomegalovirus were identified by Centrifuge but not Kraken could not be confirmed. Confirmation was not attempted for an additional three potential pathogens identified by Kraken and 11 identified by Centrifuge. Next generation sequencing combined with computational analysis can identify a wide range of pathogens in formalin-fixed corneal specimens, with potential applications in clinical diagnostics and research.

  15. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors

    Directory of Open Access Journals (Sweden)

    Roland Zemp

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user’s sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  16. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    Science.gov (United States)

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.
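
    A minimal sketch of the evaluation scheme described above, with synthetic stand-ins for the 16 force-sensor values plus backrest angle: a Random Forest scored by leave-one-subject-out cross-validation, so the test subject is never seen during training.

```python
# Random Forest posture classification with leave-one-subject-out cross-validation
# (synthetic sensor data shaped like the study: 41 subjects x 4 repeats x 7 postures).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, repeats, n_postures = 41, 4, 7
n_samples = n_subjects * repeats * n_postures                 # 1148, as in the study

y = np.tile(np.arange(n_postures), n_subjects * repeats)      # posture labels
groups = np.repeat(np.arange(n_subjects), repeats * n_postures)
X = rng.normal(size=(n_samples, 17)) + y[:, None] * 0.5       # 16 sensors + backrest angle (toy)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print("mean leave-one-subject-out accuracy:", scores.mean())
```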

  17. Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers

    Directory of Open Access Journals (Sweden)

    Ting Shu

    2017-01-01

    At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is commonly detected using blood tests, electrocardiogram, cardiac computerized tomography scan, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is founded in Traditional Chinese Medicine. A new dataset consisting of 581 heart disease and 581 healthy samples was experimented by the proposed method. In order to optimize the Probabilistic Collaborative Representation Based Classifier, an analysis of its parameters was performed. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is proven to be effective at heart disease detection.

  18. Feature extraction using convolutional neural network for classifying breast density in mammographic images

    Science.gov (United States)

    Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.

    2017-03-01

    Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, machine-learning algorithms' main constraint relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, it is a complex task to select an initial set during design of a classifier. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density by a usual machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. Then, these features are fed forward to a single hidden layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, with only 1.6% misclassification. However, the small set of samples and memory constraints required the reuse of data in both CNN and MLP-NN; therefore, overfitting might have influenced the results even though we cross-validated the network. Thus, although we presented a promising method for extracting features and classifying breast density, a greater database is

  19. Class-specific Error Bounds for Ensemble Classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.

  20. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    Science.gov (United States)

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since in the definition of a rejection model tuning of the involved parameters is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional rejection models may not be available.
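
    The rejection idea can be illustrated with a simple posterior-threshold rule on top of any probabilistic multi-class classifier. The sketch below, assuming scikit-learn, rejects test samples whose top posterior or top-two margin falls below fixed thresholds; the empirical rules and evolutionary tuning described in the abstract are not reproduced.

```python
# Minimal sketch of a rejection option: reject samples the classifier is unsure about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X[:200], y[:200])

proba = clf.predict_proba(X[200:])
top = proba.max(axis=1)
margin = top - np.sort(proba, axis=1)[:, -2]        # gap to the runner-up class

# Reject when confidence is low; thresholds would normally be tuned
# (e.g. by the evolutionary strategy mentioned in the abstract).
reject = (top < 0.6) | (margin < 0.2)
pred = np.where(reject, -1, proba.argmax(axis=1))   # -1 marks rejected samples
print(pred[:10], f"rejected {reject.mean():.0%} of test samples")
```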

  1. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

Full Text Available INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.
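
    The final clustering step (per-patient composite dimension scores grouped into three subgroups) can be illustrated with k-means on two synthetic indexes, as in the hedged sketch below; the data and cluster locations are invented and do not come from the study.

```python
# Illustrative sketch: clustering composite "symptomatology"/"comorbidities" scores.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
scores = np.vstack([
    rng.normal([-1, -1], 0.4, size=(50, 2)),   # low symptoms / low comorbidities
    rng.normal([ 1,  1], 0.4, size=(50, 2)),   # high symptoms / high comorbidities
    rng.normal([ 1, -1], 0.4, size=(50, 2)),   # high symptoms / low comorbidities
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print(np.bincount(kmeans.labels_))             # subgroup sizes
```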

  2. Exploring Land Use and Land Cover of Geotagged Social-Sensing Images Using Naive Bayes Classifier

    Directory of Open Access Journals (Sweden)

    Asamaporn Sitthi

    2016-09-01

Full Text Available Online social media crowdsourced photos contain a vast amount of visual information about the physical properties and characteristics of the earth’s surface. Flickr is an important online social media platform for users seeking this information. Each day, users generate crowdsourced geotagged digital imagery containing an immense amount of information. In this paper, geotagged Flickr images are used for automatic extraction of low-level land use/land cover (LULC) features. The proposed method uses a naive Bayes classifier with color, shape, and color index descriptors. The classified images are mapped using a majority filtering approach. The classifier performance in overall accuracy, kappa coefficient, precision, recall, and f-measure was 87.94%, 82.89%, 88.20%, 87.90%, and 88%, respectively. Labeled-crowdsourced images were filtered into a spatial tile of a 30 m × 30 m resolution using the majority voting method to reduce geolocation uncertainty from the crowdsourced data. These tile datasets were used as training and validation samples to classify Landsat TM5 images. The supervised maximum likelihood method was used for the LULC classification. The results show that the geotagged Flickr images can classify LULC types with reasonable accuracy and that the proposed approach improves LULC classification efficiency if a sufficient spatial distribution of crowdsourced data exists.
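
    A minimal sketch of the two ingredients named above, assuming scikit-learn: a Gaussian naive Bayes classifier trained on simple per-image color descriptors, followed by a majority vote over the labels that fall inside one 30 m × 30 m tile. The features and labels are toy placeholders, not the paper's descriptors.

```python
# Naive Bayes on color features plus a per-tile majority vote (toy illustration).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.random((200, 3))                     # mean R, G, B per geotagged image
y = rng.integers(0, 4, 200)                  # 4 LULC classes (toy labels)
clf = GaussianNB().fit(X, y)

# Majority vote of the predicted labels that fall into one 30 m x 30 m tile
labels_in_tile = clf.predict(rng.random((7, 3)))
tile_label = np.bincount(labels_in_tile).argmax()
print(tile_label)
```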

  3. Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel

    Directory of Open Access Journals (Sweden)

    Jianfeng Hu

    2017-01-01

Full Text Available Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue has major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four types of entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were deployed for the analysis of the original EEG signal and compared across ten state-of-the-art classifiers. Results indicate that the optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and the Random Forest (RF) classifier. The highest accuracy can be up to 96.6%, which is able to meet the needs of real applications. The best combination of channel, features, and classifier is subject-specific. In this work, the accuracy of FE as the feature is far greater than that of the other features. The accuracy using the RF classifier is the best, while that of the SVM classifier with a linear kernel is the worst. Channel selection has a large impact on accuracy, as the performance of the various channels differs considerably.
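
    One of the feature/classifier pairings above (sample entropy of a single channel fed to a random forest) can be sketched as follows; the entropy code is a plain textbook implementation and the EEG epochs are synthetic, so this only illustrates the workflow, not the reported results.

```python
# Sample entropy of one-channel epochs as the feature for a random forest (toy data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                           # common tolerance choice
    def count_matches(m):
        templates = np.array([x[i:i + m] for i in range(len(x) - m)])
        dist = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return (dist <= r).sum() - len(templates)   # exclude self-comparisons
    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
epochs = rng.normal(size=(40, 500))                 # 40 single-channel EEG epochs (toy)
labels = rng.integers(0, 2, 40)                     # 0 = alert, 1 = fatigued (toy)
features = np.array([[sample_entropy(e)] for e in epochs])
clf = RandomForestClassifier(random_state=0).fit(features, labels)
print(clf.score(features, labels))
```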

  4. Localization and Recognition of Dynamic Hand Gestures Based on Hierarchy of Manifold Classifiers

    Science.gov (United States)

    Favorskaya, M.; Nosov, A.; Popov, A.

    2015-05-01

Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain a skin detector, normalized skeleton representation of one or two hands, and motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures which do not satisfy the current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results" including 393 dynamic hand-gestures was chosen. The proposed method yielded 84-91% recognition accuracy, on average, for a restricted set of dynamic gestures.

  5. LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

    Directory of Open Access Journals (Sweden)

    M. Favorskaya

    2015-05-01

Full Text Available Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain a skin detector, normalized skeleton representation of one or two hands, and motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those samples of gestures which do not satisfy the current trajectory. The posture classifiers involve the normalized skeleton representation of palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique was applied for posture recognition of sub-gestures. For experiments, a dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results” including 393 dynamic hand-gestures was chosen. The proposed method yielded 84–91% recognition accuracy, on average, for a restricted set of dynamic gestures.

  6. Fault Diagnosis for Distribution Networks Using Enhanced Support Vector Machine Classifier with Classical Multidimensional Scaling

    Directory of Open Access Journals (Sweden)

    Ming-Yuan Cho

    2017-09-01

Full Text Available In this paper, a new fault diagnosis technique based on the time domain reflectometry (TDR) method with a pseudo-random binary sequence (PRBS) stimulus and a support vector machine (SVM) classifier has been investigated to recognize the different types of fault in radial distribution feeders. This novel technique considers the amplitude of the reflected signals and the peaks of the cross-correlation (CCR) between the reflected and incident waves for generating the fault current dataset for the SVM. Furthermore, this multi-layer enhanced SVM classifier is combined with the classical multidimensional scaling (CMDS) feature extraction algorithm and kernel parameter optimization to increase training speed and improve overall classification accuracy. The proposed technique has been tested on a radial distribution feeder to identify ten different types of fault considering 12 input features generated by using Simulink software and the MATLAB Toolbox. The success rate of the SVM classifier is over 95%, which demonstrates the effectiveness and high accuracy of the proposed method.
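
    A hedged sketch of the feature idea above: the normalized peak of the cross-correlation between the incident (PRBS-like) stimulus and the reflected signal, used as an input feature for an SVM. The synthetic reflections, delays, and fault labels below are placeholders; the CMDS step and kernel optimization are omitted.

```python
# Cross-correlation peak features feeding an SVM fault classifier (toy signals).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def ccr_peak(incident, reflected):
    c = np.correlate(reflected, incident, mode="full")
    return c.max() / (np.linalg.norm(incident) * np.linalg.norm(reflected))

incident = rng.normal(size=128)                     # PRBS-like stimulus (toy)
X, y = [], []
for fault_type in range(3):                          # 3 synthetic fault types
    for _ in range(30):
        delay, gain = 5 * (fault_type + 1), 0.5 / (fault_type + 1)
        reflected = gain * np.roll(incident, delay) + 0.05 * rng.normal(size=128)
        X.append([ccr_peak(incident, reflected), reflected.max()])
        y.append(fault_type)

clf = SVC(kernel="rbf", C=10.0).fit(X, y)
print(clf.score(X, y))                               # training accuracy on toy data
```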

  7. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  8. Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS

    OpenAIRE

    Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita

    2016-01-01

This file is the dataset for the following paper "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja, Dasapta Erwin Irawan, Yuano Rezky, Prana Ugiana Gio, Anggita Agustin

  9. Robust Combining of Disparate Classifiers Through Order Statistics

    Science.gov (United States)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
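
    Order-statistic combining can be sketched directly: for each test sample and class, the base classifiers' posterior estimates are sorted, and the median, maximum, or a trimmed mean of the ordered values becomes the combined score. The posteriors below are random stand-ins for real base-classifier outputs.

```python
# Median / max / trimmed-mean combination of base-classifier posteriors.
import numpy as np

rng = np.random.default_rng(0)
# 5 base classifiers x 10 test samples x 3 classes of posterior estimates
posteriors = rng.dirichlet(np.ones(3), size=(5, 10))

def combine(posteriors, how="median", trim=1):
    ordered = np.sort(posteriors, axis=0)            # order statistics per sample/class
    if how == "median":
        scores = np.median(ordered, axis=0)
    elif how == "max":
        scores = ordered[-1]
    else:                                            # trimmed (spread-robust) mean
        scores = ordered[trim:-trim].mean(axis=0)
    return scores.argmax(axis=1)                     # combined class decisions

for how in ("median", "max", "trim"):
    print(how, combine(posteriors, how))
```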

  10. Using Statistical Process Control Methods to Classify Pilot Mental Workloads

    National Research Council Canada - National Science Library

    Kudo, Terence

    2001-01-01

    .... These include cardiac, ocular, respiratory, and brain activity measures. The focus of this effort is to apply statistical process control methodology on different psychophysiological features in an attempt to classify pilot mental workload...

  11. An ensemble classifier to predict track geometry degradation

    International Nuclear Information System (INIS)

    Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

    2017-01-01

Railway operations are inherently complex and a source of several problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach which entails the construction of an ensemble classifier to forecast the degradation of track geometry. Our classifier is constructed by solving the problem from three different perspectives: deterioration, regression and classification. We considered a different model from each perspective and our results show that using an ensemble method improves the predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.

  12. 6 CFR 7.23 - Emergency release of classified information.

    Science.gov (United States)

    2010-01-01

    ... Classified Information Non-disclosure Form. In emergency situations requiring immediate verbal release of... information through approved communication channels by the most secure and expeditious method possible, or by...

  13. A GIS semiautomatic tool for classifying and mapping wetland soils

    Science.gov (United States)

    Moreno-Ramón, Héctor; Marqués-Mateu, Angel; Ibáñez-Asensio, Sara

    2016-04-01

Wetlands are one of the most productive and biodiverse ecosystems in the world. Water is the main resource and controls the relationships between agents and factors that determine the quality of the wetland. However, vegetation, wildlife and soils are also essential factors for understanding these environments. It is possible that soils have been the least studied resource due to their sampling problems, and as a result wetland soils have sometimes been classified only broadly. The traditional methodology states that homogeneous soil units should be based on the five soil-forming factors. The problem can appear when the variation of one soil-forming factor is too small to differentiate a change in soil units, or when another factor, such as a fluctuating water table, is not taken into account. This is the case of the Albufera of Valencia, a coastal wetland located in the middle east of the Iberian Peninsula (Spain). The saline water table fluctuates throughout the year and generates differences in soils. To address this, the objectives of this study were to establish a reliable methodology that avoids these problems and to develop a GIS tool that would allow us to define homogeneous soil units in wetlands. This step is essential for the soil scientist, who has to decide the number of soil profiles in a study. The research was conducted with data from 133 soil pits of a previous study in the wetland. In that study, soil parameters of 401 samples (organic carbon, salinity, carbonates, n-value, etc.) were analysed. In a first stage, GIS layers were generated according to depth. The method employed was Bayesian Maximum Entropy. Subsequently, a program based on decision tree algorithms was designed in a GIS environment. The goal of this tool was to create a single layer, for each soil variable, according to the different diagnostic criteria of Soil Taxonomy (properties, horizons and diagnostic epipedons). At the end, the program

  14. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    International Nuclear Information System (INIS)

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.

  15. Local-global classifier fusion for screening chest radiographs

    Science.gov (United States)

    Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

    2017-03-01

    Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.
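
    The linear fusion step can be illustrated with a toy confidence-weighted combination of two probability outputs; the weighting used here (distance from the uninformative 0.5 point) is an assumption for illustration and not necessarily the paper's exact scheme.

```python
# Confidence-weighted linear fusion of a "local" and a "global" classifier score.
def fuse(p_local, p_global):
    """Combine two abnormality probabilities, weighting each by its confidence
    (its distance from the uninformative 0.5 point)."""
    w_local = abs(p_local - 0.5)
    w_global = abs(p_global - 0.5)
    if w_local + w_global == 0:
        return 0.5                                   # both classifiers uninformative
    return (w_local * p_local + w_global * p_global) / (w_local + w_global)

print(fuse(0.55, 0.92))   # the confident global classifier dominates the fusion
```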

  16. A cardiorespiratory classifier of voluntary and involuntary electrodermal activity

    Directory of Open Access Journals (Sweden)

    Sejdic Ervin

    2010-02-01

Full Text Available Abstract Background Electrodermal reactions (EDRs) can be attributed to many origins, including spontaneous fluctuations of electrodermal activity (EDA) and stimuli such as deep inspirations, voluntary mental activity and startling events. In fields that use EDA as a measure of psychophysiological state, the fact that EDRs may be elicited from many different stimuli is often ignored. This study attempts to classify observed EDRs as voluntary (i.e., generated from intentional respiratory or mental activity) or involuntary (i.e., generated from startling events or spontaneous electrodermal fluctuations). Methods Eight able-bodied participants were subjected to conditions that would cause a change in EDA: music imagery, startling noises, and deep inspirations. A user-centered cardiorespiratory classifier consisting of (1) an EDR detector, (2) a respiratory filter and (3) a cardiorespiratory filter was developed to automatically detect a participant's EDRs and to classify the origin of their stimulation as voluntary or involuntary. Results Detected EDRs were classified with a positive predictive value of 78%, a negative predictive value of 81% and an overall accuracy of 78%. Without the classifier, EDRs could only be correctly attributed as voluntary or involuntary with an accuracy of 50%. Conclusions The proposed classifier may enable investigators to form more accurate interpretations of electrodermal activity as a measure of an individual's psychophysiological state.

  17. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    Science.gov (United States)

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction

  18. Development of multicriteria models to classify energy efficiency alternatives

    International Nuclear Information System (INIS)

    Neves, Luis Pires; Antunes, Carlos Henggeler; Dias, Luis Candido; Martins, Antonio Gomes

    2005-01-01

This paper aims at describing a novel constructive approach to develop decision support models to classify energy efficiency initiatives, including traditional Demand-Side Management and Market Transformation initiatives, overcoming the limitations and drawbacks of Cost-Benefit Analysis. A multicriteria approach based on the ELECTRE-TRI method is used, focusing on four perspectives: an independent Agency with the aim of promoting energy efficiency; Distribution-only utilities under a regulated framework; the Regulator; and Supply companies in a competitive liberalized market. These perspectives were chosen after a system analysis of the decision situation regarding the implementation of energy efficiency initiatives, looking for the main roles and power relations, with the purpose of structuring the decision problem by identifying the actors, the decision makers, the decision paradigm, and the relevant criteria. The multicriteria models developed allow considering different kinds of impacts, while avoiding difficult measurements and unit conversions due to the nature of the multicriteria method chosen. The decision is then based on all the significant effects of the initiative, both positive and negative ones, including ancillary effects often forgotten in cost-benefit analysis. ELECTRE-TRI, like most multicriteria methods, provides the Decision Maker with the ability to control the relevance each impact can have on the final decision. The decision support process encompasses a robustness analysis, which, together with good documentation of the parameters supplied to the model, should support sound decisions. The models were tested with a set of real-world initiatives and compared with possible decisions based on Cost-Benefit Analysis.

  19. CLASSIFYING BENIGN AND MALIGNANT MASSES USING STATISTICAL MEASURES

    Directory of Open Access Journals (Sweden)

    B. Surendiran

    2011-11-01

Full Text Available Breast cancer is the primary and most common disease found in women, and it causes the second highest rate of death after lung cancer. The digital mammogram is the X-ray of the breast captured for analysis, interpretation and diagnosis. According to the Breast Imaging Reporting and Data System (BIRADS), benign and malignant masses can be differentiated using their shape, size and density, which is how radiologists visualize the mammograms. According to BIRADS mass shape characteristics, benign masses tend to be round, oval, or lobular in shape and malignant masses are lobular or irregular in shape. Measuring regular and irregular shapes mathematically is found to be a difficult task, since there is no single measure to differentiate various shapes. In this paper, the malignant and benign masses present in mammograms are classified using Hue, Saturation and Value (HSV) weight-function-based statistical measures. The weight function is robust against noise and captures the degree of gray content of the pixel. The statistical measures use the gray weight value instead of the gray pixel value to effectively discriminate masses. The 233 mammograms from the Digital Database for Screening Mammography (DDSM) benchmark dataset have been used. The PASW data mining modeler has been used for constructing a Neural Network for identifying the importance of statistical measures. Based on the obtained important statistical measures, a C5.0 tree has been constructed with a 60-40 data split. The experimental results are found to be encouraging. Also, the results agree with the standard specified by the American College of Radiology BIRADS system.

  20. Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers

    NARCIS (Netherlands)

    Bolt, J.H.; van der Gaag, L.C.

    Multi-dimensional Bayesian network classifiers are Bayesian networks of restricted topological structure, which are tailored to classifying data instances into multiple dimensions. Like more traditional classifiers, multi-dimensional classifiers are typically learned from data and may include

  1. Representative Vector Machines: A Unified Framework for Classical Classifiers.

    Science.gov (United States)

    Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

    2016-08-01

Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.

  2. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    Science.gov (United States)

    Sankari, E Siva; Manimegalai, D

    2017-12-21

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, and ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another finding is that the RUSBoost decision tree classifier is able to classify one or two samples in classes with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. An Improvement To The k-Nearest Neighbor Classifier For ECG Database

    Science.gov (United States)

    Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

    2018-03-01

The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed among them. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN concerns the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to account for the imbalance of the sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distributions of the training samples and their distance to the query point. Consequently, a fuzzy membership function is employed to assign the query point to the class label which is frequently represented by the nearest centroid neighbor. Experiments on electrocardiogram (ECG) signals are conducted in this study. The classification performance is evaluated in two experimental steps, i.e., different values of k and different sizes of feature dimensions. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds that of kNN, kNCN and FkNN, with a best classification rate of 96.5%.
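
    The Mahalanobis-distance ingredient can be sketched with a plain k-nearest-neighbour rule, as below; the fuzzy membership and centroid-neighbour steps of MFkNCN are not reproduced, and the feature vectors are synthetic stand-ins for ECG-derived features.

```python
# k-nearest-neighbour classification under the Mahalanobis distance (toy data).
import numpy as np

def mahalanobis_knn(X_train, y_train, x, k=3):
    VI = np.linalg.inv(np.cov(X_train, rowvar=False))      # inverse covariance
    diff = X_train - x
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, VI, diff))  # Mahalanobis distances
    neighbours = y_train[np.argsort(d)[:k]]
    return np.bincount(neighbours).argmax()                # majority vote

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
print(mahalanobis_knn(X, y, np.array([1.8, 2.1, 1.9, 2.2])))
```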

  4. Neural network classifier of attacks in IP telephony

    Science.gov (United States)

    Safarik, Jakub; Voznak, Miroslav; Mehic, Miralem; Partila, Pavol; Mikulec, Martin

    2014-05-01

Various types of monitoring mechanisms allow us to detect and monitor the behavior of attackers in VoIP networks. Analysis of detected malicious traffic is crucial for further investigation and for hardening the network. This analysis is typically based on statistical methods, and this article presents a solution based on a neural network. The proposed algorithm is used as a classifier of attacks in a distributed monitoring network of independent honeypot probes. Information about attacks on these honeypots is collected on a centralized server and then classified. This classification is based on different mechanisms, one of which is the multilayer perceptron neural network. The article describes the inner structure of the neural network used and provides information about its implementation. The learning set for this neural network is based on real attack data collected from an IP telephony honeypot called Dionaea. We prepare the learning set from real attack data after collecting, cleaning and aggregating this information. After proper training, the neural network is capable of classifying 6 types of the most commonly used VoIP attacks. Using a neural network classifier brings more accurate attack classification in a distributed system of honeypots. With this approach it is possible to detect malicious behavior in different parts of networks that are logically or geographically divided, and to use the information from one network to harden security in other networks. The centralized server for the distributed set of nodes serves not only as a collector and classifier of attack data, but also as a mechanism for generating precaution steps against attacks.
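
    A hedged sketch of a multilayer-perceptron attack classifier of this kind, assuming scikit-learn; the features and the six attack labels are synthetic placeholders for the honeypot-derived learning set.

```python
# Multilayer perceptron classifying six (synthetic) attack types.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```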

  5. Maximum margin classifier working in a set of strings.

    Science.gov (United States)

    Koyano, Hitoshi; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-03-01

    Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.

  6. Use of information barriers to protect classified information

    International Nuclear Information System (INIS)

    MacArthur, D.; Johnson, M.W.; Nicholas, N.J.; Whiteson, R.

    1998-01-01

    This paper discusses the detailed requirements for an information barrier (IB) for use with verification systems that employ intrusive measurement technologies. The IB would protect classified information in a bilateral or multilateral inspection of classified fissile material. Such a barrier must strike a balance between providing the inspecting party the confidence necessary to accept the measurement while protecting the inspected party's classified information. The authors discuss the structure required of an IB as well as the implications of the IB on detector system maintenance. A defense-in-depth approach is proposed which would provide assurance to the inspected party that all sensitive information is protected and to the inspecting party that the measurements are being performed as expected. The barrier could include elements of physical protection (such as locks, surveillance systems, and tamper indicators), hardening of key hardware components, assurance of capabilities and limitations of hardware and software systems, administrative controls, validation and verification of the systems, and error detection and resolution. Finally, an unclassified interface could be used to display and, possibly, record measurement results. The introduction of an IB into an analysis system may result in many otherwise innocuous components (detectors, analyzers, etc.) becoming classified and unavailable for routine maintenance by uncleared personnel. System maintenance and updating will be significantly simplified if the classification status of as many components as possible can be made reversible (i.e. the component can become unclassified following the removal of classified objects)

  7. Detection of microaneurysms in retinal images using an ensemble classifier

    Directory of Open Access Journals (Sweden)

    M.M. Habib

    2017-01-01

Full Text Available This paper introduces, and reports on the performance of, a novel combination of algorithms for automated microaneurysm (MA) detection in retinal images. The presence of MAs in retinal images is a pathognomonic sign of Diabetic Retinopathy (DR), which is one of the leading causes of blindness amongst the working age population. An extensive survey of the literature is presented and current techniques in the field are summarised. The proposed technique first detects an initial set of candidates using a Gaussian Matched Filter and then classifies this set to reduce the number of false positives. A Tree Ensemble classifier is used with a set of 70 features (the most common features in the literature). A new set of 32 MA groundtruth images (with a total of 256 labelled MAs) based on images from the MESSIDOR dataset is introduced as a public dataset for benchmarking MA detection algorithms. We evaluate our algorithm on this dataset as well as another public dataset (DIARETDB1 v2.1) and compare it against the best available alternative. Results show that the proposed classifier is superior in terms of eliminating false positive MA detection from the initial set of candidates. The proposed method achieves an ROC score of 0.415 compared to 0.2636 achieved by the best available technique. Furthermore, results show that the classifier model maintains consistent performance across datasets, illustrating the generalisability of the classifier and that overfitting does not occur.

  8. Generalization in the XCSF classifier system: analysis, improvement, and extension.

    Science.gov (United States)

    Lanzi, Pier Luca; Loiacono, Daniele; Wilson, Stewart W; Goldberg, David E

    2007-01-01

    We analyze generalization in XCSF and introduce three improvements. We begin by showing that the types of generalizations evolved by XCSF can be influenced by the input range. To explain these results we present a theoretical analysis of the convergence of classifier weights in XCSF which highlights a broader issue. In XCSF, because of the mathematical properties of the Widrow-Hoff update, the convergence of classifier weights in a given subspace can be slow when the spread of the eigenvalues of the autocorrelation matrix associated with each classifier is large. As a major consequence, the system's accuracy pressure may act before classifier weights are adequately updated, so that XCSF may evolve piecewise constant approximations, instead of the intended, and more efficient, piecewise linear ones. We propose three different ways to update classifier weights in XCSF so as to increase the generalization capabilities of XCSF: one based on a condition-based normalization of the inputs, one based on linear least squares, and one based on the recursive version of linear least squares. Through a series of experiments we show that while all three approaches significantly improve XCSF, least squares approaches appear to be best performing and most robust. Finally we show how XCSF can be extended to include polynomial approximations.

  9. Dynamic cluster generation for a fuzzy classifier with ellipsoidal regions.

    Science.gov (United States)

    Abe, S

    1998-01-01

In this paper, we discuss a fuzzy classifier with ellipsoidal regions that dynamically generates clusters. First, for the data belonging to a class we define a fuzzy rule with an ellipsoidal region. Namely, using the training data for each class, we calculate the center and the covariance matrix of the ellipsoidal region for the class. Then we tune the fuzzy rules, i.e., the slopes of the membership functions, successively until there is no improvement in the recognition rate of the training data. Then, if the number of the data belonging to a class that are misclassified into another class exceeds a prescribed number, we define a new cluster to which those data belong and the associated fuzzy rule. Then we tune the newly defined fuzzy rules in a similar way as stated above, fixing the already obtained fuzzy rules. We iterate generation of clusters and tuning of the newly generated fuzzy rules until the number of the data belonging to a class that are misclassified into another class does not exceed the prescribed number. We evaluate our method using thyroid data, Japanese Hiragana data of vehicle license plates, and blood cell data. By dynamic cluster generation, the generalization ability of the classifier is improved and the recognition rate of the fuzzy classifier for the test data is the best among the neural network classifiers and other fuzzy classifiers if there are no discrete input variables.

  10. SpectraClassifier 1.0: a user friendly, automated MRS-based classifier-development system

    Directory of Open Access Journals (Sweden)

    Julià-Sapé Margarida

    2010-02-01

Full Text Available Abstract Background SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward) and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping as well as Receiver Operating Characteristic (ROC) curves. Results SC is composed of the following modules: Classifier design, Data exploration, Data visualisation, Classifier evaluation, Reports, and Classifier history. It is able to read low resolution in-vivo MRS (single-voxel and multi-voxel) and high resolution tissue MRS (HRMAS), processed with existing tools (jMRUI, INTERPRET, 3DiCSI or TopSpin). In addition, to facilitate exchanging data between applications, a standard format capable of storing all the information needed for a dataset was developed. Each functionality of SC has been specifically validated with real data with the purpose of bug-testing and methods validation. Data from the INTERPRET project was used. Conclusions SC is a user-friendly software designed to fulfil the needs of potential users in the MRS community. It accepts all kinds of pre-processed MRS data types and classifies them semi-automatically, allowing spectroscopists to concentrate on interpretation of results with the use of its visualisation tools.

  11. Classifying the mechanisms of electrochemical shock in ion-intercalation materials

    OpenAIRE

    Woodford, William; Carter, W. Craig; Chiang, Yet-Ming

    2014-01-01

    “Electrochemical shock” – the electrochemical cycling-induced fracture of materials – contributes to impedance growth and performance degradation in ion-intercalation batteries, such as lithium-ion. Using a combination of micromechanical models and acoustic emission experiments, the mechanisms of electrochemical shock are identified, classified, and modeled in targeted model systems with different composition and microstructure. A particular emphasis is placed on mechanical degradation occurr...

  12. A History of Classified Activities at Oak Ridge National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    2001-01-30

    The facilities that became Oak Ridge National Laboratory (ORNL) were created in 1943 during the United States' super-secret World War II project to construct an atomic bomb (the Manhattan Project). During World War II and for several years thereafter, essentially all ORNL activities were classified. Now, in 2000, essentially all ORNL activities are unclassified. The major purpose of this report is to provide a brief history of ORNL's major classified activities from 1943 until the present (September 2000). This report is expected to be useful to the ORNL Classification Officer and to ORNL's Authorized Derivative Classifiers and Authorized Derivative Declassifiers in their classification review of ORNL documents, especially those documents that date from the 1940s and 1950s.

  13. COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

    Directory of Open Access Journals (Sweden)

    M. J. Baheti

    2012-01-01

Full Text Available With the advent of the technological era, conversion of scanned documents (handwritten or printed) into machine editable format has attracted many researchers. This paper deals with the problem of recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires performing some specific steps as a part of preprocessing. For preprocessing, digitization, segmentation, normalization and thinning are done, assuming that the images have almost no noise. Further, an affine invariant moments based model is used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. The comparison of the SVM and Fuzzy classifiers shows that SVM produced better results than the Fuzzy classifier.

  14. Optimal threshold estimation for binary classifiers using game theory.

    Science.gov (United States)

    Sanchez, Ignacio Enrique

    2016-01-01

Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
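
    The proposed operating point (where the ROC curve meets the descending diagonal, i.e. sensitivity equals specificity) can be located numerically as in the sketch below; the scores and labels are synthetic.

```python
# Locate the threshold where sensitivity is closest to specificity (toy scores).
import numpy as np

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 500), rng.normal(1.5, 1, 500)])
labels = np.concatenate([np.zeros(500), np.ones(500)])

thresholds = np.sort(np.unique(scores))
sens = np.array([(scores[labels == 1] >= t).mean() for t in thresholds])
spec = np.array([(scores[labels == 0] < t).mean() for t in thresholds])

best = np.argmin(np.abs(sens - spec))            # minimax point on the ROC curve
print(f"threshold={thresholds[best]:.2f}  sensitivity={sens[best]:.2f}  "
      f"specificity={spec[best]:.2f}")
```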

  15. Statistical text classifier to detect specific type of medical incidents.

    Science.gov (United States)

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

WHO Patient Safety has put focus on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that this is a satisfactorily good model.
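
    A minimal sketch of a logistic-regression text classifier for flagging one incident type, assuming scikit-learn; the advisory texts and LASA labels below are invented stand-ins for the GPSA data.

```python
# TF-IDF features plus logistic regression to flag look-alike sound-alike incidents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "patient given hydroxyzine instead of hydralazine",     # look-alike sound-alike
    "dose omitted due to pharmacy stock shortage",
    "celebrex dispensed in place of celexa",                 # look-alike sound-alike
    "infusion pump alarm ignored during night shift",
]
labels = [1, 0, 1, 0]                                        # 1 = LASA incident

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["patient received zyrtec rather than zantac"]))
```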

  16. A Topic Model Approach to Representing and Classifying Football Plays

    KAUST Repository

    Varadarajan, Jagannadan

    2013-09-09

We address the problem of modeling and classifying American Football offense teams’ plays in video, a challenging example of group activity analysis. Automatic play classification will allow coaches to infer patterns and tendencies of opponents more efficiently, resulting in better strategy planning in a game. We define a football play as a unique combination of player trajectories. To this end, we develop a framework that uses player trajectories as inputs to MedLDA, a supervised topic model. The joint maximization of both likelihood and inter-class margins of MedLDA in learning the topics allows us to learn semantically meaningful play type templates, as well as classify different play types with 70% average accuracy. Furthermore, this method is extended to analyze individual player roles in classifying each play type. We validate our method on a large dataset comprising 271 play clips from real-world football games, which will be made publicly available for future comparisons.

  17. Defending Malicious Script Attacks Using Machine Learning Classifiers

    Directory of Open Access Journals (Sweden)

    Nayeem Khan

    2017-01-01

Full Text Available Web applications have become a primary target for cyber criminals, who inject malware, especially JavaScript, to perform malicious activities such as impersonation. Thus, it becomes imperative to detect such malicious code in real time before any malicious activity is performed. This study proposes an efficient method of detecting previously unknown malicious JavaScript using an interceptor at the client side by classifying the key features of the malicious code. A feature subset was obtained by using the wrapper method for dimensionality reduction. Supervised machine learning classifiers were used on the dataset for achieving high accuracy. Experimental results show that our method can efficiently classify malicious code from benign code with promising results.

  18. A bench-top hyperspectral imaging system to classify beef from Nellore cattle based on tenderness

    Science.gov (United States)

    Nubiato, Keni Eduardo Zanoni; Mazon, Madeline Rezende; Antonelo, Daniel Silva; Calkins, Chris R.; Naganathan, Govindarajan Konda; Subbiah, Jeyamkondan; da Luz e Silva, Saulo

    2018-03-01

The aim of this study was to evaluate the accuracy of classification of Nellore beef aged for 0, 7, 14, or 21 days and classification based on tenderness and aging period using a bench-top hyperspectral imaging system. A hyperspectral imaging system (λ = 928-2524 nm) was used to collect hyperspectral images of the Longissimus thoracis et lumborum (aging n = 376 and tenderness n = 345) of Nellore cattle. The image processing steps included selection of the region of interest, extraction of spectra, and identification and evaluation of selected wavelengths for classification. Six linear discriminant models were developed to classify samples based on tenderness and aging period. The model using the first derivative of partial absorbance spectra (give wavelength range spectra) was able to classify steaks based on tenderness with an overall accuracy of 89.8%. The model using the first derivative of full absorbance spectra was able to classify steaks based on aging period with an overall accuracy of 84.8%. The results demonstrate that the hyperspectral imaging system may be a viable technology for classifying beef based on tenderness and aging period.

  19. Clean-up progress at the SNL/NM Classified Waste Landfill

    International Nuclear Information System (INIS)

    Slavin, P.J.; Galloway, R.B.

    1999-01-01

    The Sandia National Laboratories/New Mexico (SNL/NM) Environmental Restoration Project is currently excavating the Classified Waste Landfill in Technical Area II, a disposal area for weapon components for approximately 40 years until it closed in 1987. Many different types of classified parts were disposed of in unlined trenches and pits throughout the course of the landfill's history. A percentage of the parts contain explosives and/or radioactive components or contamination. The excavation has progressed backward chronologically from the last trenches filled through to the earlier pits. Excavation commenced in March 1998, and approximately 75 percent of the site (as defined by geophysical anomalies) has been completed as of November 1999. The material excavated consists primarily of classified weapon assemblies and related components, so disposition must include demilitarization and sanitization. This has resulted in substantial waste minimization and cost avoidance for the project, as upwards of 90 percent of the classified materials are being demilitarized and recycled. The project is using field screening and lab analysis in conjunction with preliminary and in-process risk assessments to characterize soil and make waste determinations in as timely a fashion as possible. Challenges in waste management have prompted the adoption of innovative solutions. The hand-picked crew (both management and field staff) and the ability to quickly adapt to changing conditions have ensured the success of the project. The current schedule is to complete excavation in July 2000, with follow-on verification sampling, demilitarization, and waste management activities.

  20. Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps

    Science.gov (United States)

    Rahmani, S.; Teimoorinia, H.; Barmby, P.

    2018-05-01

    The spectrum of a galaxy contains information about its physical properties. Classifying spectra using templates helps elucidate the nature of a galaxy's energy sources. In this paper, we investigate the use of self-organizing maps in classifying galaxy spectra against templates. We trained semi-supervised self-organizing map networks using a set of templates covering the wavelength range from far ultraviolet to near infrared. The trained networks were used to classify the spectra of a sample of 142 galaxies with 0.5 < z < 1, and the results were compared with classifications from K-means clustering, a supervised neural network, and chi-squared minimization. Spectra corresponding to quiescent galaxies were more likely to be classified similarly by all methods, while starburst spectra showed more variability. Compared to classification using chi-squared minimization or the supervised neural network, the galaxies classed together by the self-organizing map had more similar spectra. The class ordering provided by the one-dimensional self-organizing maps corresponds to an ordering in physical properties, a potentially important feature for the exploration of large datasets.
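
    A minimal sketch of template-based classification with a self-organizing map, assuming the third-party minisom package as a stand-in for the authors' semi-supervised networks. The template and galaxy spectra are synthetic; real inputs would be resampled onto a common wavelength grid and normalized.

      # Hedged sketch using the third-party `minisom` package; all spectra are synthetic.
      import numpy as np
      from minisom import MiniSom

      n_templates, n_wavelengths = 30, 500
      rng = np.random.default_rng(2)
      templates = rng.random((n_templates, n_wavelengths))   # placeholder template spectra
      galaxies = rng.random((142, n_wavelengths))            # placeholder observed spectra

      som = MiniSom(10, 10, n_wavelengths, sigma=1.5, learning_rate=0.5, random_seed=2)
      som.train_random(templates, 5000)                      # organize the map around the templates

      # label each map node by its best-matching template, then classify the galaxy spectra
      node_label = {som.winner(t): i for i, t in enumerate(templates)}
      classes = [node_label.get(som.winner(spec), -1) for spec in galaxies]   # -1 = unlabelled node
      print(classes[:10])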

  1. Classification Identification of Acoustic Emission Signals from Underground Metal Mine Rock by ICIMF Classifier

    Directory of Open Access Journals (Sweden)

    Hongyan Zuo

    2014-01-01

    Full Text Available To overcome the drawback that fuzzy classifiers are sensitive to noises and outliers, a Mamdani fuzzy classifier based on an improved chaos immune algorithm was developed, in which bilateral Gaussian membership function parameters were set as constraint conditions, and the index of fuzzy classification effectiveness and the number of correctly classified samples were taken as subgoals of the fitness function. The Iris database was used for simulation experiments, and the method was applied to the classification and recognition of acoustic emission signals and interference signals from the stope wall rock of underground metal mines. The results showed that the Mamdani fuzzy classifier based on the improved chaos immune algorithm could effectively improve the prediction accuracy of classification for data sets with noises and outliers, and the classification accuracy for acoustic emission signals and interference signals from stope wall rock of underground metal mines was 90.00%. It is thus evident that the improved chaos immune Mamdani fuzzy (ICIMF) classifier is useful for accurate diagnosis of acoustic emission signals and interference signals from stope wall rock of underground metal mines.

  2. Silicon nanowire arrays as learning chemical vapour classifiers

    International Nuclear Information System (INIS)

    Niskanen, A O; Colli, A; White, R; Li, H W; Spigone, E; Kivioja, J M

    2011-01-01

    Nanowire field-effect transistors are a promising class of devices for various sensing applications. Apart from detecting individual chemical or biological analytes, it is especially interesting to use multiple selective sensors to look at their collective response in order to perform classification into predetermined categories. We show that non-functionalised silicon nanowire arrays can be used to robustly classify different chemical vapours using simple statistical machine learning methods. We were able to distinguish between acetone, ethanol and water with 100% accuracy while methanol, ethanol and 2-propanol were classified with 96% accuracy in ambient conditions.

  3. Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier

    Science.gov (United States)

    Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar

    2015-02-01

    In this paper, a new approach for personal identification using finger vein image is presented. Finger vein is an emerging type of biometrics that attracts attention of researchers in biometrics area. As compared to other biometric traits such as face, fingerprint and iris, finger vein is more secured and hard to counterfeit since the features are inside the human body. So far, most of the researchers focus on how to extract robust features from the captured vein images. Not much research was conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database which was collected from 492 fingers shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.
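
    The sketch below illustrates the two ingredients described above: neighbors are chosen greedily so that the running centroid stays closest to the query (k-nearest centroid neighbors), and the class decision is a fuzzy, distance-weighted vote over those neighbors. The fuzzifier value, distance metric, and the toy feature vectors are assumptions, not the paper's settings.

      import numpy as np

      def k_nearest_centroid_neighbors(X_train, x, k):
          """Greedy kNCN: pick neighbors whose running centroid stays closest to x."""
          chosen, remaining = [], list(range(len(X_train)))
          for _ in range(k):
              best_i, best_d = None, np.inf
              for i in remaining:
                  d = np.linalg.norm(X_train[chosen + [i]].mean(axis=0) - x)
                  if d < best_d:
                      best_i, best_d = i, d
              chosen.append(best_i)
              remaining.remove(best_i)
          return chosen

      def fkncn_predict(X_train, y_train, x, k=5, m=2.0, eps=1e-9):
          """Fuzzy, distance-weighted vote over the k nearest centroid neighbors (fuzzifier m assumed)."""
          votes = {}
          for i in k_nearest_centroid_neighbors(X_train, x, k):
              d = np.linalg.norm(X_train[i] - x) + eps
              votes[y_train[i]] = votes.get(y_train[i], 0.0) + d ** (-2.0 / (m - 1.0))
          return max(votes, key=votes.get)

      # toy usage with synthetic "finger vein" feature vectors for 10 enrolled fingers
      rng = np.random.default_rng(3)
      X_train = rng.random((100, 32))
      y_train = rng.integers(0, 10, 100)
      print(fkncn_predict(X_train, y_train, rng.random(32)))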

  4. An integrated multi-label classifier with chemical-chemical interactions for prediction of chemical toxicity effects.

    Science.gov (United States)

    Liu, Tao; Chen, Lei; Pan, Xiaoyong

    2018-05-31

    Chemical toxicity is one of the major reasons for rejecting candidate drugs. Detecting the toxicity effects of all chemicals can accelerate drug discovery. However, it is time-consuming and expensive to identify the toxicity effects of a given chemical through traditional experiments. Designing quick, reliable and non-animal-involved computational methods is an alternative. In this study, a novel integrated multi-label classifier was proposed. First, based on five types of chemical-chemical interactions retrieved from STITCH, each of which is derived from one aspect of chemicals, five individual classifiers were built. Then, several integrated classifiers were built by integrating some or all of the individual classifiers. By testing the integrated classifiers on a dataset of chemicals and their toxicity effects from the Accelrys Toxicity database, together with non-toxic chemicals, and evaluating their performance with the jackknife test, an optimal integrated classifier was selected as the proposed classifier, which provided quite high prediction accuracy and wide applicability.

  5. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.

    Science.gov (United States)

    Li, Qiang; Gu, Yu; Jia, Jing

    2017-01-30

    Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  6. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier

    Directory of Open Access Journals (Sweden)

    Qiang Li

    2017-01-01

    Full Text Available Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  7. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

    Directory of Open Access Journals (Sweden)

    Litonjua Augusto A

    2011-06-01

    Full Text Available Abstract Background Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), while complex diseases are usually affected by many genetic and environmental factors, with each factor contributing a small portion to the outcome. We hypothesized that the use of random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. Methods In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predicted severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group. Results Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 with 160-320 SNPs, in comparison to an AUC score of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC score of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors. Conclusions Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.
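
    A hedged sketch of the two-step procedure: rank SNPs by random forest importance on the training split, then refit on the top-ranked SNPs plus clinical covariates and report the AUC on held-out data. The genotype matrix, covariates, and outcome below are synthetic placeholders for the CAMP data.

      # Hedged sketch; genotypes, covariates and outcomes are synthetic placeholders.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(4)
      n, p = 417, 5000                                 # 127 cases + 290 controls; placeholder SNP count
      genotypes = rng.integers(0, 3, (n, p))           # 0/1/2 minor-allele counts
      clinical = rng.random((n, 4))                    # age, sex, FEV1 % predicted, treatment (placeholder)
      y = rng.integers(0, 2, n)

      X_tr, X_te, c_tr, c_te, y_tr, y_te = train_test_split(
          genotypes, clinical, y, test_size=0.3, random_state=4, stratify=y)

      ranker = RandomForestClassifier(n_estimators=500, random_state=4).fit(X_tr, y_tr)
      top = np.argsort(ranker.feature_importances_)[::-1][:160]    # keep the 160 top-ranked SNPs

      model = RandomForestClassifier(n_estimators=500, random_state=4)
      model.fit(np.hstack([X_tr[:, top], c_tr]), y_tr)
      probs = model.predict_proba(np.hstack([X_te[:, top], c_te]))[:, 1]
      print("held-out AUC:", roc_auc_score(y_te, probs))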

  8. Deep-Learning Convolutional Neural Networks Accurately Classify Genetic Mutations in Gliomas.

    Science.gov (United States)

    Chang, P; Grinband, J; Weinberg, B D; Bardis, M; Khy, M; Cadena, G; Su, M-Y; Cha, S; Filippi, C G; Bota, D; Baldi, P; Poisson, L M; Jain, R; Chow, D

    2018-05-10

    The World Health Organization has recently placed new emphasis on the integration of genetic information for gliomas. While tissue sampling remains the criterion standard, noninvasive imaging techniques may provide complementary insight into clinically relevant genetic mutations. Our aim was to train a convolutional neural network to independently predict underlying molecular genetic mutation status in gliomas with high accuracy and identify the most predictive imaging features for each mutation. MR imaging data and molecular information were retrospectively obtained from The Cancer Imaging Archives for 259 patients with either low- or high-grade gliomas. A convolutional neural network was trained to classify isocitrate dehydrogenase 1 (IDH1) mutation status, 1p/19q codeletion, and O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. Principal component analysis of the final convolutional neural network layer was used to extract the key imaging features critical for successful classification. Classification had high accuracy: IDH1 mutation status, 94%; 1p/19q codeletion, 92%; and MGMT promoter methylation status, 83%. Each genetic category was also associated with distinctive imaging features such as definition of tumor margins, T1 and FLAIR suppression, extent of edema, extent of necrosis, and textural features. Our results indicate that for The Cancer Imaging Archives dataset, machine-learning approaches allow classification of individual genetic mutations of both low- and high-grade gliomas. We show that relevant MR imaging features acquired from an added dimensionality-reduction technique demonstrate that neural networks are capable of learning key imaging components without prior feature selection or human-directed training.

  9. Classifying vulnerability to sleep deprivation using baseline measures of psychomotor vigilance.

    Science.gov (United States)

    Patanaik, Amiya; Kwoh, Chee Keong; Chua, Eric C P; Gooley, Joshua J; Chee, Michael W L

    2015-05-01

    To identify measures derived from baseline psychomotor vigilance task (PVT) performance that can reliably predict vulnerability to sleep deprivation. Subjects underwent total sleep deprivation and completed a 10-min PVT every 1-2 h in a controlled laboratory setting. Participants were categorized as vulnerable or resistant to sleep deprivation, based on a median split of lapses that occurred following sleep deprivation. Standard reaction time, drift diffusion model (DDM), and wavelet metrics were derived from PVT response times collected at baseline. A support vector machine model that incorporated maximum relevance and minimum redundancy feature selection and wrapper-based heuristics was used to classify subjects as vulnerable or resistant using rested data. Two academic sleep laboratories. Independent samples of 135 (69 women, age 18 to 25 y) and 45 (3 women, age 22 to 32 y) healthy adults. In both datasets, DDM measures, the number of consecutive reaction times that differ by more than 250 ms, and two wavelet features were selected by the model as features predictive of vulnerability to sleep deprivation. Using the best set of features selected in each dataset, classification accuracy was 77% and 82% using fivefold stratified cross-validation, respectively. Despite differences in experimental conditions across studies, drift diffusion model parameters associated reliably with individual differences in performance during total sleep deprivation. These results demonstrate the utility of drift diffusion modeling of baseline performance in estimating vulnerability to psychomotor vigilance decline.

  10. 18 CFR 367.18 - Criteria for classifying leases.

    Science.gov (United States)

    2010-04-01

    ... the lessee) must not give rise to a new classification of a lease for accounting purposes. ... classifying the lease. (4) The present value at the beginning of the lease term of the minimum lease payments... taxes to be paid by the lessor, including any related profit, equals or exceeds 90 percent of the excess...

  11. Discrimination-Aware Classifiers for Student Performance Prediction

    Science.gov (United States)

    Luo, Ling; Koprinska, Irena; Liu, Wei

    2015-01-01

    In this paper we consider discrimination-aware classification of educational data. Mining and using rules that distinguish groups of students based on sensitive attributes such as gender and nationality may lead to discrimination. It is desirable to keep the sensitive attributes during the training of a classifier to avoid information loss but…

  12. 29 CFR 1910.307 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ... equipment at the location. (c) Electrical installations. Equipment, wiring methods, and installations of... covers the requirements for electric equipment and wiring in locations that are classified depending on... provisions of this section. (4) Division and zone classification. In Class I locations, an installation must...

  13. 29 CFR 1926.407 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ...) locations, unless modified by provisions of this section. (b) Electrical installations. Equipment, wiring..., DEPARTMENT OF LABOR (CONTINUED) SAFETY AND HEALTH REGULATIONS FOR CONSTRUCTION Electrical Installation Safety... electric equipment and wiring in locations which are classified depending on the properties of the...

  14. 18 CFR 3a.71 - Accountability for classified material.

    Science.gov (United States)

    2010-04-01

    ... numbers assigned to top secret material will be separate from the sequence for other classified material... central control registry in calendar year 1969. TS 1006—Sixth Top Secret document controlled by the... control registry when the document is transferred. (e) For Top Secret documents only, an access register...

  15. Classifier fusion for VoIP attacks classification

    Science.gov (United States)

    Safarik, Jakub; Rezac, Filip

    2017-05-01

    SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of various machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method leading to a potential decrease in classification error rate. The use of classifier combination yields a more robust solution without the difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on a network of honeypot nodes. These honeypots run in several networks spread across different locations. Separation of honeypots allows us to gain independent and trustworthy attack information.
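
    A minimal sketch of classifier fusion with a weighted soft-voting rule, using scikit-learn's VotingClassifier; the member classifiers, weights, and the feature matrix standing in for honeypot-collected SIP traffic are illustrative assumptions.

      # Hedged sketch; member classifiers, weights and traffic features are illustrative.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      rng = np.random.default_rng(5)
      X = rng.random((1000, 20))       # placeholder per-call SIP traffic features
      y = rng.integers(0, 4, 1000)     # placeholder attack classes

      fusion = VotingClassifier(
          estimators=[
              ("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(n_estimators=200, random_state=5)),
              ("nb", GaussianNB()),
          ],
          voting="soft",               # average the predicted class probabilities
          weights=[1, 2, 1],           # an example weighted-sum combination rule
      )
      print(cross_val_score(fusion, X, y, cv=5).mean())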

  16. Bayesian Classifier for Medical Data from Doppler Unit

    Directory of Open Access Journals (Sweden)

    J. Málek

    2006-01-01

    Full Text Available Nowadays, hand-held ultrasonic Doppler units (probes) are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow in time and blood pressures are measured at several positions on each lower limb. By listening to the acoustic signal generated by the device or by reading the signal displayed on screen, a specialist can detect peripheral arterial disease (PAD). This project aims to design software that will be able to analyze data from such a device and classify it into several diagnostic classes. At the Department of Functional Diagnostics at the Regional Hospital in Liberec, a database of several hundred signals was collected. In cooperation with the specialist, the signals were manually classified into four classes. For each class, selected signal features were extracted and then used for training a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifier. A success rate of slightly above 84 % in recognizing diagnostic states was recently achieved on the test data.
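
    A minimal sketch of the classification step, assuming hand-crafted Doppler signal features and four diagnostic classes: a Gaussian naive Bayes model trained on one split and scored on another. Feature names and values are placeholders.

      # Hedged sketch; feature names, classes and values are placeholders.
      import numpy as np
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split
      from sklearn.naive_bayes import GaussianNB

      rng = np.random.default_rng(6)
      X = rng.random((400, 8))         # e.g. pulsatility index, spectral broadening, ... (assumed)
      y = rng.integers(0, 4, 400)      # four diagnostic classes

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=6)
      clf = GaussianNB().fit(X_tr, y_tr)
      print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))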

  17. Diagnosis of Broiler Livers by Classifying Image Patches

    DEFF Research Database (Denmark)

    Jørgensen, Anders; Fagertun, Jens; Moeslund, Thomas B.

    2017-01-01

    Manual health inspection is becoming the bottleneck at poultry processing plants. We present a computer vision method for automatic diagnosis of broiler livers. The non-rigid livers, of varying shapes and sizes, are classified in patches by a convolutional neural network, outputting maps...

  18. Support vector machines classifiers of physical activities in preschoolers

    Science.gov (United States)

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children's physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  19. A Linguistic Image of Nature: The Burmese Numerative Classifier System

    Science.gov (United States)

    Becker, Alton L.

    1975-01-01

    The Burmese classifier system is coherent because it is based upon a single elementary semantic dimension: deixis. On that dimension, four distances are distinguished, distances which metaphorically substitute for other conceptual relations between people and other living beings, people and things, and people and concepts. (Author/RM)

  20. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever-increasing generation of data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent in data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.
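
    The Gamma classifier itself is an associative model without a mainstream library implementation, so the sketch below shows only the sliding-window and forgetting mechanism with a nearest-centroid model standing in, evaluated prequentially (test-then-train) on a toy stream with an abrupt concept drift.

      # Hedged sketch: only the sliding-window/forgetting idea, with a nearest-centroid
      # model standing in for the associative Gamma classifier.
      import numpy as np
      from collections import deque
      from sklearn.neighbors import NearestCentroid

      class SlidingWindowClassifier:
          """Keep only the most recent `window` labelled samples (forgetting mechanism)."""
          def __init__(self, window=200):
              self.buffer = deque(maxlen=window)
              self.model = NearestCentroid()
              self.fitted = False

          def learn_one(self, x, y):
              self.buffer.append((x, y))
              X, Y = zip(*self.buffer)
              if len(set(Y)) > 1:              # need at least two classes before fitting
                  self.model.fit(np.array(X), np.array(Y))
                  self.fitted = True

          def predict_one(self, x):
              return self.model.predict(np.asarray(x).reshape(1, -1))[0]

      # toy prequential (test-then-train) run with an abrupt concept drift halfway through
      rng = np.random.default_rng(7)
      stream_clf = SlidingWindowClassifier(window=200)
      correct = total = 0
      for t in range(2000):
          drift = 0.0 if t < 1000 else 3.0     # class means shift after t = 1000
          y = int(rng.integers(0, 2))
          x = rng.normal(loc=y + drift, size=5)
          if stream_clf.fitted:
              correct += int(stream_clf.predict_one(x) == y)
              total += 1
          stream_clf.learn_one(x, y)
      print("prequential accuracy:", correct / total)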

  1. Building an automated SOAP classifier for emergency department reports.

    Science.gov (United States)

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F(1) scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks.
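
    A minimal sketch of a sentence-level SOAP classifier using bag-of-words features and logistic regression; the paper's richer linguistic features are not reproduced, and the example sentences and labels are invented placeholders.

      # Hedged sketch; the example sentences, labels and features are invented.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      sentences = [
          "Patient reports chest pain since this morning.",       # subjective
          "Blood pressure 142/90, heart rate 96.",                # objective
          "Likely musculoskeletal pain, rule out ACS.",           # assessment
          "Start aspirin and obtain serial troponins.",           # plan
      ]
      labels = ["S", "O", "A", "P"]

      soap_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                               LogisticRegression(max_iter=1000))
      soap_clf.fit(sentences, labels)
      print(soap_clf.predict(["Order a chest x-ray and follow up tomorrow."]))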

  2. Learning to classify wakes from local sensory information

    Science.gov (United States)

    Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team

    2017-11-01

    Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.

  3. The Closing of the Classified Catalog at Boston University

    Science.gov (United States)

    Hazen, Margaret Hindle

    1974-01-01

    Although the classified catalog at Boston University libraries has been a useful research tool, it has proven too expensive to keep current. The library has converted to a traditional alphabetic subject catalog and will receive catalog cards from the Ohio College Library Center through the New England Library Network. (Author/LS)

  4. Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

    Directory of Open Access Journals (Sweden)

    M. Al-Rousan

    2005-08-01

    Full Text Available Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of Arabic sign language (ArSL alphabet. Polynomial classifiers have several advantages over other classifiers in that they do not require iterative training, and that they are highly computationally scalable with the number of classes. Based on polynomial classifiers, we have built an ArSL system and measured its performance using real ArSL data collected from deaf people. We show that the proposed system provides superior recognition results when compared with previously published results using ANFIS-based classification on the same dataset and feature extraction methodology. The comparison is shown in terms of the number of misclassified test patterns. The reduction in the rate of misclassified patterns was very significant. In particular, we have achieved a 36% reduction of misclassifications on the training data and 57% on the test data.
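
    A hedged sketch of the polynomial-classifier idea: expand each feature vector with second-order polynomial terms and fit a linear map by regularized least squares, which requires no iterative training. The gesture features, class count, and degree are toy assumptions, not the ArSL setup.

      # Hedged sketch; features, class count and polynomial degree are toy assumptions.
      import numpy as np
      from sklearn.linear_model import RidgeClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures

      rng = np.random.default_rng(8)
      X = rng.random((300, 15))        # placeholder hand-shape/posture features per sign
      y = rng.integers(0, 10, 300)     # toy subset of letter classes

      poly_clf = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                               RidgeClassifier(alpha=1.0))    # closed-form, non-iterative fit
      print(cross_val_score(poly_clf, X, y, cv=3).mean())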

  5. Reconfigurable support vector machine classifier with approximate computing

    NARCIS (Netherlands)

    van Leussen, M.J.; Huisken, J.; Wang, L.; Jiao, H.; De Gyvez, J.P.

    2017-01-01

    Support Vector Machine (SVM) is one of the most popular machine learning algorithms. An energy-efficient SVM classifier is proposed in this paper, where approximate computing is utilized to reduce energy consumption and silicon area. A hardware architecture with reconfigurable kernels and

  6. Classifying regularized sensor covariance matrices: An alternative to CSP

    NARCIS (Netherlands)

    Roijendijk, L.M.M.; Gielen, C.C.A.M.; Farquhar, J.D.R.

    2016-01-01

    Common spatial patterns ( CSP) is a commonly used technique for classifying imagined movement type brain-computer interface ( BCI) datasets. It has been very successful with many extensions and improvements on the basic technique. However, a drawback of CSP is that the signal processing pipeline

  7. Classifying regularised sensor covariance matrices: An alternative to CSP

    NARCIS (Netherlands)

    Roijendijk, L.M.M.; Gielen, C.C.A.M.; Farquhar, J.D.R.

    2016-01-01

    Common spatial patterns (CSP) is a commonly used technique for classifying imagined movement type brain computer interface (BCI) datasets. It has been very successful with many extensions and improvements on the basic technique. However, a drawback of CSP is that the signal processing pipeline

  8. Two-categorical bundles and their classifying spaces

    DEFF Research Database (Denmark)

    Baas, Nils A.; Bökstedt, M.; Kro, T.A.

    2012-01-01

    -category is a classifying space for the associated principal 2-bundles. In the process of proving this we develop a lot of powerful machinery which may be useful in further studies of 2-categorical topology. As a corollary we get a new proof of the classification of principal bundles. A calculation based...

  9. 3 CFR - Classified Information and Controlled Unclassified Information

    Science.gov (United States)

    2010-01-01

    ... on Transparency and Open Government and on the Freedom of Information Act, my Administration is... memoranda of January 21, 2009, on Transparency and Open Government and on the Freedom of Information Act; (B... 3 The President 1 2010-01-01 2010-01-01 false Classified Information and Controlled Unclassified...

  10. Comparison of Classifier Architectures for Online Neural Spike Sorting.

    Science.gov (United States)

    Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood

    2017-04-01

    High-density, intracranial recordings from micro-electrode arrays need to undergo Spike Sorting in order to associate the recorded neuronal spikes to particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike-sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and Cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirement. We establish that the neural networks based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier, requires only 7.83% of computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.

  11. Human Activity Recognition by Combining a Small Number of Classifiers.

    Science.gov (United States)

    Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin

    2016-09-01

    We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the output of the sensors. The models are based on a soft outputs combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifier combination models against a single classifier that employs all the signals from the sensors. Our models consistently exhibit a reduction of the error rate and an increase of robustness against sensor failures. Our models also outperform other classifier combination models that do not consider soft outputs and a Markovian structure of the human activities.

  12. Evaluation of three classifiers in mapping forest stand types using ...

    African Journals Online (AJOL)

    EJIRO

    applied for classification of the image. Supervised classification technique using maximum likelihood algorithm is the most commonly and widely used method for land cover classification (Jia and Richards, 2006). In Australia, the maximum likelihood classifier was effectively used to map different forest stand types with high.

  13. Localizing genes to cerebellar layers by classifying ISH images.

    Directory of Open Access Journals (Sweden)

    Lior Kirsch

    Full Text Available Gene expression controls how the brain develops and functions. Understanding control processes in the brain is particularly hard since they involve numerous types of neurons and glia, and very little is known about which genes are expressed in which cells and brain layers. Here we describe an approach to detect genes whose expression is primarily localized to a specific brain layer and apply it to the mouse cerebellum. We learn typical spatial patterns of expression from a few markers that are known to be localized to specific layers, and use these patterns to predict localization for new genes. We analyze images of in-situ hybridization (ISH) experiments, which we represent using histograms of local binary patterns (LBP), and train image classifiers and gene classifiers for four layers of the cerebellum: the Purkinje, granular, molecular and white matter layer. On held-out data, the layer classifiers achieve accuracy above 94% (AUC) by representing each image at multiple scales and by combining multiple image scores into a single gene-level decision. When applied to the full mouse genome, the classifiers predict specific layer localization for hundreds of new genes in the Purkinje and granular layers. Many genes localized to the Purkinje layer are likely to be expressed in astrocytes, and many others are involved in lipid metabolism, possibly due to the unusual size of Purkinje cells.
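
    A minimal sketch of the image-level step, assuming scikit-image and scikit-learn are available: a uniform local binary pattern histogram is computed per patch and fed to a linear SVM. The images and layer labels are random placeholders for real ISH sections.

      # Hedged sketch; random images stand in for real ISH sections.
      import numpy as np
      from skimage.feature import local_binary_pattern
      from sklearn.svm import LinearSVC

      def lbp_histogram(gray_image, P=8, R=1.0):
          """Uniform LBP codes summarized as a normalized histogram."""
          codes = local_binary_pattern(gray_image, P, R, method="uniform")
          hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
          return hist

      rng = np.random.default_rng(9)
      images = rng.integers(0, 256, (40, 128, 128), dtype=np.uint8)   # placeholder grayscale patches
      labels = rng.integers(0, 2, 40)                                 # 1 = marker localized to the layer

      features = np.array([lbp_histogram(img) for img in images])
      clf = LinearSVC().fit(features, labels)
      print(clf.predict(features[:5]))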

  14. An ensemble self-training protein interaction article classifier.

    Science.gov (United States)

    Chen, Yifei; Hou, Ping; Manderick, Bernard

    2014-01-01

    Protein-protein interaction (PPI) is essential to understand the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence it is desired to classify articles PPI-related or not automatically. In order to build interaction article classification systems, an annotated corpus is needed. However, it is usually the case that only a small number of labeled articles can be obtained manually. Meanwhile, a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A biological background based feature weighting strategy is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from unlabeled data to improve the performance further. Experiment results show that the EST_IACer can classify the PPI related articles effectively and efficiently.
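
    A hedged sketch of semi-supervised self-training for article classification, using scikit-learn's SelfTrainingClassifier as a stand-in for the ensemble variant described above; the abstracts, labels (-1 marks unlabelled), and confidence threshold are tiny placeholders.

      # Hedged sketch; abstracts, labels and the confidence threshold are placeholders.
      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.semi_supervised import SelfTrainingClassifier

      abstracts = [
          "Protein A binds protein B in a yeast two-hybrid screen.",    # labelled PPI
          "We report the crystal structure of an unrelated enzyme.",    # labelled non-PPI
          "Co-immunoprecipitation confirms the interaction partners.",  # unlabelled
          "The survey covers forest biodiversity in three regions.",    # unlabelled
      ]
      labels = np.array([1, 0, -1, -1])    # -1 marks unlabelled articles

      X = TfidfVectorizer().fit_transform(abstracts)
      base = LogisticRegression(max_iter=1000)          # base learner must expose predict_proba
      model = SelfTrainingClassifier(base, threshold=0.6).fit(X, labels)
      print(model.predict(X[2]))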

  15. Classifying Your Food as Acid, Low-Acid, or Acidified

    OpenAIRE

    Bacon, Karleigh

    2012-01-01

    As a food entrepreneur, you should be aware of how ingredients in your product make the food look, feel, and taste, as well as how the ingredients create environments for microorganisms like bacteria, yeast, and molds to survive and grow. This guide will help you classify your food as acid, low-acid, or acidified.

  16. Gene-expression Classifier in Papillary Thyroid Carcinoma

    DEFF Research Database (Denmark)

    Londero, Stefano Christian; Jespersen, Marie Louise; Krogdahl, Annelise

    2016-01-01

    BACKGROUND: No reliable biomarker for metastatic potential in the risk stratification of papillary thyroid carcinoma exists. We aimed to develop a gene-expression classifier for metastatic potential. MATERIALS AND METHODS: Genome-wide expression analyses were used. Development cohort: freshly...

  17. Abbreviations: Their Effects on Comprehension of Classified Advertisements.

    Science.gov (United States)

    Sokol, Kirstin R.

    Two experimental designs were used to test the hypothesis that abbreviations in classified advertisements decrease the reader's comprehension of such ads. In the first experimental design, 73 high school students read four ads (for employment, used cars, apartments for rent, and articles for sale) either with abbreviations or with all…

  18. Planning schistosomiasis control: investigation of alternative sampling strategies for Schistosoma mansoni to target mass drug administration of praziquantel in East Africa.

    Science.gov (United States)

    Sturrock, Hugh J W; Gething, Pete W; Ashton, Ruth A; Kolaczinski, Jan H; Kabatereine, Narcis B; Brooker, Simon

    2011-09-01

    In schistosomiasis control, there is a need to geographically target treatment to populations at high risk of morbidity. This paper evaluates alternative sampling strategies for surveys of Schistosoma mansoni to target mass drug administration in Kenya and Ethiopia. Two main designs are considered: lot quality assurance sampling (LQAS) of children from all schools; and a geostatistical design that samples a subset of schools and uses semi-variogram analysis and spatial interpolation to predict prevalence in the remaining unsurveyed schools. Computerized simulations are used to investigate the performance of sampling strategies in correctly classifying schools according to treatment needs and their cost-effectiveness in identifying high prevalence schools. LQAS performs better than geostatistical sampling in correctly classifying schools, but at a higher cost per high-prevalence school correctly classified. It is suggested that the optimal surveying strategy for S. mansoni needs to take into account the goals of the control programme and the financial and drug resources available.
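
    A rough simulation of the LQAS decision rule evaluated in the paper: sample a fixed lot of children per school and classify the school as high prevalence when the number of positives reaches a decision threshold. The lot size, threshold, prevalence distribution, and treatment cut-off are illustrative assumptions, not the study's values.

      # Hedged simulation; lot size, threshold, prevalence distribution and cut-off are illustrative.
      import numpy as np

      rng = np.random.default_rng(10)
      n_schools, lot_size, decision_threshold = 200, 15, 3
      true_prevalence = rng.beta(1, 6, n_schools)            # skewed school-level prevalence
      truly_high = true_prevalence >= 0.10                   # assumed treatment cut-off

      positives = rng.binomial(lot_size, true_prevalence)    # positives found in each school's lot
      classified_high = positives >= decision_threshold

      sensitivity = (classified_high & truly_high).sum() / truly_high.sum()
      specificity = (~classified_high & ~truly_high).sum() / (~truly_high).sum()
      print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")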

  19. Multivariate models to classify Tuscan virgin olive oils by zone.

    Directory of Open Access Journals (Sweden)

    Alessandri, Stefano

    1999-10-01

    Full Text Available In order to study and classify Tuscan virgin olive oils, 179 samples were collected. They were obtained from drupes harvested during the first half of November, from three different zones of the Region. The sampling was repeated for 5 years. Fatty acids, phytol, aliphatic and triterpenic alcohols, triterpenic dialcohols, sterols, squalene and tocopherols were analyzed. A subset of variables was considered; they were selected in a preceding work as the most effective and reliable from the univariate point of view. The analytical data were transformed (except for cycloartenol) to compensate for annual variations, and the mean for the East zone was subtracted from each value within each year. Univariate three-class models were calculated and further variables discarded. Then multivariate three-zone models were evaluated, including phytol (which was always selected) and all the combinations of palmitic, palmitoleic and oleic acid, tetracosanol, cycloartenol and squalene. Models including from two to seven variables were studied. The best model shows by-zone classification errors of less than 40%, by-zone within-year classification errors of less than 45% and a global classification error equal to 30%. This model includes phytol, palmitic acid, tetracosanol and cycloartenol.


  20. Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

    Science.gov (United States)

    Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun

    2016-04-01

    This work provides a performance comparison of four different machine learning classifiers: the multinomial logistic regression with ridge estimators (MLR) classifier, k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB), as applied to terahertz (THz) transient time domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss at the THz part of the spectrum differs significantly because of differences in their frequency-dependent THz extinction coefficients as well as in their refractive indices and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. Firstly the measurements related to samples that had thicknesses of 2 mm were classified, then samples at thicknesses of 4 mm, and after that 3 mm, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms, which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies.
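
    A minimal sketch of the comparison protocol, applying the same cross-validation to the four classifiers named above (ridge-penalized multinomial logistic regression standing in for MLR, KNN, RBF-kernel SVM, and Gaussian naive Bayes); the feature vectors standing in for the 2-D cross-correlation statistics are synthetic.

      # Hedged sketch; the feature matrix and labels are synthetic placeholders.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC

      rng = np.random.default_rng(11)
      X = rng.random((180, 12))        # placeholder cross-correlation summary statistics per pixel
      y = rng.integers(0, 6, 180)      # six powder substances

      classifiers = {
          "MLR (ridge-penalized)": LogisticRegression(penalty="l2", C=1.0, max_iter=2000),
          "KNN": KNeighborsClassifier(n_neighbors=5),
          "SVM": SVC(kernel="rbf"),
          "Naive Bayes": GaussianNB(),
      }
      for name, clf in classifiers.items():
          print(name, cross_val_score(clf, X, y, cv=5).mean())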

  1. Zooniverse: Combining Human and Machine Classifiers for the Big Survey Era

    Science.gov (United States)

    Fortson, Lucy; Wright, Darryl; Beck, Melanie; Lintott, Chris; Scarlata, Claudia; Dickinson, Hugh; Trouille, Laura; Willi, Marco; Laraia, Michael; Boyer, Amy; Veldhuis, Marten; Zooniverse

    2018-01-01

    Many analyses of astronomical data sets, ranging from morphological classification of galaxies to identification of supernova candidates, have relied on humans to classify data into distinct categories. Crowdsourced galaxy classifications via the Galaxy Zoo project provided a solution that scaled visual classification for extant surveys by harnessing the combined power of thousands of volunteers. However, the much larger data sets anticipated from upcoming surveys will require a different approach. Automated classifiers using supervised machine learning have improved considerably over the past decade but their increasing sophistication comes at the expense of needing ever more training data. Crowdsourced classification by human volunteers is a critical technique for obtaining these training data. But several improvements can be made on this zeroth order solution. Efficiency gains can be achieved by implementing a “cascade filtering” approach whereby the task structure is reduced to a set of binary questions that are more suited to simpler machines while demanding lower cognitive loads for humans. Intelligent subject retirement based on quantitative metrics of volunteer skill and subject label reliability also leads to dramatic improvements in efficiency. We note that human and machine classifiers may retire subjects differently leading to trade-offs in performance space. Drawing on work with several Zooniverse projects including Galaxy Zoo and Supernova Hunter, we will present recent findings from experiments that combine cohorts of human and machine classifiers. We show that the most efficient system results when appropriate subsets of the data are intelligently assigned to each group according to their particular capabilities. With sufficient online training, simple machines can quickly classify “easy” subjects, leaving more difficult (and discovery-oriented) tasks for volunteers. We also find humans achieve higher classification purity while samples

  2. Identifying 'unhealthy' food advertising on television: a case study applying the UK Nutrient Profile model.

    Science.gov (United States)

    Jenkin, Gabrielle; Wilson, Nick; Hermanson, Nicole

    2009-05-01

    To evaluate the feasibility of the UK Nutrient Profile (NP) model for identifying 'unhealthy' food advertisements using a case study of New Zealand television advertisements. Four weeks of weekday television from 15.30 hours to 18.30 hours was videotaped from a state-owned (free-to-air) television channel popular with children. Food advertisements were identified and their nutritional information collected in accordance with the requirements of the NP model. Nutrient information was obtained from a variety of sources including food labels, company websites and a national nutritional database. From the 60 h sample of weekday afternoon television, there were 1893 advertisements, of which 483 were for food products or retailers. After applying the NP model, 66 % of these were classified as advertising high-fat, high-salt and high-sugar (HFSS) foods; 28 % were classified as advertising non-HFSS foods; and the remaining 2 % were unclassifiable. More than half (53 %) of the HFSS food advertisements were for 'mixed meal' items promoted by major fast-food franchises. The advertising of non-HFSS food was sparse, covering a narrow range of food groups, with no advertisements for fresh fruit or vegetables. Despite the NP model having some design limitations in classifying real-world televised food advertisements, it was easily applied to this sample and could clearly identify HFSS products. Policy makers who do not wish to completely restrict food advertising to children outright should consider using this NP model for regulating food advertising.

  3. A neural-fuzzy approach to classify the ecological status in surface waters

    International Nuclear Information System (INIS)

    Ocampo-Duque, William; Schuhmacher, Marta; Domingo, Jose L.

    2007-01-01

    A methodology based on a hybrid approach that combines fuzzy inference systems and artificial neural networks has been used to classify ecological status in surface waters. This methodology has been proposed to deal efficiently with the non-linearity and highly subjective nature of variables involved in this serious problem. Ecological status has been assessed with biological, hydro-morphological, and physicochemical indicators. A data set collected from 378 sampling sites in the Ebro river basin has been used to train and validate the hybrid model. Up to 97.6% of sampling sites have been correctly classified with neural-fuzzy models. Such performance resulted very competitive when compared with other classification algorithms. With non-parametric classification-regression trees and probabilistic neural networks, the predictive capacities were 90.7% and 97.0%, respectively. The proposed methodology can support decision-makers in evaluation and classification of ecological status, as required by the EU Water Framework Directive. - Fuzzy inference systems can be used as environmental classifiers

  4. SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Motivation Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with RBF kernel of SVM. Conclusion We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at . PMID:17217518

  5. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  6. Fusion of classifiers for REIS-based detection of suspicious breast lesions

    Science.gov (United States)

    Lederman, Dror; Wang, Xingwei; Zheng, Bin; Sumkin, Jules H.; Tublin, Mitchell; Gur, David

    2011-03-01

    After developing a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system aimed at detecting women with breast abnormalities that may indicate a developing breast cancer, we have been conducting a prospective clinical study to explore the feasibility of applying this REIS system to classify younger women with respect to their risk of having or developing breast cancer. The system comprises one central probe placed in contact with the nipple, and six additional probes uniformly distributed along an outside circle to be placed in contact with six points on the outer breast skin surface. In this preliminary study, we selected an initial set of 174 examinations on participants that have completed REIS examinations and have clinical status verification. Among these, 66 examinations were recommended for biopsy due to findings of a highly suspicious breast lesion ("positives"), and 108 were determined as negative during imaging based procedures ("negatives"). A set of REIS-based features, extracted using a mirror-matched approach, was computed and fed into five machine learning classifiers. A genetic algorithm was used to select an optimal subset of features for each of the five classifiers. Three fusion rules, namely sum rule, weighted sum rule and weighted median rule, were used to combine the results of the classifiers. Performance evaluation was performed using a leave-one-case-out cross-validation method. The results indicated that REIS may provide a new technology to identify younger women with higher than average risk of having or developing breast cancer. Furthermore, it was shown that fusion rules, such as a weighted median fusion rule and a weighted sum fusion rule, may improve performance as compared with the highest performing single classifier.

  7. Deep sequencing identifies ethnicity-specific bacterial signatures in the oral microbiome.

    Directory of Open Access Journals (Sweden)

    Matthew R Mason

    Full Text Available Oral infections have a strong ethnic predilection, suggesting that ethnicity is a critical determinant of oral microbial colonization. Dental plaque and saliva samples from 192 subjects belonging to four major ethnicities in the United States were analyzed using terminal restriction fragment length polymorphism (t-RFLP) and 16S pyrosequencing. Ethnicity-specific clustering of microbial communities was apparent in saliva and subgingival biofilms, and a machine-learning classifier was capable of identifying an individual's ethnicity from subgingival microbial signatures. The classifier identified African Americans with a 100% sensitivity and 74% specificity and Caucasians with a 50% sensitivity and 91% specificity. The data demonstrate a significant association between ethnic affiliation and the composition of the oral microbiome, to the extent that these microbial signatures appear to be capable of discriminating between ethnicities.

  8. Super resolution reconstruction of infrared images based on classified dictionary learning

    Science.gov (United States)

    Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

    2018-05-01

    Infrared images always suffer from low-resolution problems resulting from limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction and therefore, more satisfactory results are achieved without an increase in computational complexity and time cost. Experiments and results demonstrated that it is a viable method for infrared image reconstruction since it improves image resolution and recovers detailed information of targets.

  9. Textual and shape-based feature extraction and neuro-fuzzy classifier for nuclear track recognition

    Science.gov (United States)

    Khayat, Omid; Afarideh, Hossein

    2013-04-01

    Track counting algorithms, as one of the fundamental principles of nuclear science, have been emphasized in recent years. Accurate measurement of nuclear tracks on solid-state nuclear track detectors is the aim of track counting systems. Commonly, track counting systems comprise a hardware system for the task of imaging and software for analysing the track images. In this paper, a track recognition algorithm based on 12 defined textual and shape-based features and a neuro-fuzzy classifier is proposed. Features are defined so as to discern the tracks from the background and small objects. Then, according to the defined features, tracks are detected using a trained neuro-fuzzy system. Features and the classifier are finally validated via 100 alpha track images and 40 training samples. It is shown that the principal textual and shape-based features concomitantly yield a high rate of track detection compared with single-feature based methods.

  10. Improved Classification by Non Iterative and Ensemble Classifiers in Motor Fault Diagnosis

    Directory of Open Access Journals (Sweden)

    PANIGRAHY, P. S.

    2018-02-01

    Full Text Available Data-driven multi-class fault diagnosis of an induction motor using MCSA at steady-state condition is a complex pattern classification problem. This investigation has exploited the built-in ensemble process of non-iterative classifiers to resolve the most challenging issues in this area, including bearing and stator fault detection. Non-iterative techniques exhibit, on average, 15% higher fault classification accuracy than their iterative counterparts. In particular, RF has shown outstanding performance even with a small number of training samples and a noisy feature space because of its distributive feature model. The robustness of the results, backed by experimental verification, shows that a non-iterative individual classifier such as RF is the optimum choice in the area of automatic fault diagnosis of induction motors.

  11. A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

    Science.gov (United States)

    S K, Somasundaram; P, Alli

    2017-11-09

    Diabetic retinopathy (DR), a retinal vascular disease, is the main complication of diabetes and can lead to blindness. Regular screening for early detection of DR is a labor- and resource-intensive task; therefore, automatic detection of DR by computational techniques is an attractive solution. An automatic method is more reliable for determining the presence of an abnormality in fundus images (FI), but the classification process has been performed poorly. Recently, a few research works have been designed to analyze the texture discrimination capacity of FI to distinguish healthy images. However, the feature extraction (FE) process was not performed well due to the high dimensionality. Therefore, to identify retinal features for DR disease diagnosis and early detection, a Machine Learning and Ensemble Classification method, called Machine Learning Bagging Ensemble Classifier (ML-BEC), is designed. The ML-BEC method comprises two stages. The first stage comprises extraction of candidate objects from Retinal Images (RI). The candidate objects, or features for DR disease diagnosis, include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying a Machine Learning technique called t-distributed Stochastic Neighbor Embedding (t-SNE). t-SNE generates a probability distribution across high-dimensional images where the images are separated into similar and dissimilar pairs, and then describes a similar probability distribution across the points in the low-dimensional map. This lessens the Kullback-Leibler divergence between the two distributions with regard to the locations of the points on the map. The second stage comprises the application of ensemble classifiers to the extracted features for providing accurate analysis of digital FI using machine learning. In this stage, an automatic detection

  12. Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

    Science.gov (United States)

    Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

    2015-01-01

    Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
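    A minimal sketch of the general idea, repeated random sampling combined with cross-validation so that error rates can be tracked as the training set grows, is given below on synthetic data; the classifier, subset sizes and repetition count are arbitrary choices, not those of the paper.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    rng = np.random.default_rng(0)

    for n in (50, 100, 200, 400):                     # growing training-set sizes
        errors = []
        for _ in range(20):                           # repeated random sampling (RRS)
            idx = rng.choice(len(y), size=n, replace=False)
            acc = cross_val_score(KNeighborsClassifier(), X[idx], y[idx], cv=3)
            errors.append(1.0 - acc.mean())
        q1, q3 = np.percentile(errors, [25, 75])
        print(f"n={n}: mean error={np.mean(errors):.3f}, IQR={q3 - q1:.3f}")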

  13. Training set optimization and classifier performance in a top-down diabetic retinopathy screening system

    Science.gov (United States)

    Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.

    2013-03-01

    Diabetic retinopathy (DR) affects more than 4.4 million Americans aged 40 and over. Automatic screening for DR has been shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (k-NN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and k-NN had the highest average AUC. Lower standard deviation and a flattening of the AUC curve give evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.

  14. Classifying threats with a 14-MeV neutron interrogation system.

    Science.gov (United States)

    Strellis, Dan; Gozani, Tsahi

    2005-01-01

    SeaPODDS (Sea Portable Drug Detection System) is a non-intrusive tool for detecting concealed threats in hidden compartments of maritime vessels. This system consists of an electronic neutron generator, a gamma-ray detector, a data acquisition computer, and a laptop computer user-interface. Although initially developed to detect narcotics, recent algorithm developments have shown that the system is capable of correctly classifying a threat into one of four distinct categories: narcotic, explosive, chemical weapon, or radiological dispersion device (RDD). Detection of narcotics, explosives, and chemical weapons is based on gamma-ray signatures unique to the chemical elements. Elements are identified by their characteristic prompt gamma-rays induced by fast and thermal neutrons. Detection of RDD is accomplished by detecting gamma-rays emitted by common radioisotopes and nuclear reactor fission products. The algorithm phenomenology for classifying threats into the proper categories is presented here.

  15. WEB-BASED ADAPTIVE TESTING SYSTEM (WATS) FOR CLASSIFYING STUDENTS' ACADEMIC ABILITY

    Directory of Open Access Journals (Sweden)

    Jaemu LEE,

    2012-08-01

    Full Text Available Computer Adaptive Testing (CAT) has been highlighted as a promising assessment method to fulfill two testing purposes: estimating student academic ability and classifying student academic level. In this paper, we introduced the Web-based Adaptive Testing System (WATS), developed to support a cost-effective assessment for classifying students' ability into different academic levels. Instead of using a traditional paper and pencil test, the WATS is expected to serve as an alternative method to promptly diagnose and identify underachieving students through Web-based testing. The WATS can also help provide students with appropriate learning contents and necessary academic support in time. In this paper, the theoretical background and structure of WATS, the item construction process based upon item response theory, and the user interfaces of WATS were discussed.

  16. EVALUATING A COMPUTER BASED SKILLS ACQUISITION TRAINER TO CLASSIFY BADMINTON PLAYERS

    Directory of Open Access Journals (Sweden)

    Minh Vu Huynh

    2011-09-01

    Full Text Available The aim of the present study was to compare the statistical ability of both neural networks and discriminant function analysis on the newly developed SATB program. Using these statistical tools, we identified the accuracy of the SATB in classifying badminton players into different skill-level groups. Forty-one participants, classified as advanced, intermediate, or beginner skill level, participated in this study. Results indicated neural networks are more effective in predicting group membership, and displayed higher predictive validity when compared to discriminant analysis. Using these outcomes, in conjunction with the physiological and biomechanical variables of the participants, we assessed the authenticity and accuracy of the SATB and commented on the overall effectiveness of the visual-based training approach to training badminton athletes.

  17. Maternal hemodynamics: a method to classify hypertensive disorders of pregnancy.

    Science.gov (United States)

    Ferrazzi, Enrico; Stampalija, Tamara; Monasta, Lorenzo; Di Martino, Daniela; Vonck, Sharona; Gyselaers, Wilfried

    2018-01-01

    The classification of hypertensive disorders of pregnancy is based on the time at the onset of hypertension, proteinuria, and other associated complications. Maternal hemodynamic interrogation in hypertensive disorders of pregnancy considers not only the peripheral blood pressure but also the entire cardiovascular system, and it might help to classify the different clinical phenotypes of this syndrome. This study aimed to examine cardiovascular parameters in a cohort of patients affected by hypertensive disorders of pregnancy according to the clinical phenotypes that prioritize fetoplacental characteristics and not the time at onset of hypertensive disorders of pregnancy. At the fetal-maternal medicine unit of Ziekenhuis Oost-Limburg (Genk, Belgium), maternal cardiovascular parameters were obtained through impedance cardiography using a noninvasive continuous cardiac output monitor with the patients placed in a standing position. The patients were classified as pregnant women with hypertensive disorders of pregnancy who delivered appropriate- and small-for-gestational-age fetuses. Normotensive pregnant women with an appropriate-for-gestational-age fetus at delivery were enrolled as the control group. The possible impact of obesity (body mass index ≥30 kg/m²) on maternal hemodynamics was reassessed in the same groups. Maternal age, parity, body mass index, and blood pressure were not significantly different between the hypertensive disorders of pregnancy/appropriate-for-gestational-age and hypertensive disorders of pregnancy/small-for-gestational-age groups. The mean uterine artery pulsatility index was significantly higher in the hypertensive disorders of pregnancy/small-for-gestational-age group. The cardiac output and cardiac index were significantly lower in the hypertensive disorders of pregnancy/small-for-gestational-age group (cardiac output 6.5 L/min, cardiac index 3.6) than in the hypertensive disorders of pregnancy/appropriate-for-gestational-age group

  18. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers.

    Directory of Open Access Journals (Sweden)

    Muhammad Ahmad

    Full Text Available Hyperspectral image classification with a limited number of training samples without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome the said problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g., SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with the state-of-the-art methods.
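    The sketch below illustrates one plausible reading of fuzziness-driven sample selection: a classifier trained on a small labelled set scores the unlabelled pool, and the most ambiguous candidates (predicted membership closest to 0.5, i.e. nearest the estimated class boundary) are added to the training set. The fuzziness measure and all parameters here are simplifying assumptions, not the authors' exact FALF formulation, and the data are synthetic rather than hyperspectral.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=30, random_state=1)
    labeled = np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]])  # small seed set
    pool = np.setdiff1d(np.arange(len(y)), labeled)                                 # unlabeled candidates

    for _ in range(5):                             # a few selection rounds
        clf = SVC(kernel="rbf", probability=True).fit(X[labeled], y[labeled])
        p = clf.predict_proba(X[pool])[:, 1]
        fuzziness = 1.0 - np.abs(2.0 * p - 1.0)    # 1 at p=0.5 (boundary), 0 at p in {0, 1}
        pick = pool[np.argsort(fuzziness)[-10:]]   # fuzziest candidates get labeled next
        labeled = np.concatenate([labeled, pick])
        pool = np.setdiff1d(pool, pick)

    print("final training-set size:", labeled.size)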

  19. Distance and Density Similarity Based Enhanced k-NN Classifier for Improving Fault Diagnosis Performance of Bearings

    Directory of Open Access Journals (Sweden)

    Sharif Uddin

    2016-01-01

    Full Text Available An enhanced k-nearest neighbor (k-NN) classification algorithm is presented, which uses a density-based similarity measure in addition to a distance-based similarity measure to improve the diagnostic performance in bearing fault diagnosis. Due to its use of a distance-based similarity measure alone, the classification accuracy of traditional k-NN deteriorates in the case of overlapping samples and outliers and is highly susceptible to the neighborhood size, k. This study addresses these limitations by proposing the use of both distance- and density-based measures of similarity between training and test samples. The proposed k-NN classifier is used to enhance the diagnostic performance of a bearing fault diagnosis scheme, which classifies different fault conditions based upon hybrid feature vectors extracted from acoustic emission (AE) signals. Experimental results demonstrate that the proposed scheme, which uses the enhanced k-NN classifier, yields better diagnostic performance and is more robust to variations in the neighborhood size, k.
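    A toy version of the idea, weighting each neighbour's vote by both its distance to the test sample and a local-density estimate, is sketched below; the specific density term (inverse mean distance of a training point to its own k neighbours) is an assumption standing in for the paper's similarity measure, and the data are synthetic rather than AE features.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics import pairwise_distances

    X, y = make_classification(n_samples=300, n_features=10, random_state=2)
    X_train, y_train, X_test = X[:250], y[:250], X[250:]
    k = 7

    D_train = pairwise_distances(X_train)
    np.fill_diagonal(D_train, np.inf)
    density = 1.0 / np.sort(D_train, axis=1)[:, :k].mean(axis=1)   # local density of each training point

    D = pairwise_distances(X_test, X_train)
    preds = []
    for row in D:
        nn = np.argsort(row)[:k]                   # k nearest training neighbours
        w = density[nn] / (row[nn] + 1e-12)        # combine distance and density similarity
        votes = np.bincount(y_train[nn], weights=w, minlength=2)
        preds.append(votes.argmax())
    print(np.array(preds)[:10])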

  20. Deep Feature Learning and Cascaded Classifier for Large Scale Data

    DEFF Research Database (Denmark)

    Prasoon, Adhish

    This thesis focuses on voxel/pixel classification based approaches for image segmentation. The main application is segmentation of articular cartilage in knee MRIs. The first major contribution of the thesis deals with large scale machine learning problems. Many medical imaging problems need a huge amount of training data to cover sufficient biological variability. Learning methods scaling badly with the number of training data points cannot be used in such scenarios. This may restrict the usage of many powerful classifiers having excellent generalization ability. We propose a cascaded classifier which ... from data rather than having a predefined feature set. We explore a deep learning approach of convolutional neural networks (CNN) for segmenting three-dimensional medical images. We propose a novel system integrating three 2D CNNs, which have a one-to-one association with the xy, yz and zx planes of the 3D ...

  1. Scoring and Classifying Examinees Using Measurement Decision Theory

    Directory of Open Access Journals (Sweden)

    Lawrence M. Rudner

    2009-04-01

    Full Text Available This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the classification accuracy of tests scored using decision theory; (2) the effectiveness of different sequential testing procedures; and (3) the number of items needed to make a classification. A large percentage of examinees can be classified accurately with very few items using decision theory. A Java Applet for self-instruction and software for generating, calibrating and scoring MDT data are provided.
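    The core of MDT scoring is just Bayes' rule over mastery states, as the toy example below shows; the item probabilities, prior and response pattern are invented for illustration only.

    import numpy as np

    p_correct = np.array([            # rows: mastery states (non-master, master)
        [0.30, 0.40, 0.20, 0.50, 0.35],   # columns: items
        [0.80, 0.90, 0.70, 0.85, 0.75],
    ])
    prior = np.array([0.5, 0.5])
    responses = np.array([1, 1, 0, 1, 1])          # observed item scores (1 = correct)

    # likelihood of the response pattern under each mastery state
    likelihood = np.prod(np.where(responses == 1, p_correct, 1 - p_correct), axis=1)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    print("P(state | responses):", posterior, "-> classify as state", posterior.argmax())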

  2. Evaluation of LDA Ensembles Classifiers for Brain Computer Interface

    International Nuclear Information System (INIS)

    Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

    2011-01-01

    The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase the performance of a BCI in decoding user intentions, it is necessary to improve the feature extraction and classification techniques. In this article, the performance of an ensemble of three linear discriminant analysis (LDA) classifiers is studied. An ensemble-based system can theoretically achieve better classification results than its individual counterpart, depending on the individual classifier generation algorithm and the procedure for combining their outputs. Classic ensemble algorithms such as bagging and boosting are discussed here. For the BCI application, it was concluded that the results generated using ER and AUC as performance indices do not give enough information to establish which configuration is better.
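    A bagged ensemble of LDA classifiers of the kind discussed can be sketched in a few lines with scikit-learn; the synthetic feature matrix below merely stands in for real BCI features, and the ensemble size is an arbitrary choice.

    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=40, random_state=3)

    single = LinearDiscriminantAnalysis()
    ensemble = BaggingClassifier(LinearDiscriminantAnalysis(), n_estimators=3, random_state=3)

    print("single LDA   AUC:", cross_val_score(single, X, y, cv=5, scoring="roc_auc").mean())
    print("LDA ensemble AUC:", cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())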

  3. Security Enrichment in Intrusion Detection System Using Classifier Ensemble

    Directory of Open Access Journals (Sweden)

    Uma R. Salunkhe

    2017-01-01

    Full Text Available In the era of the Internet and with an increasing number of people as its end users, a large number of attack categories are introduced daily. Hence, effective detection of various attacks with the help of Intrusion Detection Systems is an emerging trend in research these days. Existing studies show the effectiveness of machine learning approaches in handling Intrusion Detection Systems. In this work, we aim to enhance the detection rate of the Intrusion Detection System by using machine learning techniques. We propose a novel classifier-ensemble-based IDS that is constructed using a hybrid approach which combines data-level and feature-level approaches. Classifier ensembles combine the opinions of different experts and improve the intrusion detection rate. Experimental results show the improved detection rates of our system compared to the reference technique.

  4. The three-dimensional origin of the classifying algebra

    International Nuclear Information System (INIS)

    Fuchs, Juergen; Schweigert, Christoph; Stigner, Carl

    2010-01-01

    It is known that reflection coefficients for bulk fields of a rational conformal field theory in the presence of an elementary boundary condition can be obtained as representation matrices of irreducible representations of the classifying algebra, a semisimple commutative associative complex algebra. We show how this algebra arises naturally from the three-dimensional geometry of factorization of correlators of bulk fields on the disk. This allows us to derive explicit expressions for the structure constants of the classifying algebra as invariants of ribbon graphs in the three-manifold S²×S¹. Our result unravels a precise relation between intertwiners of the action of the mapping class group on spaces of conformal blocks and boundary conditions in rational conformal field theories.

  5. Machine learning classifiers and fMRI: a tutorial overview.

    Science.gov (United States)

    Pereira, Francisco; Mitchell, Tom; Botvinick, Matthew

    2009-03-01

    Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviours and other variables of interest from fMRI data and thereby show the data contain information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of 'is there information about a variable of interest' (pattern discrimination), classifiers can be used to tackle other classes of question, namely 'where is the information' (pattern localization) and 'how is that information encoded' (pattern characterization).
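    One standard way to address the "statistically significant results" point raised in the tutorial is a label-permutation test, sketched below on synthetic data; the classifier and permutation count are illustrative choices, not prescriptions from the article.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import permutation_test_score

    X, y = make_classification(n_samples=120, n_features=50, random_state=4)

    # cross-validated accuracy compared against a null distribution of permuted labels
    score, perm_scores, p_value = permutation_test_score(
        LogisticRegression(max_iter=1000), X, y, cv=5, n_permutations=200, random_state=4)

    print(f"observed accuracy: {score:.2f}")
    print(f"p-value against label-permuted null: {p_value:.3f}")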

  6. Lung Nodule Detection in CT Images using Neuro Fuzzy Classifier

    Directory of Open Access Journals (Sweden)

    M. Usman Akram

    2013-07-01

    Full Text Available Automated lung cancer detection using computer aided diagnosis (CAD) is an important area in clinical applications. As manual nodule detection is very time consuming and costly, computerized systems can be helpful for this purpose. In this paper, we propose a computerized system for lung nodule detection in CT scan images. The automated system consists of two stages, i.e. lung segmentation and enhancement, followed by feature extraction and classification. The segmentation process separates lung tissue from the rest of the image, and only the lung tissues under examination are considered as candidate regions for detecting malignant nodules in the lung portion. A feature vector for possible abnormal regions is calculated and regions are classified using a neuro-fuzzy classifier. It is a fully automatic system that does not require any manual intervention, and experimental results show the validity of our system.

  7. A Bayesian Classifier for X-Ray Pulsars Recognition

    Directory of Open Access Journals (Sweden)

    Hao Liang

    2016-01-01

    Full Text Available Recognition of X-ray pulsars is important for the problem of spacecraft attitude determination by X-ray Pulsar Navigation (XPNAV). By using the nonhomogeneous Poisson model of the received photons and the minimum recognition error criterion, a classifier based on the Bayesian theorem is proposed. For X-ray pulsar recognition with unknown Doppler frequency and initial phase, the features of every X-ray pulsar are extracted and the unknown parameters are estimated using the Maximum Likelihood (ML) method. In addition, a method to recognize unknown X-ray pulsars or X-ray disturbances is proposed. Simulation results confirm the validity of the proposed Bayesian classifier.

  8. Wavelet classifier used for diagnosing shock absorbers in cars

    Directory of Open Access Journals (Sweden)

    Janusz GARDULSKI

    2007-01-01

    Full Text Available The paper discusses some commonly used methods of hydraulic absorber testing. Disadvantages of the methods are described. A vibro-acoustic method is presented and recommended for practical use on existing test rigs. The method is based on continuous wavelet analysis combined with a neural classifier: a 25-neuron, one-way, three-layer back-propagation network. The analysis satisfies the intended aim.

  9. Classified installations for environmental protection subject to declaration. Tome 2

    International Nuclear Information System (INIS)

    Anon.

    1992-01-01

    Legislation concerning classified installations governs most industrial, dangerous or polluting activities. This legislation aims to prevent risks and nuisances arising from an installation: air pollution, water pollution, noise, wastes produced by installations, and even adverse aesthetic effects. Polluting or dangerous activities are defined in a list called the nomenclature, which subjects installations to a regime of declaration or authorization. Technical regulations issued by the secretary of state for the environment are listed in tome 2

  10. Evaluating Classifiers in Detecting 419 Scams in Bilingual Cybercriminal Communities

    OpenAIRE

    Mbaziira, Alex V.; Abozinadah, Ehab; Jones Jr, James H.

    2015-01-01

    Incidents of organized cybercrime are rising because criminals are reaping high financial rewards while incurring low costs to commit crime. As the digital landscape broadens to accommodate more internet-enabled devices and technologies like social media, more cybercriminals who are not native English speakers are invading cyberspace to cash in on quick exploits. In this paper we evaluate the performance of three machine learning classifiers in detecting 419 scams in a bilingual Nigerian c...

  11. Efficient Multi-Concept Visual Classifier Adaptation in Changing Environments

    Science.gov (United States)

    2016-09-01

    sets of images, hand annotated by humans with region boundary outlines followed by label assignment. This annotation is time consuming, and... performed as a necessary but time-consuming step to train supervised classifiers. Unsupervised or self-supervised approaches have been used to... time-consuming labeling process. However, the lack of human supervision has limited most of this work to binary classification (e.g., traversability

  12. Classifying apples by the means of fluorescence imaging

    OpenAIRE

    Codrea, Marius C.; Nevalainen, Olli S.; Tyystjärvi, Esa; VAN DE VEN, Martin; VALCKE, Roland

    2004-01-01

    Classification of harvested apples when predicting their storage potential is an important task. This paper describes how chlorophyll a fluorescence images taken in blue light through a red filter, can be used to classify apples. In such an image, fluorescence appears as a relatively homogenous area broken by a number of small nonfluorescing spots, corresponding to normal corky tissue patches, lenticells, and to damaged areas that lower the quality of the apple. The damaged regions appear mor...

  13. Building Road-Sign Classifiers Using a Trainable Similarity Measure

    Czech Academy of Sciences Publication Activity Database

    Paclík, P.; Novovičová, Jana; Duin, R.P.W.

    2006-01-01

    Roč. 7, č. 3 (2006), s. 309-321 ISSN 1524-9050 R&D Projects: GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : classifier system design * road-sign classification * similarity data representation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.434, year: 2006 http://www.ewh.ieee.org/tc/its/trans.html

  14. Classifying Floating Potential Measurement Unit Data Products as Science Data

    Science.gov (United States)

    Coffey, Victoria; Minow, Joseph

    2015-01-01

    We are Co-Investigators for the Floating Potential Measurement Unit (FPMU) on the International Space Station (ISS) and members of the FPMU operations and data analysis team. We are providing this memo for the purpose of classifying raw and processed FPMU data products and ancillary data as NASA science data with unrestricted, public availability in order to best support science uses of the data.

  15. Snoring classified: The Munich-Passau Snore Sound Corpus.

    Science.gov (United States)

    Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn

    2018-03-01

    Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds is developed, labelled with the location of sound excitation. Video and audio recordings taken during drug induced sleep endoscopy (DISE) examinations from three medical centres have been semi-automatically screened for snore events, which subsequently have been classified by ENT experts into four classes based on the VOTE classification. The resulting dataset containing 828 snore events from 219 subjects has been split into Train, Development, and Test sets. An SVM classifier has been trained using low level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% could be achieved using the full set of LLDs including formants. Best performing subset is the MFCC-related set of LLDs. A strong difference in performance could be observed between the permutations of train, development, and test partition, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented which are classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation location of snoring sounds in the upper airway based on acoustic parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.
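    The reported metric, unweighted average recall (UAR), is simply the mean of the per-class recalls; the toy example below (with made-up VOTE-style labels, not data from the corpus) shows how it can be computed.

    from sklearn.metrics import balanced_accuracy_score, recall_score

    y_true = ["V", "O", "T", "E", "V", "O", "T", "E", "V", "V"]   # made-up reference labels
    y_pred = ["V", "O", "T", "O", "V", "E", "T", "E", "O", "V"]   # made-up classifier output

    uar = recall_score(y_true, y_pred, average="macro")           # mean of per-class recalls
    print("UAR:", uar, "= balanced accuracy:", balanced_accuracy_score(y_true, y_pred))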

  16. Young module multiplicities and classifying the indecomposable Young permutation modules

    OpenAIRE

    Gill, Christopher C.

    2012-01-01

    We study the multiplicities of Young modules as direct summands of permutation modules on cosets of Young subgroups. Such multiplicities have become known as the p-Kostka numbers. We classify the indecomposable Young permutation modules, and, applying the Brauer construction for p-permutation modules, we give some new reductions for p-Kostka numbers. In particular we prove that p-Kostka numbers are preserved under multiplying partitions by p, and strengthen a known reduction given by Henke, c...

  17. BIOPHARMACEUTICS CLASSIFICATION SYSTEM: A STRATEGIC TOOL FOR CLASSIFYING DRUG SUBSTANCES

    OpenAIRE

    Rohilla Seema; Rohilla Ankur; Marwaha RK; Nanda Arun

    2011-01-01

    The biopharmaceutical classification system (BCS) is a scientific approach for classifying drug substances based on their dose/solubility ratio and intestinal permeability. The BCS has been developed to allow prediction of in vivo pharmacokinetic performance of drug products from measurements of permeability and solubility. Moreover, the drugs can be categorized into four classes of BCS on the basis of permeability and solubility namely; high permeability high solubility, high permeability lo...

  18. Self-organizing map classifier for stressed speech recognition

    Science.gov (United States)

    Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav

    2016-05-01

    This paper presents a method for detecting speech under stress using Self-Organizing Maps. Most people who are exposed to stressful situations cannot adequately respond to stimuli. The army, police, and fire departments operate in environments that typically involve an increased number of stressful situations. Personnel in action are directed by a control center, and control commands should be adapted to the psychological state of the person in action. It is known that psychological changes in the human body are also reflected physiologically, which consequently means that stress affects speech. Therefore, it is clear that a speech stress recognition system is required by the security forces. One possible classifier, popular for its flexibility, is the self-organizing map, a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data. This feature is suitable for speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosodic features were selected as input data. These coefficients were selected for their sensitivity to emotional changes. The calculation of the parameters was performed on speech recordings, which can be divided into two classes, namely stress-state recordings and normal-state recordings. The contribution of the experiment is a method using a SOM classifier for stressed speech detection. Results showed the advantage of this method, namely its input data flexibility.
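    For concreteness, a bare-bones self-organizing map is sketched below in NumPy; the random vectors stand in for the MFCC/LPC/prosody features mentioned above, and the grid size, learning-rate and neighbourhood schedules are arbitrary assumptions. In a classifier setting, each trained node would additionally be labelled (e.g. by majority vote of the training samples it wins) so that new utterances can be assigned the label of their best-matching unit.

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(400, 13))                 # stand-in for 13 MFCC-like features per frame
    grid, dim, iters, lr0, sigma0 = (8, 8), 13, 2000, 0.5, 3.0
    W = rng.normal(size=grid + (dim,))             # codebook: one weight vector per map node
    coords = np.stack(np.meshgrid(*map(np.arange, grid), indexing="ij"), axis=-1)

    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid)   # best matching unit
        lr = lr0 * np.exp(-t / iters)                                     # decaying learning rate
        sigma = sigma0 * np.exp(-t / iters)                               # shrinking neighbourhood
        h = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))     # neighbourhood kernel
        W += lr * h[..., None] * (x - W)                                  # pull nodes toward the sample

    print("trained codebook shape:", W.shape)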

  19. Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

    Directory of Open Access Journals (Sweden)

    Daniel Ramos

    2018-03-01

    Full Text Available In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way; and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.
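    As a small numeric illustration of the quantities discussed, the sketch below builds posterior probabilities from a prior log-odds term plus per-sample log-likelihood ratios and then evaluates the binary cross-entropy; all values are invented, and the base-2 logarithm is only one possible choice of units.

    import numpy as np

    def cross_entropy(y, p, eps=1e-12):
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log2(p) + (1 - y) * np.log2(1 - p))   # in bits

    y = np.array([1, 1, 0, 0, 1, 0])                      # ground-truth classes
    llr = np.array([2.1, 0.7, -1.5, -0.2, 1.0, -2.4])     # per-sample log-likelihood ratios
    prior = 0.5
    log_odds = np.log(prior / (1 - prior)) + llr          # Bayes: posterior log-odds = prior + LLR
    posterior = 1 / (1 + np.exp(-log_odds))

    print("cross-entropy (bits):", cross_entropy(y, posterior))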

  20. Evaluation of Polarimetric SAR Decomposition for Classifying Wetland Vegetation Types

    Directory of Open Access Journals (Sweden)

    Sang-Hoon Hong

    2015-07-01

    Full Text Available The Florida Everglades is the largest subtropical wetland system in the United States and, as with subtropical and tropical wetlands elsewhere, has been threatened by severe environmental stresses. It is very important to monitor such wetlands to inform management on the status of these fragile ecosystems. This study aims to examine the applicability of TerraSAR-X quadruple polarimetric (quad-pol) synthetic aperture radar (PolSAR) data for classifying wetland vegetation in the Everglades. We processed quad-pol data using the Hong & Wdowinski four-component decomposition, which accounts for double bounce scattering in the cross-polarization signal. The calculated decomposition images consist of four scattering mechanisms (single, co- and cross-pol double, and volume scattering). We applied an object-oriented image analysis approach to classify vegetation types with the decomposition results. We also used a high-resolution multispectral optical RapidEye image to compare statistics and classification results with Synthetic Aperture Radar (SAR) observations. The calculated classification accuracy was higher than 85%, suggesting that the TerraSAR-X quad-pol SAR signal had a high potential for distinguishing different vegetation types. Scattering components from SAR acquisition were particularly advantageous for classifying mangroves along tidal channels. We conclude that the typical scattering behaviors from model-based decomposition are useful for discriminating among different wetland vegetation types.

  1. A Novel Cascade Classifier for Automatic Microcalcification Detection.

    Directory of Open Access Journals (Sweden)

    Seung Yeon Shin

    Full Text Available In this paper, we present a novel cascaded classification framework for automatic detection of individual and clusters of microcalcifications (μC). Our framework comprises three classification stages: (i) a random forest (RF) classifier for simple features capturing the second order local structure of individual μCs, where non-μC pixels in the target mammogram are efficiently eliminated; (ii) a more complex discriminative restricted Boltzmann machine (DRBM) classifier for μC candidates determined in the RF stage, which automatically learns the detailed morphology of μC appearances for improved discriminative power; and (iii) a detector to detect clusters of μCs from the individual μC detection results, using two different criteria. From the two-stage RF-DRBM classifier, we are able to distinguish μCs using explicitly computed features, as well as learn implicit features that are able to further discriminate between confusing cases. Experimental evaluation is conducted on the original Mammographic Image Analysis Society (MIAS) and mini-MIAS databases, as well as our own Seoul National University Bundang Hospital digital mammographic database. It is shown that the proposed method outperforms comparable methods in terms of receiver operating characteristic (ROC) and precision-recall curves for detection of individual μCs and the free-response receiver operating characteristic (FROC) curve for detection of clustered μCs.
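    The cascade idea can be mimicked in a few lines: a first, fast classifier discards easy negatives at a deliberately sensitive threshold and a second classifier refines the survivors. In the sketch below an MLP stands in for the DRBM stage, the data are synthetic and imbalanced, and the 0.05 threshold is an arbitrary assumption.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict, train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=25, weights=[0.95],
                               class_sep=0.8, flip_y=0.05, random_state=6)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=6)

    # Stage 1: random forest; out-of-fold probabilities decide which candidates survive.
    stage1 = RandomForestClassifier(n_estimators=100, random_state=6)
    p_tr = cross_val_predict(stage1, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    stage1.fit(X_tr, y_tr)
    keep_tr = p_tr > 0.05                                  # deliberately sensitive threshold

    # Stage 2: a second classifier (MLP standing in for the DRBM) trained on survivors only.
    stage2 = MLPClassifier(max_iter=1000, random_state=6).fit(X_tr[keep_tr], y_tr[keep_tr])

    keep_te = stage1.predict_proba(X_te)[:, 1] > 0.05
    final = np.zeros(y_te.size, dtype=int)
    final[keep_te] = stage2.predict(X_te[keep_te])         # only survivors reach stage 2
    print("fraction passed to stage 2:", keep_te.mean(), "positives flagged:", int(final.sum()))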

  2. Comparison of artificial intelligence classifiers for SIP attack data

    Science.gov (United States)

    Safarik, Jakub; Slachta, Jiri

    2016-05-01

    A honeypot application is a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of a neural network classifier which lower the classification error. The article contains a comparison of two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second neural network uses further optimizations such as input normalization and a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.

  3. Application of the Naive Bayesian Classifier to optimize treatment decisions

    International Nuclear Information System (INIS)

    Kazmierska, Joanna; Malicki, Julian

    2008-01-01

    Background and purpose: To study the accuracy, specificity and sensitivity of the Naive Bayesian Classifier (NBC) in the assessment of individual risk of cancer relapse or progression after radiotherapy (RT). Materials and methods: Data of 142 brain tumour patients irradiated from 2000 to 2005 were analyzed. Ninety-six attributes related to disease, patient and treatment were chosen. Attributes in binary form consisted of the training set for NBC learning. NBC calculated an individual conditional probability of being assigned to: relapse or progression (1), or no relapse or progression (0) group. Accuracy, attribute selection and quality of classifier were determined by comparison with actual treatment results, leave-one-out and cross validation methods, respectively. Clinical setting test utilized data of 35 patients. Treatment results at classification were unknown and were compared with classification results after 3 months. Results: High classification accuracy (84%), specificity (0.87) and sensitivity (0.80) were achieved, both for classifier training and in progressive clinical evaluation. Conclusions: NBC is a useful tool to support the assessment of individual risk of relapse or progression in patients diagnosed with brain tumour undergoing RT postoperatively
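    The setup described (a Naive Bayes classifier over binary attributes, validated leave-one-out) maps directly onto a few lines of scikit-learn, sketched below with random 0/1 data standing in for the 96 clinical attributes; accuracy on such random data is of course meaningless and the snippet only shows the mechanics.

    import numpy as np
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(7)
    X = rng.integers(0, 2, size=(142, 96))         # 96 binary attributes per patient (stand-in)
    y = rng.integers(0, 2, size=142)               # 1 = relapse/progression, 0 = none (stand-in)

    acc = cross_val_score(BernoulliNB(), X, y, cv=LeaveOneOut()).mean()
    print("leave-one-out accuracy:", acc)

    model = BernoulliNB().fit(X, y)
    print("P(progression | attributes):", model.predict_proba(X[:3])[:, 1])   # individual risk estimates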

  4. Entropy based classifier for cross-domain opinion mining

    Directory of Open Access Journals (Sweden)

    Jyoti S. Deshmukh

    2018-01-01

    Full Text Available In recent years, the growth of social networks has increased people's interest in analyzing reviews and opinions of products before they buy them. Consequently, this has given rise to domain adaptation as a prominent area of research in sentiment analysis. A classifier trained on one domain often gives poor results on data from another domain, because the expression of sentiment is different in every domain. Labeling each domain separately is very costly as well as time consuming. Therefore, this study proposes an approach that extracts and classifies opinion words from one domain, called the source domain, and predicts opinion words of another domain, called the target domain, using a semi-supervised approach which combines modified maximum entropy and bipartite graph clustering. A comparison of opinion classification on reviews from four different product domains is presented. The results demonstrate that the proposed method performs relatively well in comparison to the other methods. A comparison against SentiWordNet for domain-specific and domain-independent words reveals that on average 72.6% and 88.4% of words, respectively, are correctly classified.

  5. Boat sampling

    International Nuclear Information System (INIS)

    Citanovic, M.; Bezlaj, H.

    1994-01-01

    This presentation describes the essential boat sampling activities: on-site boat sampling process optimization and qualification; boat sampling of base material (beltline region); boat sampling of weld material (weld No. 4); and problems associated with weld crown variations, RPV shell inner radius tolerance, local corrosion pitting and water clarity. The equipment used for boat sampling is also described. 7 pictures

  6. Graph sampling

    OpenAIRE

    Zhang, L.-C.; Patone, M.

    2017-01-01

    We synthesise the existing theory of graph sampling. We propose a formal definition of sampling in finite graphs, and provide a classification of potential graph parameters. We develop a general approach of Horvitz–Thompson estimation to T-stage snowball sampling, and present various reformulations of some common network sampling methods in the literature in terms of the outlined graph sampling theory.

  7. Word2Vec inversion and traditional text classifiers for phenotyping lupus.

    Science.gov (United States)

    Turner, Clayton A; Jacobs, Alexander D; Marques, Cassios K; Oates, James C; Kamen, Diane L; Anderson, Paul E; Obeid, Jihad S

    2017-08-22

    Identifying patients with certain clinical criteria based on manual chart review of doctors' notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method. We obtained clinical notes for patients with SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOWs) and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set. We calculated the accuracy of the ICD-9 billing codes as a baseline to be 90.00% with an AUC of 0.900, the shallow neural network with CUIs to be 92.10% with an AUC of 0.970, the random forest with BOWs to be 95.25% with an AUC of 0.994, the random forest with CUIs to be 95.00% with an AUC of 0.979, and the Word2Vec inversion to be 90.03% with an AUC of 0.905. Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is
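    The best-performing traditional pipeline reported above, bag-of-words features fed to a random forest with cross-validated AUC, can be sketched as follows; the toy notes and labels are invented and the pipeline settings are illustrative, not those used in the study.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # tiny invented clinical-note snippets; 1 = SLE case, 0 = control
    notes = ["malar rash and arthralgia, ANA positive", "routine follow-up, no complaints",
             "nephritis with positive anti-dsDNA", "knee osteoarthritis, stable"] * 25
    labels = [1, 0, 1, 0] * 25

    pipe = make_pipeline(CountVectorizer(), RandomForestClassifier(n_estimators=200, random_state=8))
    print("cross-validated AUC:", cross_val_score(pipe, notes, labels, cv=5, scoring="roc_auc").mean())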

  8. Screening for HIV-related PTSD: sensitivity and specificity of the 17-item Posttraumatic Stress Diagnostic Scale (PDS) in identifying HIV-related PTSD among a South African sample.

    Science.gov (United States)

    Martin, L; Fincham, D; Kagee, A

    2009-11-01

    The identification of HIV-positive patients who exhibit criteria for Posttraumatic Stress Disorder (PTSD) and related trauma symptomatology is of clinical importance in the maintenance of their overall wellbeing. This study assessed the sensitivity and specificity of the 17-item Posttraumatic Stress Diagnostic Scale (PDS), a self-report instrument, in the detection of HIV-related PTSD. An adapted version of the PTSD module of the Composite International Diagnostic Interview (CIDI) served as the gold standard. 85 HIV-positive patients diagnosed with HIV within the year preceding data collection were recruited by means of convenience sampling from three HIV clinics within primary health care facilities in the Boland region of South Africa. A significant association was found between the 17-item PDS and the adapted PTSD module of the CIDI. A ROC curve analysis indicated that the 17-item PDS correctly discriminated between PTSD caseness and non-caseness 74.9% of the time. Moreover, a PDS cut-off point of > or = 15 yielded adequate sensitivity (68%) and 1-specificity (65%). The 17-item PDS demonstrated a PPV of 76.0% and a NPV of 56.7%. The 17-item PDS can be used as a brief screening measure for the detection of HIV-related PTSD among HIV-positive patients in South Africa.

  9. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  10. Parents' Experiences and Perceptions when Classifying their Children with Cerebral Palsy: Recommendations for Service Providers.

    Science.gov (United States)

    Scime, Natalie V; Bartlett, Doreen J; Brunton, Laura K; Palisano, Robert J

    2017-08-01

    This study investigated the experiences and perceptions of parents of children with cerebral palsy (CP) when classifying their children using the Gross Motor Function Classification System (GMFCS), the Manual Ability Classification System (MACS), and the Communication Function Classification System (CFCS). The second aim was to collate parents' recommendations for service providers on how to interact and communicate with families. A purposive sample of seven parents participating in the On Track study was recruited. Semi-structured interviews were conducted orally and were audiotaped, transcribed, and coded openly. A descriptive interpretive approach within a pragmatic perspective was used during analysis. Seven themes encompassing parents' experiences and perspectives reflect a process of increased understanding when classifying their children, with perceptions of utility evident throughout this process. Six recommendations for service providers emerged, including making the child a priority and being a dependable resource. Knowledge of parents' experiences when using the GMFCS, MACS, and CFCS can provide useful insight for service providers collaborating with parents to classify function in children with CP. Using the recommendations from these parents can facilitate family-provider collaboration for goal setting and intervention planning.

  11. Effective sample labeling

    International Nuclear Information System (INIS)

    Rieger, J.T.; Bryce, R.W.

    1990-01-01

    Ground-water samples collected for hazardous-waste and radiological monitoring have come under strict regulatory and quality assurance requirements as a result of laws such as the Resource Conservation and Recovery Act. To comply with these laws, the labeling system used to identify environmental samples had to be upgraded to ensure proper handling and to protect collection personnel from exposure to sample contaminants and sample preservatives. The sample label now used at the Pacific Northwest Laboratory is a complete sample document. In the event other paperwork on a labeled sample were lost, the necessary information could be found on the label

  12. Storage effects on peripheral blood samples as identified from automated hemograms

    Directory of Open Access Journals (Sweden)

    Michele Dalanhol

    2010-02-01

    Full Text Available This study evaluated possible alterations in several hemogram parameters (total erythrocyte count, hematocrit, hemoglobin concentration, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), total leukocyte count and platelet count) after different sample storage times, both refrigerated at 4°C and at room temperature. The determinations were performed with a Sysmex® XT2000i automated hematology analyzer. Blood samples were obtained from individuals with no diagnosed hematological alterations and no clinical disease, who were therefore considered to have normal hematological reference values; written consent was obtained from the donors. The results were evaluated by descriptive statistics and comparison of means through analysis of variance (ANOVA). MCHC and platelet count showed statistically significant differences at both storage temperatures. At room temperature, hematocrit, MCV and the red cell distribution width (RDW) also showed statistically significant differences. It is therefore concluded that the hemogram results released by the analyzer can be satisfactory for most of the evaluated parameters when the analysis is performed 12 to 24 hours after sample collection.

  13. Backscatter Mosaic used to identify, delineate and classify moderate-depth benthic habitats around St. John, USVI

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This image represents a 2x2 meter resolution backscatter mosaic of the moderate-depth portion of the NPS's Virgin Islands Coral Reef National Monument, south of St....

  14. Bathymetry Surface Layer used to identify, delineate and classify moderate-depth benthic habitats around St. John, USVI

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This image represents a 2x2 meter resolution bathymetry surface of the moderate-depth portion of the NPS's Virgin Islands Coral Reef National Monument, south of St....

  15. Identifying and Prioritizing Effective Factors on Classifying A Private Bank Customers by Delphi Technique and Analytical Hierarchy Process (AHP)

    Directory of Open Access Journals (Sweden)

    S. Khayatmoghadam

    2013-05-01

    Full Text Available The development of the banking industry and the presence of many financial institutions have intensified competition for customers and their capital: there are about 28 banks and many credit and financial institutions, of which 6 banks are public and 22 are private. Public banks are in a better position than private banks owing to governmental relations and support, wider geographical coverage, and a longer history; lacking these advantages, private banks turn to scientific approaches to attract customers. This study therefore reviews bank customers from a different viewpoint. We first obtained the desired indicators, from the bank's perspective, for two groups of customers, resources (deposit) customers and uses (facility) customers, using expert opinion and the Delphi technique; on this basis, indicators such as account turnover, average balance, and absence of returned cheques were determined for the resources group, and the amount of facilities received, the amount of collateral provided, etc., for the uses group. Then, using the Analytic Hierarchy Process (AHP) and expert opinion in the Expert Choice 11 software, the criteria were prioritized and the weight of each indicator was determined. The statistical population of bank experts in this study comprised both line (branch) and staff (headquarters) personnel. The results can be used as input for customer grouping as part of CRM implementation.

  16. Depth (Standard Deviation) Layer used to identify, delineate and classify moderate-depth benthic habitats around St. John, USVI

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Standard deviation of depth was calculated from the bathymetry surface for each cell using the ArcGIS Spatial Analyst Focal Statistics "STD" parameter. Standard...
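
    The focal standard-deviation layer described here can be approximated outside ArcGIS with a generic moving-window filter; a minimal sketch assuming a NumPy grid of depths (the 3x3 window size is an assumption, not necessarily what was used for this product):

    ```python
    import numpy as np
    from scipy.ndimage import generic_filter

    # depth: 2-D bathymetry grid (metres); synthetic and NaN-free for simplicity.
    depth = np.random.default_rng(0).normal(-25.0, 3.0, size=(200, 200))

    # Standard deviation of depth within a 3x3 neighbourhood around each cell,
    # analogous to Focal Statistics with the "STD" statistic.
    depth_std = generic_filter(depth, np.std, size=3, mode="nearest")
    ```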

  17. Comparison of remote sensing data sources and techniques for identifying and classifying alien invasive vegetation in riparian zones

    CSIR Research Space (South Africa)

    Rowlinson, LC

    1999-10-01

    Full Text Available yield the most accurate and cost-effective results. The least cost-effective data sources were found to be 1:10 000 colour aerial photographs and digital aerial photographs and the least accurate data sources were aerial videography and Landsat thematic...

  18. A cascade of classifiers for extracting medication information from discharge summaries

    Directory of Open Access Journals (Sweden)

    Halgrim Scott

    2011-07-01

    Full Text Available Abstract Background Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. Methods We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. Results The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. Conclusions This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.
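
    A minimal sketch of the two-part idea (a statistical tagger for field detection followed by heuristic linking), not the authors' actual system: a single token-level classifier stands in for the cascade, and the toy features, labels, and linking rule are assumptions.

    ```python
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Stage 1: field detection -- a token-level classifier tagging DRUG/DOSE/O.
    def token_features(tokens, i):
        return {"word": tokens[i].lower(),
                "is_digit": tokens[i].isdigit(),
                "suffix3": tokens[i][-3:].lower(),
                "prev": tokens[i - 1].lower() if i > 0 else "<s>"}

    tokens = ["discharged", "on", "lisinopril", "10", "mg", "daily"]
    labels = ["O", "O", "DRUG", "DOSE", "DOSE", "O"]          # toy training data

    tagger = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    tagger.fit([token_features(tokens, i) for i in range(len(tokens))], labels)

    # Stage 2: heuristic linking -- attach DOSE tokens to the most recent DRUG
    # to form simple medication events.
    def link_events(tokens, tags):
        events, current = [], None
        for tok, tag in zip(tokens, tags):
            if tag == "DRUG":
                current = {"drug": tok, "dose": []}
                events.append(current)
            elif tag == "DOSE" and current is not None:
                current["dose"].append(tok)
        return events

    predicted = tagger.predict([token_features(tokens, i) for i in range(len(tokens))])
    print(link_events(tokens, predicted))
    ```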

  19. NMD Classifier: A reliable and systematic classification tool for nonsense-mediated decay events.

    Directory of Open Access Journals (Sweden)

    Min-Kung Hsu

    Full Text Available Nonsense-mediated decay (NMD) degrades mRNAs that include premature termination codons to avoid the translation and accumulation of truncated proteins. This mechanism has been found to participate in gene regulation and a wide spectrum of biological processes. However, the evolutionary and regulatory origins of NMD-targeted transcripts (NMDTs) have been less studied, partly because of the complexity in analyzing NMD events. Here we report NMD Classifier, a tool for systematic classification of NMD events for either annotated or de novo assembled transcripts. This tool is based on the assumption of minimal evolution/regulation: an event that leads to the least change is the most likely to occur. Our simulation results indicate that NMD Classifier can correctly identify an average of 99.3% of the NMD-causing transcript structural changes, particularly exon inclusions/exclusions and exon boundary alterations. Researchers can apply NMD Classifier to evolutionary and regulatory studies by comparing NMD events of different biological conditions or in different organisms.


  20. Machinery Bearing Fault Diagnosis Using Variational Mode Decomposition and Support Vector Machine as a Classifier

    Science.gov (United States)

    Rama Krishna, K.; Ramachandran, K. I.

    2018-02-01

    Crack propagation is a major cause of failure in rotating machines. It adversely affects the productivity, safety, and the machining quality. Hence, detecting the crack’s severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established concept in identifying the faults, for observing the non-linear behaviour of the vibration signals at various operating conditions. In this work, we find the classification efficiencies for both original and the reconstructed vibrational signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. All the statistical features from the original signals and reconstructed signals are found out in feature extraction process individually. A few statistical parameters are selected in feature selection process and are classified using the SVM classifier. The obtained results show the best parameters and appropriate kernel in SVM classifier for detecting the faults in bearings. Hence, we conclude that better results were obtained by VMD and SVM process over normal process using SVM. This is owing to denoising and filtering the raw vibrational signals.
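
    A minimal sketch of the feature-extraction and SVM stages only: the VMD decomposition itself is assumed to come from a separate library, the placeholder signals are synthetic, and the statistical features listed are a generic choice rather than the paper's exact set.

    ```python
    import numpy as np
    from scipy.stats import kurtosis, skew
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def stat_features(signal):
        """Generic statistical features of one vibration signal (or VMD mode)."""
        rms = np.sqrt(np.mean(signal ** 2))
        return [signal.mean(), signal.std(), skew(signal), kurtosis(signal),
                rms, np.max(np.abs(signal)) / rms]   # RMS and crest factor

    rng = np.random.default_rng(0)
    # Placeholder data: 100 "healthy" and 100 "faulty" segments of 2048 samples.
    healthy = rng.normal(0.0, 1.0, size=(100, 2048))
    faulty = rng.normal(0.0, 1.0, size=(100, 2048)) + 0.5 * rng.standard_t(3, (100, 2048))

    X = np.array([stat_features(s) for s in np.vstack([healthy, faulty])])
    y = np.array([0] * 100 + [1] * 100)

    # RBF-kernel SVM evaluated with 5-fold cross-validation.
    print(cross_val_score(SVC(kernel="rbf", C=10.0, gamma="scale"), X, y, cv=5).mean())
    ```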

  1. Segment convolutional neural networks (Seg-CNNs) for classifying relations in clinical notes.

    Science.gov (United States)

    Luo, Yuan; Cheng, Yu; Uzuner, Özlem; Szolovits, Peter; Starren, Justin

    2018-01-01

    We propose Segment Convolutional Neural Networks (Seg-CNNs) for classifying relations from clinical notes. Seg-CNNs use only word-embedding features without manual feature engineering. Unlike typical CNN models, relations between 2 concepts are identified by simultaneously learning separate representations for text segments in a sentence: preceding, concept1, middle, concept2, and succeeding. We evaluate Seg-CNN on the i2b2/VA relation classification challenge dataset. We show that Seg-CNN achieves a state-of-the-art micro-average F-measure of 0.742 for overall evaluation, 0.686 for classifying medical problem-treatment relations, 0.820 for medical problem-test relations, and 0.702 for medical problem-medical problem relations. We demonstrate the benefits of learning segment-level representations. We show that medical domain word embeddings help improve relation classification. Seg-CNNs can be trained quickly for the i2b2/VA dataset on a graphics processing unit (GPU) platform. These results support the use of CNNs computed over segments of text for classifying medical relations, as they show state-of-the-art performance while requiring no manual feature engineering. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
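
    A minimal Keras sketch of the segment-level idea (separate convolutional branches for the five segments, concatenated before the softmax); the vocabulary size, dimensions, and number of relation classes are placeholders, not the paper's configuration.

    ```python
    from tensorflow.keras import layers, Model

    VOCAB, EMB_DIM, SEG_LEN, N_RELATIONS = 20000, 100, 40, 8
    SEGMENTS = ["preceding", "concept1", "middle", "concept2", "succeeding"]

    def segment_branch(name):
        """One convolutional branch over a single text segment."""
        inp = layers.Input(shape=(SEG_LEN,), name=name)
        x = layers.Embedding(VOCAB, EMB_DIM)(inp)
        x = layers.Conv1D(filters=128, kernel_size=3, activation="relu", padding="same")(x)
        x = layers.GlobalMaxPooling1D()(x)
        return inp, x

    inputs, pooled = zip(*(segment_branch(s) for s in SEGMENTS))
    merged = layers.Concatenate()(list(pooled))
    output = layers.Dense(N_RELATIONS, activation="softmax")(merged)

    model = Model(inputs=list(inputs), outputs=output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    ```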

  2. Aplikasi E-Tour Guide dengan Fitur Pengenalan Image Menggunakan Metode Haar Classifier

    Directory of Open Access Journals (Sweden)

    Derwin Suhartono

    2013-12-01

    Full Text Available The smartphone has become an important instrument in modern society, used not only for communication but also for entertainment and information searching. Given this, applications that extend smartphone functionality are needed. The objective of this research is to create an application named E-Tour Guide as a tool for planning and managing tourism activity, equipped with an image recognition feature. The image recognition method used is the Haar classifier, and the feature is used to recognize historical objects. In testing on a sample of 20 images, 85% accuracy was achieved for the image recognition feature.
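
    Haar-cascade detection is available in OpenCV; a minimal sketch using the frontal-face cascade bundled with OpenCV (the file path is a placeholder, and a tour-guide application would instead use cascades trained on its own landmark images):

    ```python
    import cv2

    # Load a pretrained Haar cascade shipped with OpenCV (frontal faces here;
    # an E-Tour Guide deployment would train cascades for its landmarks).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    image = cv2.imread("photo.jpg")            # placeholder input path
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Multi-scale sliding-window detection with the Haar feature cascade.
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in detections:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("annotated.jpg", image)
    ```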

  3. Methods of generalizing and classifying layer structures of a special form

    Energy Technology Data Exchange (ETDEWEB)

    Viktorova, N P

    1981-09-01

    An examination is made of the problem of classifying structures represented by weighted multilayer graphs of special form with connections between the vertices of each layer. The classification of structures of such a form is based on the construction of resolving sets of graphs as a result of generalization of the elements of the training sample of each class and the testing of whether an input object is isomorphic (with allowance for the weights) to the structures of the resolving set or not. 4 references.
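
    The classification step described here, testing whether an input graph is isomorphic (with matching weights) to some member of a class's resolving set, can be sketched with NetworkX; the resolving sets themselves are assumed to be given, and the tiny single-edge graphs are purely illustrative.

    ```python
    import networkx as nx
    from networkx.algorithms.isomorphism import numerical_edge_match

    def classify(input_graph, resolving_sets):
        """resolving_sets: dict mapping class label -> list of weighted graphs."""
        match_weights = numerical_edge_match("weight", default=1)
        for label, graphs in resolving_sets.items():
            if any(nx.is_isomorphic(input_graph, g, edge_match=match_weights)
                   for g in graphs):
                return label
        return None  # rejected: matches no class

    # Tiny illustration with two single-edge "layer structures".
    g_a = nx.Graph([(0, 1, {"weight": 2.0})])
    g_b = nx.Graph([(0, 1, {"weight": 5.0})])
    query = nx.Graph([("u", "v", {"weight": 2.0})])
    print(classify(query, {"class A": [g_a], "class B": [g_b]}))  # -> "class A"
    ```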

  4. Spatio-Spectral Method for Estimating Classified Regions with High Confidence using MODIS Data

    International Nuclear Information System (INIS)

    Katiyal, Anuj; Rajan, Dr K S

    2014-01-01

    In studies like change analysis, the availability of very high resolution (VHR)/high resolution (HR) imagery for a particular period and region is a challenge due to the sensor revisit times and high cost of acquisition. Therefore, most studies prefer lower resolution (LR) sensor imagery with frequent revisit times, in addition to their cost and computational advantages. Further, the classification techniques provide us with a global estimate of the class accuracy, which limits their utility if the accuracy is low. In this work, we focus on the sub-classification problem of LR images and estimate regions of higher confidence than the global classification accuracy within the classified region. The spectrally classified data was mined into spatially clustered regions and further refined and processed using statistical measures to arrive at local high confidence regions (LHCRs), for every class. Rabi season MODIS data of January 2006 and 2007 was used for this study and the evaluation of LHCR was done using the APLULC 2005 classified data. For Jan-2007, the global class accuracies for water bodies (WB), forested regions (FR) and Kharif crops and barren lands (KB) were 89%, 71.7% and 71.23% respectively, while the respective LHCRs had accuracies of 96.67%, 89.4% and 80.9% covering an area of 46%, 29% and 14.5% of the initially classified areas. Though the areas are reduced, LHCRs with higher accuracies help in extracting more representative class regions. Identifying such regions can help reduce classification time and processing for HR images when combined with the more frequently acquired LR imagery, isolate pure versus mixed/impure pixels, and provide training-sample locations for HR imagery.
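
    A highly simplified sketch of the general idea of mining a classified raster for spatially contiguous, higher-confidence regions; the authors' exact statistical refinement is not reproduced here, and the connected-component clustering, per-pixel confidence map, and 0.9 threshold are assumptions.

    ```python
    import numpy as np
    from scipy import ndimage

    def high_confidence_regions(class_map, prob_map, target_class, threshold=0.9):
        """Mask of connected regions of `target_class` whose mean per-pixel
        class probability exceeds `threshold`."""
        mask = class_map == target_class
        labels, n_regions = ndimage.label(mask)          # spatial clustering
        keep = np.zeros_like(mask)
        for region_id in range(1, n_regions + 1):
            region = labels == region_id
            if prob_map[region].mean() >= threshold:     # local confidence test
                keep |= region
        return keep

    rng = np.random.default_rng(0)
    class_map = rng.integers(0, 3, size=(100, 100))      # toy classified raster
    prob_map = rng.uniform(0.5, 1.0, size=(100, 100))    # toy per-pixel confidence
    lhcr_mask = high_confidence_regions(class_map, prob_map, target_class=1)
    ```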

  5. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    Science.gov (United States)

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637
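
    A minimal sketch of building and cross-validating a parsimonious classifier on a small gene subset with scikit-learn; the data, the three gene indices, and the logistic-regression choice are placeholders, since the paper does not tie its classifier to this particular learner.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_tumors, n_genes = 176, 20000
    expression = rng.normal(size=(n_tumors, n_genes))     # placeholder microarray data
    er_status = rng.integers(0, 2, size=n_tumors)         # placeholder ER labels

    selected_genes = [101, 2057, 15000]                   # hypothetical 3-gene panel
    X = expression[:, selected_genes]

    clf = make_pipeline(StandardScaler(), LogisticRegression())
    scores = cross_val_score(clf, X, er_status, cv=10)
    print(f"cross-validation accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
    ```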

  6. A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

    Directory of Open Access Journals (Sweden)

    Meysam Bastani

    Full Text Available BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.

  7. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    Directory of Open Access Journals (Sweden)

    Julia Jezmir

    Full Text Available To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.

  8. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    Science.gov (United States)

    Jezmir, Julia; Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E Jane

    2016-01-01

    To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.
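
    The LQAS classification logic can be sketched by checking the error risks of a simple decision rule against the binomial distribution; the decision threshold below is illustrative and is not the study's actual sampling plan.

    ```python
    from scipy.stats import binom

    n = 105                       # patients sampled per setting (as in the study)
    d = 3                         # illustrative rule: > d resistant cases => "high"
    p_low, p_high = 0.01, 0.05    # classification targets: <1% vs >5% MDR-TB

    # Risk of wrongly calling a truly low-prevalence area "high":
    alpha = 1 - binom.cdf(d, n, p_low)
    # Risk of wrongly calling a truly high-prevalence area "low":
    beta = binom.cdf(d, n, p_high)
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")

    def classify(resistant_cases, threshold=d):
        return "high (>5%)" if resistant_cases > threshold else "low (<1%)"

    print(classify(2))  # e.g. 2 MDR-TB cases out of 105 -> "low (<1%)"
    ```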

  9. Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.

    Science.gov (United States)

    Sriraam, N; Raghu, S

    2017-09-02

    Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients with pharmacoresistant focal epilepsy. The electroencephalogram (EEG) is a significant measurement benchmark for assessing patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize focal and non focal epileptic seizures in recordings from pharmacoresistant focal epilepsy patients in the Bern Barcelona database. From the dataset, five different classification tasks were formed. In total, 26 features were extracted from focal and non focal EEG. Significant features were selected using the Wilcoxon rank sum test by setting the p-value (p < 0.05, z > 1.96) at the 95% significance level. The hypothesis was that removing outliers improves classification accuracy, and Tukey's range test was adopted for pruning outliers from the feature set. Finally, 21 features were classified using an optimized support vector machine (SVM) classifier with 10-fold cross validation, with Bayesian optimization adopted to minimize the cross-validation loss. From the simulation results, the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15% were achieved, respectively, which was found to be better than state-of-the-art approaches. Further, classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics confirm the suitability of the proposed multi-features with the optimized SVM classifier. It can be concluded that the proposed approach can be applied to the recognition of focal EEG signals to localize epileptogenic zones.
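
    A minimal sketch of the selection-then-classification flow (Wilcoxon rank-sum screening of features followed by a cross-validated SVM); the synthetic data and the 0.05 cut-off are placeholders, and the outlier-pruning and Bayesian-optimization steps are omitted for brevity.

    ```python
    import numpy as np
    from scipy.stats import ranksums
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Placeholder feature matrix: 200 EEG epochs x 26 candidate features.
    X = rng.normal(size=(200, 26))
    y = np.array([0] * 100 + [1] * 100)      # 0 = non-focal, 1 = focal
    X[y == 1, :5] += 0.8                     # make the first 5 features informative

    # Keep features whose focal vs non-focal distributions differ (p < 0.05).
    keep = [j for j in range(X.shape[1])
            if ranksums(X[y == 0, j], X[y == 1, j]).pvalue < 0.05]

    scores = cross_val_score(SVC(kernel="rbf"), X[:, keep], y, cv=10)
    print(f"{len(keep)} features kept, 10-fold accuracy = {scores.mean():.3f}")
    ```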

  10. How well Can We Classify SWOT-derived Water Surface Profiles?

    Science.gov (United States)

    Frasson, R. P. M.; Wei, R.; Picamilh, C.; Durand, M. T.

    2015-12-01

    The upcoming Surface Water Ocean Topography (SWOT) mission will detect water bodies and measure water surface elevation throughout the globe. Within its continental high resolution mask, SWOT is expected to deliver measurements of river width, water elevation and slope of rivers wider than ~50 m. The definition of river reaches is an integral step of the computation of discharge based on SWOT's observables. As poorly defined reaches can negatively affect the accuracy of discharge estimations, we seek strategies to break up rivers into physically meaningful sections. In the present work, we investigate how accurately we can classify water surface profiles based on simulated SWOT observations. We assume that most river sections can be classified as either M1 (mild slope, with depth larger than the normal depth), or A1 (adverse slope with depth larger than the critical depth). This assumption allows the classification to be based solely on the second derivative of water surface profiles, with convex profiles being classified as A1 and concave profiles as M1. We consider a HEC-RAS model of the Sacramento River as a representation of the true state of the river. We employ the SWOT instrument simulator to generate a synthetic pass of the river, which includes our best estimates of height measurement noise and geolocation errors. We process the resulting point cloud of water surface heights with the RiverObs package, which delineates the river center line and draws the water surface profile. Next, we identify inflection points in the water surface profile and classify the sections between the inflection points. Finally, we compare our limited classification of simulated SWOT-derived water surface profile to the "exact" classification of the modeled Sacramento River. With this exercise, we expect to determine if SWOT observations can be used to find inflection points in water surface profiles, which would bring knowledge of flow regimes into the definition of river reaches.
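
    A minimal sketch of the classification rule described above (sign of the second derivative of the water surface profile), applied to a noisy synthetic profile; the smoothing window and the synthetic shape are assumptions, not SWOT simulator output.

    ```python
    import numpy as np

    x = np.linspace(0.0, 50_000.0, 500)                  # along-stream distance (m)
    true_profile = 10.0 - 1e-4 * x + 0.5 * np.sin(x / 8000.0)
    observed = true_profile + np.random.default_rng(0).normal(0.0, 0.05, x.size)

    # Smooth before differentiating, since SWOT-like heights are noisy.
    kernel = np.ones(25) / 25.0
    smooth = np.convolve(observed, kernel, mode="same")

    curvature = np.gradient(np.gradient(smooth, x), x)   # second derivative d2h/dx2
    profile_class = np.where(curvature > 0.0, "A1 (convex)", "M1 (concave)")

    # Inflection points: sign changes of the curvature split the river into sections.
    inflections = np.where(np.diff(np.sign(curvature)) != 0)[0]
    print(len(inflections), "inflection points found")
    ```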

  11. Selectivity in Genetic Association with Sub-classified Migraine in Women

    Science.gov (United States)

    Chasman, Daniel I.; Anttila, Verneri; Buring, Julie E.; Ridker, Paul M.; Schürks, Markus; Kurth, Tobias

    2014-01-01

    Migraine can be sub-classified not only according to presence of migraine aura (MA) or absence of migraine aura (MO), but also by additional features accompanying migraine attacks, e.g. photophobia, phonophobia, nausea, etc. all of which are formally recognized by the International Classification of Headache Disorders. It remains unclear how aura status and the other migraine features may be related to underlying migraine pathophysiology. Recent genome-wide association studies (GWAS) have identified 12 independent loci at which single nucleotide polymorphisms (SNPs) are associated with migraine. Using a likelihood framework, we explored the selective association of these SNPs with migraine, sub-classified according to aura status and the other features in a large population-based cohort of women including 3,003 active migraineurs and 18,108 free of migraine. Five loci met stringent significance for association with migraine, among which four were selective for sub-classified migraine, including rs11172113 (LRP1) for MO. The number of loci associated with migraine increased to 11 at suggestive significance thresholds, including five additional selective associations for MO but none for MA. No two SNPs showed similar patterns of selective association with migraine characteristics. At one extreme, SNPs rs6790925 (near TGFBR2) and rs2274316 (MEF2D) were not associated with migraine overall, MA, or MO but were selective for migraine sub-classified by the presence of one or more of the additional migraine features. In contrast, SNP rs7577262 (TRPM8) was associated with migraine overall and showed little or no selectivity for any of the migraine characteristics. The results emphasize the multivalent nature of migraine pathophysiology and suggest that a complete understanding of the genetic influence on migraine may benefit from analyses that stratify migraine according to both aura status and the additional diagnostic features used for clinical characterization of

  12. On the statistical assessment of classifiers using DNA microarray data

    Directory of Open Access Journals (Sweden)

    Carella M

    2006-08-01

    Full Text Available Abstract Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate
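
    scikit-learn provides a direct implementation of the permutation test for classifier accuracy described above; a minimal sketch on placeholder expression data (the linear SVM and the number of permutations are arbitrary choices, not the authors' setup):

    ```python
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, permutation_test_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(47, 2000))          # placeholder: 47 specimens x 2000 genes
    y = np.array([0] * 22 + [1] * 25)        # 22 normal, 25 tumour labels

    score, perm_scores, p_value = permutation_test_score(
        SVC(kernel="linear"), X, y,
        cv=StratifiedKFold(n_splits=5),
        n_permutations=200,                  # label shuffles used to build the null
        scoring="accuracy")
    print(f"accuracy = {score:.2f}, permutation p-value = {p_value:.3f}")
    ```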

  13. Can scientific journals be classified based on their citation profiles?

    Directory of Open Access Journals (Sweden)

    Sayed-Amir Marashi

    2015-03-01

    Full Text Available Classification of scientific publications is of great importance in biomedical research evaluation. However, accurate classification of research publications is challenging and normally is performed in a rather subjective way. In the present paper, we propose to classify biomedical publications into superfamilies, by analysing their citation profiles, i.e. the location of citations in the structure of citing articles. Such a classification may help authors to find the appropriate biomedical journal for publication, may make journal comparisons more rational, and may even help planners to better track the consequences of their policies on biomedical research.

  14. Classifying the future of universes with dark energy

    International Nuclear Information System (INIS)

    Chiba, Takeshi; Takahashi, Ryuichi; Sugiyama, Naoshi

    2005-01-01

    We classify the future of the universe for general cosmological models including matter and dark energy. If the equation of state of dark energy is less than -1, the age of the universe becomes finite. We compute the remaining age of the universe for such universe models. The behaviour of the future growth of matter density perturbation is also studied. We find that the collapse of the spherical overdensity region is greatly changed if the equation of state of dark energy is less than -1.
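
    For a constant equation of state w < -1 in a flat universe, a standard back-of-the-envelope estimate from the dark-energy literature (not taken from this paper) for the time remaining before such a "Big Rip" is:

    ```latex
    % Approximate remaining lifetime for constant w < -1, flat universe with
    % matter fraction \Omega_m and Hubble constant H_0 (literature estimate):
    \[
      t_{\mathrm{rip}} - t_{0} \;\approx\; \frac{2}{3}\,
      \frac{1}{\lvert 1+w \rvert \, H_{0}\, \sqrt{1-\Omega_{m}}}
    \]
    ```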

  15. DFRFT: A Classified Review of Recent Methods with Its Application

    Directory of Open Access Journals (Sweden)

    Ashutosh Kumar Singh

    2013-01-01

    Full Text Available In the literature, various algorithms are available for computing the discrete fractional Fourier transform (DFRFT). In this paper, all the existing methods are reviewed, classified into four categories, and then compared to find the best alternative from the viewpoint of minimal computational error, computational complexity, transform features, and additional features such as security. Subsequently, the correlation theorem of the FRFT is used to significantly reduce the Doppler shift caused by receiver motion in a DSB-SC AM signal. Finally, the role of the DFRFT in steganography is investigated.

  16. Application of a naive Bayesians classifiers in assessing the supplier

    Directory of Open Access Journals (Sweden)

    Mijailović Snežana

    2017-01-01

    Full Text Available The paper considers the class of interactive knowledge-based systems whose main purpose is to make proposals and assist customers in making decisions. The mathematical model provides a set of learning examples describing series of deliveries from three suppliers, as well as an analysis of an illustrative example of assessing a supplier using a naive Bayesian classifier. The model was developed from an analysis of subjective probabilities, which are later revised with new empirical information using Bayes' theorem for the posterior probability, i.e. combining subjective and objective conditional probabilities in the choice of a reliable supplier.
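
    A minimal sketch of the underlying posterior update (prior beliefs about a supplier's class revised by Bayes' theorem with conditionally independent evidence); the classes, priors, and likelihoods are invented for illustration and are not the paper's numbers.

    ```python
    import math

    # Hypothetical prior beliefs about a supplier's class and per-feature likelihoods,
    # revisable as new deliveries are observed (illustrative numbers only).
    priors = {"reliable": 0.6, "unreliable": 0.4}
    likelihoods = {
        "reliable":   {"on_time": 0.9, "late": 0.1, "defect_free": 0.85, "defective": 0.15},
        "unreliable": {"on_time": 0.5, "late": 0.5, "defect_free": 0.60, "defective": 0.40},
    }

    def classify_supplier(observed_features):
        """Naive-Bayes posterior over supplier classes, features assumed independent."""
        unnorm = {c: priors[c] * math.prod(likelihoods[c][f] for f in observed_features)
                  for c in priors}
        total = sum(unnorm.values())
        return {c: round(v / total, 3) for c, v in unnorm.items()}

    # A delivery series that arrived on time but contained defects:
    print(classify_supplier(["on_time", "defective"]))
    ```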

  17. Nonlinear Knowledge in Kernel-Based Multiple Criteria Programming Classifier

    Science.gov (United States)

    Zhang, Dongling; Tian, Yingjie; Shi, Yong

    The Kernel-based Multiple Criteria Linear Programming (KMCLP) model is used as a classification method that can learn from training examples, whereas in traditional machine learning data sets are classified only by prior knowledge. Some works combine these two classification principles to overcome the shortcomings of each approach. In this paper, we propose a model that incorporates nonlinear knowledge into KMCLP in order to handle inputs consisting not only of training examples but also of nonlinear prior knowledge. On a real-world breast cancer diagnosis case, the model performs better than a model based solely on training data.

  18. On-line computing in a classified environment

    International Nuclear Information System (INIS)

    O'Callaghan, P.B.

    1982-01-01

    Westinghouse Hanford Company (WHC) recently developed a Department of Energy (DOE) approved real-time, on-line computer system to control nuclear material. The system simultaneously processes both classified and unclassified information. Implementation of this system required application of many security techniques. The system has a secure but user-friendly interface. Many software applications protect the integrity of the data base from malevolent or accidental errors. Programming practices ensure the integrity of the computer system software. The audit trail and the report-generation capability record user actions and the status of the nuclear material inventory.

  19. A Handbook for Derivative Classifiers at Los Alamos National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Sinkula, Barbara Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-02-23

    The Los Alamos Classification Office (within the SAFE-IP group) prepared this handbook as a resource for the Laboratory’s derivative classifiers (DCs). It contains information about United States Government (USG) classification policy, principles, and authorities as they relate to the LANL Classification Program in general, and to the LANL DC program specifically. At a working level, DCs review Laboratory documents and material that are subject to classification review requirements, while the Classification Office provides the training and resources for DCs to perform that vital function.

  20. Classifying BCI signals from novice users with extreme learning machine

    Directory of Open Access Journals (Sweden)

    Rodríguez-Bermúdez Germán

    2017-07-01

    Full Text Available A brain computer interface (BCI) allows external devices to be controlled using only the electrical activity of the brain. In order to improve such systems, several approaches have been proposed; however, algorithms are usually tested on standard BCI signals from expert users or from repositories available on the Internet. In this work, an extreme learning machine (ELM) has been tested with signals from 5 novice users and compared with standard classification algorithms. Experimental results show that ELM is a suitable method for classifying electroencephalogram signals from novice users.
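
    A minimal NumPy sketch of the extreme learning machine idea (a random, fixed hidden layer with output weights obtained in closed form via the pseudoinverse); the hidden-layer size and the synthetic feature data are placeholders, not the study's EEG set.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder data: 200 EEG feature vectors (e.g. band powers), two classes.
    X = rng.normal(size=(200, 16))
    y = np.array([0] * 100 + [1] * 100)
    X[y == 1] += 0.5
    T = np.eye(2)[y]                                  # one-hot targets

    n_hidden = 100
    W = rng.normal(size=(X.shape[1], n_hidden))       # random input weights (fixed)
    b = rng.normal(size=n_hidden)                     # random biases (fixed)

    H = np.tanh(X @ W + b)                            # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                      # output weights: one pseudoinverse solve

    predictions = np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
    print("training accuracy:", (predictions == y).mean())
    ```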