WorldWideScience

Sample records for progression-related gene classifier

  1. A Gene Expression Classifier of Node-Positive Colorectal Cancer

    Directory of Open Access Journals (Sweden)

    Paul F. Meeh

    2009-10-01

    We used digital long serial analysis of gene expression to discover gene expression differences between node-negative and node-positive colorectal tumors and developed a multigene classifier able to discriminate between these two tumor types. We prepared long serial analysis of gene expression libraries from one node-negative and one node-positive colorectal tumor, sequenced them to a depth of 26,060 unique tags, and identified 262 tags significantly differentially expressed between these two tumors (P < 2 x 10^-6). We confirmed the tag-to-gene assignments and differential expression of 31 genes by quantitative real-time polymerase chain reaction, 12 of which were elevated in the node-positive tumor. We analyzed the expression levels of these 12 upregulated genes in a validation panel of 23 additional tumors and developed an optimized seven-gene logistic regression classifier. The classifier discriminated between node-negative and node-positive tumors with 86% sensitivity and 80% specificity. Receiver operating characteristic analysis of the classifier revealed an area under the curve of 0.86. Experimental manipulation of the function of one classification gene, Fibronectin, caused profound effects on invasion and migration of colorectal cancer cells in vitro. These results suggest that the development of node-positive colorectal cancer occurs in part through elevated epithelial FN1 expression and suggest novel strategies for the diagnosis and treatment of advanced disease.
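    The sensitivity, specificity and area under the curve quoted above can be computed from classifier scores as in the following minimal sketch. The labels and scores are made-up illustration values, not data from the study, and the function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """labels: 1 = node-positive, 0 = node-negative; scores: classifier outputs."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three node-positive and three node-negative tumors.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
```

    The pairwise definition of AUC is used here because it needs no plotting or numerical integration; it is quadratic in the number of samples, which is acceptable for small validation panels.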

  2. Localizing genes to cerebellar layers by classifying ISH images.

    Directory of Open Access Journals (Sweden)

    Lior Kirsch

    Gene expression controls how the brain develops and functions. Understanding control processes in the brain is particularly hard since they involve numerous types of neurons and glia, and very little is known about which genes are expressed in which cells and brain layers. Here we describe an approach to detect genes whose expression is primarily localized to a specific brain layer and apply it to the mouse cerebellum. We learn typical spatial patterns of expression from a few markers that are known to be localized to specific layers, and use these patterns to predict localization for new genes. We analyze images of in-situ hybridization (ISH) experiments, which we represent using histograms of local binary patterns (LBP), and train image classifiers and gene classifiers for four layers of the cerebellum: the Purkinje, granular, molecular and white matter layer. On held-out data, the layer classifiers achieve accuracy above 94% (AUC) by representing each image at multiple scales and by combining multiple image scores into a single gene-level decision. When applied to the full mouse genome, the classifiers predict specific layer localization for hundreds of new genes in the Purkinje and granular layers. Many genes localized to the Purkinje layer are likely to be expressed in astrocytes, and many others are involved in lipid metabolism, possibly due to the unusual size of Purkinje cells.
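    A histogram of local binary patterns can be sketched as follows. This is the basic single-scale, 8-neighbour LBP operator written in plain Python; the abstract does not specify the exact variant, so the neighbour ordering, the `>=` comparison and the 256-bin layout are assumptions.

```python
def lbp_histogram(image):
    """Return a normalized 256-bin histogram of 8-neighbour LBP codes.

    image: list of rows of grayscale intensities. Border pixels are skipped
    because they lack a full 3x3 neighbourhood.
    """
    # Clockwise neighbour offsets starting at the top-left pixel (an assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # every neighbour >= center -> code 255
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # every neighbour < center  -> code 0
flat_hist = lbp_histogram(flat)
peak_hist = lbp_histogram(peak)
```

    A multi-scale representation, as the abstract describes, could concatenate such histograms computed on downsampled copies of the image; that step is not shown here.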

  3. Gene-expression Classifier in Papillary Thyroid Carcinoma

    DEFF Research Database (Denmark)

    Londero, Stefano Christian; Jespersen, Marie Louise; Krogdahl, Annelise

    2016-01-01

    BACKGROUND: No reliable biomarker for metastatic potential in the risk stratification of papillary thyroid carcinoma exists. We aimed to develop a gene-expression classifier for metastatic potential. MATERIALS AND METHODS: Genome-wide expression analyses were used. Development cohort: freshly...

  4. Classifying genes to the correct Gene Ontology Slim term in Saccharomyces cerevisiae using neighbouring genes with classification learning

    Directory of Open Access Journals (Sweden)

    Tsatsoulis Costas

    2010-05-01

    Background: There is increasing evidence that gene location and surrounding genes influence the functionality of genes in the eukaryotic genome. Knowing the Gene Ontology Slim terms associated with a gene gives us insight into a gene's functionality by informing us how its gene product behaves in a cellular context using three different ontologies: molecular function, biological process, and cellular component. In this study, we analyzed whether we could classify a gene in Saccharomyces cerevisiae to its correct Gene Ontology Slim term using information about its location in the genome and information from its nearest-neighbouring genes using classification learning. Results: We performed experiments to establish that the MultiBoostAB algorithm using the J48 classifier could correctly classify Gene Ontology Slim terms of a gene given information regarding the gene's location and information from its nearest-neighbouring genes for training. Different neighbourhood sizes were examined to determine how many nearest neighbours should be included around each gene to provide better classification rules. Our results show that by just incorporating neighbour information from each gene's two-nearest neighbours, the percentage of correctly classified genes to their correct Gene Ontology Slim term for each ontology reaches over 80% with high accuracy (reflected in F-measures over 0.80) of the classification rules produced. Conclusions: We confirmed that in classifying genes to their correct Gene Ontology Slim term, the inclusion of neighbour information from those genes is beneficial. Knowing the location of a gene and the Gene Ontology Slim information from neighbouring genes gives us insight into that gene's functionality. This benefit is seen by just including information from a gene's two-nearest neighbouring genes.

  5. Tumor Microenvironment Gene Signature as a Prognostic Classifier and Therapeutic Target

    Science.gov (United States)

    2016-06-01

    AWARD NUMBER: W81XWH-14-1-0107. TITLE: Tumor Microenvironment Gene Signature as a Prognostic Classifier and Therapeutic Target. ...gene signature that correlates with poor survival in ovarian cancer patients. We are refining this gene signature to develop biomarkers for the

  6. Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

    Science.gov (United States)

    Ashkani, Jahanshah; Naidoo, Kevin J.

    2016-05-01

    Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.

  7. Identification and optimization of classifier genes from multi-class earthworm microarray dataset.

    Directory of Open Access Journals (Sweden)

    Ying Li

    Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.

  8. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    Science.gov (United States)

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of PSVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several

  9. Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

    Directory of Open Access Journals (Sweden)

    Daniel Frampton

    Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide, causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric out-performed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons, with limited success. Although the metric classified roughly two-thirds of both CFS and healthy samples correctly, the level of misclassification was high. We conclude that many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.

  10. A novel algorithm for simplification of complex gene classifiers in cancer

    Science.gov (United States)

    Wilson, Raphael A.; Teng, Ling; Bachmeyer, Karen M.; Bissonnette, Mei Lin Z.; Husain, Aliya N.; Parham, David M.; Triche, Timothy J.; Wing, Michele R.; Gastier-Foster, Julie M.; Barr, Frederic G.; Hawkins, Douglas S.; Anderson, James R.; Skapek, Stephen X.; Volchenboum, Samuel L.

    2013-01-01

    The clinical application of complex molecular classifiers as diagnostic or prognostic tools has been limited by the time and cost needed to apply them to patients. Using an existing fifty-gene expression signature known to separate two molecular subtypes of the pediatric cancer rhabdomyosarcoma, we show that an exhaustive iterative search algorithm can distill this complex classifier down to two or three features with equal discrimination. We validated the two-gene signatures using three separate and distinct data sets, including one that uses degraded RNA extracted from formalin-fixed, paraffin-embedded material. Finally, to demonstrate the generalizability of our algorithm, we applied it to a lung cancer data set to find minimal gene signatures that can distinguish survival. Our approach can easily be generalized and coupled to existing technical platforms to facilitate the discovery of simplified signatures that are ready for routine clinical use. PMID:23913937
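    An exhaustive search over small gene subsets can be sketched as below. The abstract does not describe the scoring criterion, so this sketch assumes a nearest-centroid rule scored by training accuracy; the gene names and expression values are hypothetical toy data.

```python
from itertools import combinations


def centroid_accuracy(features, labels):
    """Training accuracy of a nearest-centroid rule on the given feature vectors."""
    classes = sorted(set(labels))
    centroids = {}
    for cls in classes:
        members = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    correct = 0
    for f, y in zip(features, labels):
        predicted = min(
            classes,
            key=lambda cls: sum((a - b) ** 2 for a, b in zip(f, centroids[cls])),
        )
        correct += predicted == y
    return correct / len(labels)


def best_gene_subset(expression, labels, size=2):
    """Try every gene subset of the given size and keep the first most accurate one."""
    best_genes, best_score = None, -1.0
    for genes in combinations(sorted(expression), size):
        # One feature vector per sample, restricted to the candidate genes.
        features = list(zip(*(expression[g] for g in genes)))
        score = centroid_accuracy(features, labels)
        if score > best_score:
            best_genes, best_score = genes, score
    return best_genes, best_score


# Hypothetical data: g1 separates the two subtypes, g2 and g3 do not.
expression = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],
    "g3": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
labels = [0, 0, 0, 1, 1, 1]
best_genes, best_score = best_gene_subset(expression, labels, size=2)
```

    Training accuracy is optimistic; a real search would score each subset on held-out or cross-validated samples. The number of subsets grows combinatorially, so this approach is only practical for small subset sizes such as the two- or three-gene signatures mentioned above.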

  11. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    Science.gov (United States)

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since in the definition of a rejection model tuning of the involved parameters is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional
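    A rejection option of the kind described above can be illustrated with a simple decision rule on class probabilities. This sketch is not the set of rules from the paper, which targets R classifiers; the two thresholds (`min_confidence`, `min_margin`) and the function name are assumptions chosen for illustration.

```python
def classify_with_rejection(probabilities, min_confidence=0.6, min_margin=0.2):
    """Return the most probable class, or None when the sample should be rejected.

    probabilities: dict mapping class label -> predicted probability
    (at least two classes). A sample is rejected when the top class is not
    confident enough or is too close to the runner-up.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, best_p), (_, second_p) = ranked[0], ranked[1]
    if best_p < min_confidence or best_p - second_p < min_margin:
        return None  # does not fit any known class well enough
    return best_label


confident = classify_with_rejection({"A": 0.80, "B": 0.15, "C": 0.05})
ambiguous = classify_with_rejection({"A": 0.45, "B": 0.40, "C": 0.15})
```

    In practice the two thresholds would be tuned, for example by the kind of automated search the abstract mentions, rather than fixed by hand.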

  12. A new approach to enhance the performance of decision tree for classifying gene expression data.

    Science.gov (United States)

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm in which each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.

  13. GeneBins: a database for classifying gene expression data, with application to plant genome arrays

    Directory of Open Access Journals (Sweden)

    Weiller Georg

    2007-03-01

    Background: To interpret microarray experiments, several ontological analysis tools have been developed. However, current tools are limited to specific organisms. Results: We developed a bioinformatics system to assign the probe set sequences of any organism to a hierarchical functional classification modelled on KEGG ontology. The GeneBins database currently supports the functional classification of expression data from four Affymetrix arrays: Arabidopsis thaliana, Oryza sativa, Glycine max and Medicago truncatula. An online analysis tool to identify relevant functions is also provided. Conclusion: GeneBins provides resources to interpret gene expression results from microarray experiments. It is available at http://bioinfoserver.rsbs.anu.edu.au/utils/GeneBins/

  14. Discovering time-lagged rules from microarray data using gene profile classifiers

    Directory of Open Access Journals (Sweden)

    Ponzoni Ignacio

    2011-04-01

    Background: Gene regulatory networks have an essential role in every process of life. In this regard, genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method, which inferred highly-related statistically-significant gene associations. Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.

  15. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    Science.gov (United States)

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637

  16. A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

    Directory of Open Access Journals (Sweden)

    Meysam Bastani

    BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.

  17. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The high-dimension, low-sample-size nature of gene expression data makes the classification task challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost but also improves classification performance. Among the different feature selection approaches, however, most suffer from several problems, such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology, we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent higher positive and lower negative threshold conditions reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between two classes of leukemia.

  18. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans.

    Directory of Open Access Journals (Sweden)

    Sun Hee Ahn

    Staphylococcus aureus causes a spectrum of human infection. Diagnostic delays and uncertainty lead to treatment delays and inappropriate antibiotic use. A growing literature suggests the host's inflammatory response to the pathogen represents a potential tool to improve upon current diagnostics. The hypothesis of this study is that the host responds differently to S. aureus than to E. coli infection in a quantifiable way, providing a new diagnostic avenue. This study uses Bayesian sparse factor modeling and penalized binary regression to define peripheral blood gene-expression classifiers of murine and human S. aureus infection. The murine-derived classifier distinguished S. aureus infection from healthy controls and Escherichia coli-infected mice across a range of conditions (mouse and bacterial strain, time post infection) and was validated in outbred mice (AUC>0.97). A S. aureus classifier derived from a cohort of 94 human subjects distinguished S. aureus blood stream infection (BSI) from healthy subjects (AUC 0.99) and E. coli BSI (AUC 0.84). Murine and human responses to S. aureus infection share common biological pathways, allowing the murine model to classify S. aureus BSI in humans (AUC 0.84). Both murine and human S. aureus classifiers were validated in an independent human cohort (AUC 0.95 and 0.92, respectively). The approach described here lends insight into the conserved and disparate pathways utilized by mice and humans in response to these infections. Furthermore, this study advances our understanding of S. aureus infection and the host response to it, and identifies new diagnostic and therapeutic avenues.

  19. Regularization strategies for hyperplane classifiers: application to cancer classification with gene expression data.

    Science.gov (United States)

    Andries, Erik; Hagstrom, Thomas; Atlas, Susan R; Willman, Cheryl

    2007-02-01

    Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
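    For orientation, a filter factor representation in its standard textbook form is shown below. It assumes Tikhonov regularization with parameter \lambda as one common choice; the symbols X (samples-by-genes expression matrix), y (class labels) and w (hyperplane weights) are introduced here for illustration and are not taken from the paper.

```latex
X = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
w_{\lambda} = \sum_{i=1}^{r} f_i \, \frac{u_i^{\top} y}{\sigma_i} \, v_i,
\qquad
f_i = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}}
```

    Components with \sigma_i much larger than \lambda pass almost unchanged (f_i close to 1), while components with small singular values, which amplify noise through the 1/\sigma_i term, are damped (f_i close to 0). Truncated SVD corresponds to the special case f_i = 1 for the k largest singular values and f_i = 0 otherwise.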

  20. A lung cancer risk classifier comprising genome maintenance genes measured in normal bronchial epithelial cells.

    Science.gov (United States)

    Yeo, Jiyoun; Crawford, Erin L; Zhang, Xiaolu; Khuder, Sadik; Chen, Tian; Levin, Albert; Blomquist, Thomas M; Willey, James C

    2017-05-02

    Annual low dose CT (LDCT) screening of individuals at high demographic risk reduces lung cancer mortality by more than 20%. However, subjects selected for screening based on demographic criteria typically have less than a 10% lifetime risk for lung cancer. Thus, there is need for a biomarker that better stratifies subjects for LDCT screening. Toward this goal, we previously reported a lung cancer risk test (LCRT) biomarker comprising 14 genome-maintenance (GM) pathway genes measured in normal bronchial epithelial cells (NBEC) that accurately classified cancer (CA) from non-cancer (NC) subjects. The primary goal of the studies reported here was to optimize the LCRT biomarker for high specificity and ease of clinical implementation. Targeted competitive multiplex PCR amplicon libraries were prepared for next generation sequencing (NGS) analysis of transcript abundance at 68 sites among 33 GM target genes in NBEC specimens collected from a retrospective cohort of 120 subjects, including 61 CA cases and 59 NC controls. Genes were selected for analysis based on contribution to the previously reported LCRT biomarker and/or prior evidence for association with lung cancer risk. Linear discriminant analysis was used to identify the most accurate classifier suitable to stratify subjects for screening. After cross-validation, a model comprising expression values from 12 genes (CDKN1A, E2F1, ERCC1, ERCC4, ERCC5, GPX1, GSTP1, KEAP1, RB1, TP53, TP63, and XRCC1) and demographic factors age, gender, and pack-years smoking, had Receiver Operator Characteristic area under the curve (ROC AUC) of 0.975 (95% CI: 0.96-0.99). The overall classification accuracy was 93% (95% CI 88%-98%) with sensitivity 93.1%, specificity 92.9%, positive predictive value 93.1% and negative predictive value 93%. The ROC AUC for this classifier was significantly better (p < 0.0001) than the best model comprising demographic features alone. The LCRT biomarker reported here displayed high accuracy and ease

  1. Validation of the 18-gene classifier as a prognostic biomarker of distant metastasis in breast cancer.

    Directory of Open Access Journals (Sweden)

    Skye Hung-Chun Cheng

    Full Text Available We validated an 18-gene classifier (GC) initially developed to predict local/regional recurrence after mastectomy in estimating distant metastasis risk. The 18-gene scoring algorithm defines scores as: <21, low risk; ≥21, high risk. Six hundred eighty-three patients with primary operable breast cancer and fresh frozen tumor tissues available were included. The primary outcome was the 5-year probability of freedom from distant metastasis (DMFP). Two external datasets were used to test the predictive accuracy of 18-GC. The 5-year rates of DMFP for patients classified as low-risk (n = 146, 21.7%) and high-risk (n = 537, 78.6%) were 96.2% (95% CI, 91.1%-98.8%) and 80.9% (74.6%-81.9%), respectively (median follow-up interval, 71.8 months). The 5-year rates of DMFP of the low-risk group in stage I (n = 62, 35.6%), stage II (n = 66, 20.1%), and stage III (n = 18, 10.3%) were 100%, 94.2% (78.5%-98.5%), and 90.9% (50.8%-98.7%), respectively. Multivariate analysis revealed that 18-GC is an independent prognostic factor of distant metastasis (adjusted hazard ratio, 5.1; 95% CI, 1.8-14.1; p = 0.0017) for scores of ≥21. External validation showed that the 5-year rate of DMFP in the low- and high-risk patients was 94.1% (82.9%-100%) and 80.3% (70.7%-89.9%, p = 0.06) in a Singapore dataset, and 89.5% (81.9%-94.1%) and 73.6% (67.2%-79.0%, p = 0.0039) in the GEO-GSE20685 dataset, respectively. In conclusion, 18-GC is a viable prognostic biomarker for breast cancer to estimate distant metastasis risk.

  2. Gene-expression patterns in peripheral blood classify familial breast cancer susceptibility.

    Science.gov (United States)

    Piccolo, Stephen R; Andrulis, Irene L; Cohen, Adam L; Conner, Thomas; Moos, Philip J; Spira, Avrum E; Buys, Saundra S; Johnson, W Evan; Bild, Andrea H

    2015-11-04

    Women with a family history of breast cancer face considerable uncertainty about whether to pursue standard screening, intensive screening, or prophylactic surgery. Accurate and individualized risk-estimation approaches may help these women make more informed decisions. Although highly penetrant genetic variants have been associated with familial breast cancer (FBC) risk, many individuals do not carry these variants, and many carriers never develop breast cancer. Common risk variants have a relatively modest effect on risk and show limited potential for predicting FBC development. As an alternative, we hypothesized that additional genomic data types, such as gene-expression levels, which can reflect genetic and epigenetic variation, could contribute to classifying a person's risk status. Specifically, we aimed to identify common patterns in gene-expression levels across individuals who develop FBC. We profiled peripheral blood mononuclear cells from women with a family history of breast cancer (with or without a germline BRCA1/2 variant) and from controls. We used the support vector machines algorithm to differentiate between patients who developed FBC and those who did not. Our study used two independent datasets, a training set of 124 women from Utah (USA) and an external validation (test) set from Ontario (Canada) of 73 women (197 total). We controlled for expression variation associated with clinical, demographic, and treatment variables as well as lymphocyte markers. Our multigene biomarker provided accurate, individual-level estimates of FBC occurrence for the Utah cohort (AUC = 0.76 [0.67-0.84]). Even at their lower confidence bounds, these accuracy estimates meet or exceed estimates from alternative approaches. Our Ontario cohort resulted in similarly high levels of accuracy (AUC = 0.73 [0.59-0.86]), thus providing external validation of our findings. Individuals deemed to have "high" risk by our model would have an estimated 2.4 times greater odds of

  3. SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells

    Directory of Open Access Journals (Sweden)

    Xu Huilei

    2010-12-01

    Full Text Available Abstract. Background: Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinct features are their ability to self-renew and to differentiate to all adult cell types. Genes that maintain mESC self-renewal and pluripotency identity are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and integrate results from many studies, yet they have not been applied to specifically tackle the task of predicting and classifying self-renewal and pluripotency gene membership. Results: For this study we developed a classifier, a supervised machine learning framework for predicting self-renewal and pluripotency mESCs stemness membership genes (MSMG) using support vector machines (SVM). The data used to train the classifier were derived from mESCs-related studies using mRNA microarrays, measuring gene expression in various stages of early differentiation, as well as ChIP-seq studies applied to mESCs profiling genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Comparison to other classification methods using the leave-one-out cross-validation method was employed to evaluate the accuracy and generality of the classification. Finally, two sets of candidate genes from genome-wide RNA interference screens are used to test the generality and potential application of the classifier. Conclusions: Our results reveal that an SVM approach can be useful for prioritizing genes for functional validation experiments and complement the analyses of high

  4. Classifying chemical mode of action using gene networks and machine learning: a case study with the herbicide linuron.

    Science.gov (United States)

    Ornostay, Anna; Cowie, Andrew M; Hindle, Matthew; Baker, Christopher J O; Martyniuk, Christopher J

    2013-12-01

    The herbicide linuron (LIN) is an endocrine disruptor with an anti-androgenic mode of action. The objectives of this study were to (1) improve knowledge of androgen and anti-androgen signaling in the teleostean ovary and to (2) assess the ability of gene networks and machine learning to classify LIN as an anti-androgen using transcriptomic data. Ovarian explants from vitellogenic fathead minnows (FHMs) were exposed to three concentrations of either 5α-dihydrotestosterone (DHT), flutamide (FLUT), or LIN for 12h. Ovaries exposed to DHT showed a significant increase in 17β-estradiol (E2) production while FLUT and LIN had no effect on E2. To improve understanding of androgen receptor signaling in the ovary, a reciprocal gene expression network was constructed for DHT and FLUT using pathway analysis and these data suggested that steroid metabolism, translation, and DNA replication are processes regulated through AR signaling in the ovary. Sub-network enrichment analysis revealed that FLUT and LIN shared more regulated gene networks in common compared to DHT. Using transcriptomic datasets from different fish species, machine learning algorithms classified LIN successfully with other anti-androgens. This study advances knowledge regarding molecular signaling cascades in the ovary that are responsive to androgens and anti-androgens and provides proof of concept that gene network analysis and machine learning can classify priority chemicals using experimental transcriptomic data collected from different fish species. © 2013.

  5. Heterogeneity wavelet kinetics from DCE-MRI for classifying gene expression based breast cancer recurrence risk.

    Science.gov (United States)

    Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; Mies, Carolyn; Feldman, Michael; Rosen, Mark; Kontos, Despina

    2013-01-01

    Breast tumors are heterogeneous lesions. Intra-tumor heterogeneity presents a major challenge for cancer diagnosis and treatment. Few studies have worked on capturing tumor heterogeneity from imaging. Most studies to date consider aggregate measures for tumor characterization. In this work we capture tumor heterogeneity by partitioning tumor pixels into subregions and extracting heterogeneity wavelet kinetic (HetWave) features from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to obtain the spatiotemporal patterns of the wavelet coefficients and contrast agent uptake from each partition. Using a genetic algorithm for feature selection, and a logistic regression classifier with leave-one-out cross-validation, we tested our proposed HetWave features for the task of classifying breast cancer recurrence risk. The classifier based on our features gave an ROC AUC of 0.78, outperforming previously proposed kinetic, texture, and spatial enhancement variance features, which give AUCs of 0.69, 0.64, and 0.65, respectively.

  6. ConSpeciFix: Classifying prokaryotic species based on gene flow.

    Science.gov (United States)

    Bobay, Louis-Marie; Ellis, Brian Shin-Hua; Ochman, Howard

    2018-05-16

    Classification of prokaryotic species is usually based on sequence similarity thresholds, which are easy to apply but lack a biologically-relevant foundation. Here, we present ConSpeciFix, a program that classifies prokaryotes into species using criteria set forth by the Biological Species Concept, thereby unifying species definition in all domains of life. ConSpeciFix's webserver is freely available at www.conspecifix.com. The local version of the program can be freely downloaded from https://github.com/Bobay-Ochman/ConSpeciFix. ConSpeciFix is written in Python 2.7 and requires the following dependencies: Usearch, MCL, MAFFT and RAxML. ljbobay@uncg.edu.

  7. Construction of a novel multi-gene assay (42-gene classifier) for prediction of late recurrence in ER-positive breast cancer patients.

    Science.gov (United States)

    Tsunashima, Ryo; Naoi, Yasuto; Shimazu, Kenzo; Kagara, Naofumi; Shimoda, Masashi; Tanei, Tomonori; Miyake, Tomohiro; Kim, Seung Jin; Noguchi, Shinzaburo

    2018-05-04

    Prediction models for late (> 5 years) recurrence in ER-positive breast cancer need to be developed for the accurate selection of patients for extended hormonal therapy. We attempted to develop such a prediction model focusing on the differences in gene expression between breast cancers with early and late recurrence. For the training set, 779 ER-positive breast cancers treated with tamoxifen alone for 5 years were selected from the databases (GSE6532, GSE12093, GSE17705, and GSE26971). For the validation set, 221 ER-positive breast cancers treated with adjuvant hormonal therapy for 5 years with or without chemotherapy at our hospital were included. Gene expression was assayed by DNA microarray analysis (Affymetrix U133 plus 2.0). With the 42 genes differentially expressed in early and late recurrence breast cancers in the training set, a prediction model (42GC) for late recurrence was constructed. The patients classified by 42GC into the late recurrence-like group showed a significantly (P = 0.006) higher late recurrence rate as expected but a significantly (P = 1.62 × 10⁻¹³) lower rate for early recurrence than the non-late recurrence-like group. These observations were confirmed for the validation set, i.e., P = 0.020 for late recurrence and P = 5.70 × 10⁻⁵ for early recurrence. We developed a unique prediction model (42GC) for late recurrence by focusing on the biological differences between breast cancers with early and late recurrence. Interestingly, patients in the late recurrence-like group by 42GC were at low risk for early recurrence.

  8. Classifying Microorganisms

    DEFF Research Database (Denmark)

    Sommerlund, Julie

    2006-01-01

    This paper describes the coexistence of two systems for classifying organisms and species: a dominant genetic system and an older naturalist system. The former classifies species and traces their evolution on the basis of genetic characteristics, while the latter employs physiological characteristics. The coexistence of the classification systems does not lead to a conflict between them. Rather, the systems seem to co-exist in different configurations, through which they are complementary, contradictory and inclusive in different situations, sometimes simultaneously. The systems come...

  9. Integrative Analysis of DCE-MRI and Gene Expression Profiles in Construction of a Gene Classifier for Assessment of Hypoxia-Related Risk of Chemoradiotherapy Failure in Cervical Cancer

    DEFF Research Database (Denmark)

    Fjeldbo, Christina S; Julin, Cathinka H; Lando, Malin

    2016-01-01

    PURPOSE: A 31-gene expression signature reflected in dynamic contrast enhanced (DCE)-MR images and correlated with hypoxia-related aggressiveness in cervical cancer was identified in previous work. We here aimed to construct a dichotomous classifier with key signature genes and a predefined... as an indicator of hypoxia. RESULTS: Classifier candidates were constructed by integrative analysis of ABrix and gene expression profiles in the training cohort and evaluated by a leave-one-out cross-validation approach. On the basis of their ability to separate patients correctly according to hypoxia status, a 6... platforms. The prognostic value was independent of existing clinical markers, regardless of clinical endpoints. CONCLUSIONS: A robust DCE-MRI-associated gene classifier has been constructed that may be used to achieve an early indication of patients' risk of hypoxia-related chemoradiotherapy failure.

  10. Thyroid nodules with indeterminate cytology: molecular imaging with 99mTc-methoxyisobutylisonitrile (MIBI) is more cost-effective than the Afirma® gene expression classifier

    International Nuclear Information System (INIS)

    Heinzel, Alexander; Mueller, Dirk; Behrendt, Florian F.; Giovanella, Luca; Mottaghy, Felix M.; Verburg, Frederik A.

    2014-01-01

    To compare the cost-effectiveness of 99mTc-methoxyisobutylisonitrile (MIBI) thyroid scintigraphy and the Afirma® gene expression classifier for the assessment of cytologically indeterminate thyroid nodules. A decision tree model was used. Costs were calculated from the perspective of the German health insurance system. The robustness of the results was assessed with probabilistic sensitivity analyses using a Monte Carlo simulation. Life expectancy was 34.3 years (estimated costs per patient EUR1,459 - EUR2,224) for the MIBI scan and 34.1 years (estimated costs EUR3,560 - EUR4,071) for the molecular test. These results were confirmed by the Monte Carlo simulation. MIBI thyroid scintigraphy is more cost-effective than the gene expression classifier. (orig.)
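    The comparison reported above can be read as a dominance check: one strategy dominates another when it yields at least as much life expectancy at lower cost. The sketch below is only an illustration of that reading, not the study's decision-tree model. The `Strategy` class and the conservative rule (the highest cost estimate of one strategy against the lowest of the other) are assumptions introduced here. The figures are the ones quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical container for one diagnostic strategy (not from the study)."""
    name: str
    life_years: float   # expected life expectancy in years
    cost_low: float     # lower cost estimate per patient, EUR
    cost_high: float    # upper cost estimate per patient, EUR


def dominates(a: Strategy, b: Strategy) -> bool:
    """Return True if `a` is at least as effective as `b` and is cheaper
    even when a's highest cost estimate is compared with b's lowest."""
    return a.life_years >= b.life_years and a.cost_high < b.cost_low


# Figures quoted in the abstract above.
mibi = Strategy("MIBI scintigraphy", 34.3, 1459, 2224)
gec = Strategy("gene expression classifier", 34.1, 3560, 4071)

if __name__ == "__main__":
    print(f"{mibi.name} dominates {gec.name}: {dominates(mibi, gec)}")
```

    When a strategy dominates in this sense, no incremental cost-effectiveness ratio is needed, because there is no trade-off between cost and effect to price.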

  11. Thyroid nodules with indeterminate cytology: molecular imaging with 99mTc-methoxyisobutylisonitrile (MIBI) is more cost-effective than the Afirma® gene expression classifier

    Energy Technology Data Exchange (ETDEWEB)

    Heinzel, Alexander [RWTH Aachen University Hospital, Department of Nuclear Medicine, Aachen, Pauwelsstrasse 30 (Germany); Institute for Neuroscience and Medicine (INM-4), Research Centre, Juelich (Germany); Mueller, Dirk [University of Cologne, Institute for Health Economics and Clinical Epidemiology, Cologne (Germany); Behrendt, Florian F. [RWTH Aachen University Hospital, Department of Nuclear Medicine, Aachen, Pauwelsstrasse 30 (Germany); Giovanella, Luca [Institute of Southern Switzerland, Department of Nuclear Medicine Oncology, Belinzona (Switzerland); Mottaghy, Felix M.; Verburg, Frederik A. [RWTH Aachen University Hospital, Department of Nuclear Medicine, Aachen, Pauwelsstrasse 30 (Germany); Maastricht University Medical Center, Department of Nuclear Medicine, Maastricht (Netherlands)

    2014-08-15

    To compare the cost-effectiveness of 99mTc-methoxyisobutylisonitrile (MIBI) thyroid scintigraphy and the Afirma® gene expression classifier for the assessment of cytologically indeterminate thyroid nodules. A decision tree model was used. Costs were calculated from the perspective of the German health insurance system. The robustness of the results was assessed with probabilistic sensitivity analyses using a Monte Carlo simulation. Life expectancy was 34.3 years (estimated costs per patient EUR1,459 - EUR2,224) for the MIBI scan and 34.1 years (estimated costs EUR3,560 - EUR4,071) for the molecular test. These results were confirmed by the Monte Carlo simulation. MIBI thyroid scintigraphy is more cost-effective than the gene expression classifier. (orig.)

  12. The usability of a 15-gene hypoxia classifier as a universal hypoxia profile in various cancer cell types

    DEFF Research Database (Denmark)

    Sørensen, Brita Singers; Knudsen, Anders Bisgård; Wittrup, Catja Foged

    2015-01-01

    BACKGROUND AND PURPOSE: A 15-gene hypoxia profile has previously been demonstrated to have both prognostic and predictive impact for hypoxic modification in squamous cell carcinoma of the head and neck. This gene expression profile may also have a prognostic value in other histological cancer types... genes, with BNIP3 not being upregulated at hypoxic conditions in 3 out of 6 colon cancer cell lines, and ALDOA in OE21 and FAM162A and SLC2A1 in SW116 only showing limited hypoxia induction. Furthermore, in the esophagus cell lines, the normoxic and hypoxic expression levels of LOX and BNIP3 were below... the tissue type dependency of hypoxia induced genes included in a 15-gene hypoxic profile in carcinoma cell lines from prostate, colon, and esophagus cancer, and demonstrated that in vitro, with minor fluctuations, the genes in the hypoxic profile are hypoxia inducible, and the hypoxia profile may...

  13. Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

    International Nuclear Information System (INIS)

    Lauss, Martin; Frigyesi, Attila; Ryden, Tobias; Höglund, Mattias

    2010-01-01

    Genome-wide gene expression data is a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both identification and evaluation of such signatures. Some of the algorithms employed are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm based on ROC analysis and the use of metagenes that we believe will be a good complement to existing algorithms. The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and a feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using leave-one-out cross-validation (LOOCV) and balanced accuracies. We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms such as discriminant analysis and more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.html
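    The procedure described in this abstract (rank genes by AUC, average the top-ranked genes into a metagene, pick the gene count that separates the classes best, then evaluate by LOOCV and balanced accuracy) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the rocc package's implementation. The function names, the toy data, the midpoint-of-class-means threshold, and the restriction to genes that are higher in class 1 are all choices made for this sketch.

```python
from statistics import mean


def auc(scores, labels):
    """ROC AUC via the pairwise (Mann-Whitney) identity; ties count as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_genes(X, y):
    """Gene indices sorted by AUC, highest first (assumes up-regulation in class 1)."""
    aucs = [auc([row[g] for row in X], y) for g in range(len(X[0]))]
    return sorted(range(len(aucs)), key=lambda g: aucs[g], reverse=True)


def metagene(row, genes):
    """Mean expression of the selected genes for one sample."""
    return mean([row[g] for g in genes])


def fit(X, y):
    """Choose the top-k gene set whose metagene has the highest training AUC.
    The threshold is the midpoint between class means (an assumption)."""
    order = rank_genes(X, y)
    best = None
    for k in range(1, len(order) + 1):
        genes = order[:k]
        m = [metagene(row, genes) for row in X]
        score = auc(m, y)
        if best is None or score > best[0]:
            pos_mean = mean([v for v, lab in zip(m, y) if lab == 1])
            neg_mean = mean([v for v, lab in zip(m, y) if lab == 0])
            best = (score, genes, (pos_mean + neg_mean) / 2)
    return best[1], best[2]


def predict(row, genes, threshold):
    return 1 if metagene(row, genes) > threshold else 0


def loocv_balanced_accuracy(X, y):
    """Refit on all samples but one, predict the held-out sample, and
    average sensitivity and specificity over all held-out predictions."""
    preds = []
    for i in range(len(X)):
        genes, thr = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(X[i], genes, thr))
    n_pos = sum(1 for t in y if t == 1)
    n_neg = len(y) - n_pos
    sens = sum(1 for p, t in zip(preds, y) if t == 1 and p == 1) / n_pos
    spec = sum(1 for p, t in zip(preds, y) if t == 0 and p == 0) / n_neg
    return (sens + spec) / 2


# Toy expression matrix (rows: samples, columns: genes); values are made up.
X = [[5.0, 1.0, 2.0], [6.0, 2.0, 1.5], [5.5, 1.5, 2.5],
     [1.0, 1.2, 2.2], [1.5, 1.8, 1.8], [0.5, 1.4, 2.1]]
y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(fit(X, y), loocv_balanced_accuracy(X, y))
```

    Note that feature ranking is repeated inside every cross-validation fold. Ranking once on the full data and cross-validating only the threshold would leak information from the held-out sample.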

  14. Carbon classified?

    DEFF Research Database (Denmark)

    Lippert, Ingmar

    2012-01-01

    ... Using an actor-network theory (ANT) framework, the aim is to investigate the actors who bring together the elements needed to classify their carbon emission sources and unpack the heterogeneous relations drawn on. Based on an ethnographic study of corporate agents of ecological modernisation over... a period of 13 months, this paper provides an exploration of three cases of enacting classification. Drawing on ANT, we problematise the silencing of a range of possible modalities of consumption facts and point to the ontological ethics involved in such performances. In a context of global warming...

  15. A data mining approach for classifying DNA repair genes into ageing-related or non-ageing-related

    Directory of Open Access Journals (Sweden)

    Vasieva Olga

    2011-01-01

    Full Text Available Abstract. Background: The ageing of the worldwide population means there is a growing need for research on the biology of ageing. DNA damage is likely a key contributor to the ageing process and elucidating the role of different DNA repair systems in ageing is of great interest. In this paper we propose a data mining approach, based on classification methods (decision trees and Naive Bayes), for analysing data about human DNA repair genes. The goal is to build classification models that allow us to discriminate between ageing-related and non-ageing-related DNA repair genes, in order to better understand their different properties. Results: The main patterns discovered by the classification methods are as follows: (a) the number of protein-protein interactions was a predictor of DNA repair proteins being ageing-related; (b) the use of predictor attributes based on protein-protein interactions considerably increased predictive accuracy of attributes based on Gene Ontology (GO) annotations; (c) GO terms related to "response to stimulus" seem reasonably good predictors of ageing-relatedness for DNA repair genes; (d) interaction with the XRCC5 (Ku80) protein is a strong predictor of ageing-relatedness for DNA repair genes; and (e) DNA repair genes with a high expression in T lymphocytes are more likely to be ageing-related. Conclusions: The above patterns are broadly integrated in an analysis discussing relations between Ku, the non-homologous end joining DNA repair pathway, ageing and lymphocyte development. These patterns and their analysis support non-homologous end joining double strand break repair as central to the ageing-relatedness of DNA repair genes. Our work also showcases the use of protein interaction partners to improve accuracy in data mining methods and our approach could be applied to other ageing-related pathways.

  16. Development and confirmation of potential gene classifiers of human clear cell renal cell carcinoma using next-generation RNA sequencing.

    Science.gov (United States)

    Eikrem, Oystein S; Strauss, Philipp; Beisland, Christian; Scherer, Andreas; Landolt, Lea; Flatberg, Arnar; Leh, Sabine; Beisvag, Vidar; Skogstrand, Trude; Hjelle, Karin; Shresta, Anjana; Marti, Hans-Peter

    2016-12-01

    A previous study by this group demonstrated the feasibility of RNA sequencing (RNAseq) technology for capturing disease biology of clear cell renal cell carcinoma (ccRCC), and presented initial results for carbonic anhydrase-9 (CA9) and tumor necrosis factor-α-induced protein-6 (TNFAIP6) as possible biomarkers of ccRCC (discovery set) [Eikrem et al. PLoS One 2016;11:e0149743]. To confirm these results, the previous study is expanded, and RNAseq data from additional matched ccRCC and normal renal biopsies are analyzed (confirmation set). Two core biopsies from patients (n = 12) undergoing partial or full nephrectomy were obtained with a 16-gauge needle. RNA sequencing libraries were generated with the Illumina TruSeq® Access library preparation protocol. Comparative analysis was done using linear modeling (voom/Limma; R Bioconductor). The formalin-fixed and paraffin-embedded discovery and confirmation data yielded 8957 and 11,047 detected transcripts, respectively. The two data sets shared 1193 differentially expressed genes. The average expression and the log2-fold changes of differentially expressed transcripts in both data sets correlated, with R² = 0.95 and R² = 0.94, respectively. Among transcripts with the highest fold changes were CA9, neuronal pentraxin-2 and uromodulin. Epithelial-mesenchymal transition was highlighted by differential expression of, for example, transforming growth factor-β1 and delta-like ligand-4. The diagnostic accuracy of CA9 was 100% and 93.9% when using the discovery set as the training set and the confirmation data as the test set, and vice versa, respectively. These data further support TNFAIP6 as a novel biomarker of ccRCC. TNFAIP6 had combined accuracy of 98.5% in the two data sets. This study provides confirmatory data on the potential use of CA9 and TNFAIP6 as biomarkers of ccRCC. Thus, next-generation sequencing expands the clinical application of tissue analyses.

  17. Dynamic Response Genes in CD4+ T Cells Reveal a Network of Interactive Proteins that Classifies Disease Activity in Multiple Sclerosis

    Directory of Open Access Journals (Sweden)

    Sandra Hellberg

    2016-09-01

    Full Text Available Multiple sclerosis (MS) is a chronic inflammatory disease of the CNS and has a varying disease course as well as variable response to treatment. Biomarkers may therefore aid personalized treatment. We tested whether in vitro activation of MS patient-derived CD4+ T cells could reveal potential biomarkers. The dynamic gene expression response to activation was dysregulated in patient-derived CD4+ T cells. By integrating our findings with genome-wide association studies, we constructed a highly connected MS gene module, disclosing cell activation and chemotaxis as central components. Changes in several module genes were associated with differences in protein levels, which were measurable in cerebrospinal fluid and were used to classify patients from control individuals. In addition, these measurements could predict disease activity after 2 years and distinguish low and high responders to treatment in two additional, independent cohorts. While further validation is needed in larger cohorts prior to clinical implementation, we have uncovered a set of potentially promising biomarkers.

  18. Sparse representation of multi parametric DCE-MRI features using K-SVD for classifying gene expression based breast cancer recurrence risk

    Science.gov (United States)

    Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina

    2014-03-01

    We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide a sparse representation of the features and condense the information in a few coefficients but also reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave-one-out cross-validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and the maximum number of non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA-based feature selection for the same prognostic features. The ROC results show that the AUCs of the K-SVD-based (K=4, L=2), the ANOVA-based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71, and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information in a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.

  19. Association between Interleukin-1 Gene Single Nucleotide Polymorphisms and Ischemic Stroke Classified by TOAST Criteria in the Han Population of Northern China

    Directory of Open Access Journals (Sweden)

    Zheng Zhang

    2013-01-01

    Full Text Available Increasing evidence suggests that IL-1β (C-511T) and IL-1α (C-889T) gene polymorphisms are associated with susceptibility to cardiocerebral vascular disease. In this paper, we investigated the relationships between these polymorphisms and the risk of ischemic stroke (IS) classified by TOAST criteria in the north Chinese Han population. 440 cases of IS and 486 age- and gender-matched controls of Chinese Han population were enrolled. Association study showed that the TT genotype and T allele of IL-1α-889 C/T were significantly associated with IS of large artery atherosclerosis (LAA) (TT: OR = 2.01, 95% CI = 1.34–3.0, and P<0.001; T: OR = 1.44, 95% CI = 1.18–1.78, and P=0.001). However, there was no significant difference in the distribution of IL-1α-889 C/T genotypes and allele frequencies between the two subgroups (small-artery occlusion (SVD) and cardioembolism (CE)) of IS and control groups. Nor was any significant association found between the IL-1β-511 TT genotype and T allele (TT: OR = 0.79, 95% CI = 0.56–1.11, and P=0.175; T: OR = 0.83, 95% CI = 0.68–1.01, and P=0.066) and IS as well as subgroups of CE and SVD. Our results suggest that the IL-1α-889 C/T gene polymorphism might be associated with susceptibility to IS, especially to IS with LAA, in a north Chinese Han population.

  20. Assessment of the predictive accuracy of five in silico prediction tools, alone or in combination, and two metaservers to classify long QT syndrome gene mutations.

    Science.gov (United States)

    Leong, Ivone U S; Stuckey, Alexander; Lai, Daniel; Skinner, Jonathan R; Love, Donald R

    2015-05-13

    Long QT syndrome (LQTS) is an autosomal dominant condition predisposing to sudden death from malignant arrhythmia. Genetic testing identifies many missense single nucleotide variants of uncertain pathogenicity. Establishing genetic pathogenicity is an essential prerequisite to family cascade screening. Many laboratories use in silico prediction tools, either alone or in combination, or metaservers, in order to predict pathogenicity; however, their accuracy in the context of LQTS is unknown. We evaluated the accuracy of five in silico programs and two metaservers in the analysis of LQTS 1-3 gene variants. The in silico tools SIFT, PolyPhen-2, PROVEAN, SNPs&GO and SNAP, either alone or in all possible combinations, and the metaservers Meta-SNP and PredictSNP, were tested on 312 KCNQ1, KCNH2 and SCN5A gene variants that have previously been characterised by either in vitro or co-segregation studies as either "pathogenic" (283) or "benign" (29). The accuracy, sensitivity, specificity and Matthews Correlation Coefficient (MCC) were calculated to determine the best combination of in silico tools for each LQTS gene, and when all genes are combined. The best combination of in silico tools for KCNQ1 is PROVEAN, SNPs&GO and SIFT (accuracy 92.7%, sensitivity 93.1%, specificity 100% and MCC 0.70). The best combination of in silico tools for KCNH2 is SIFT and PROVEAN or PROVEAN, SNPs&GO and SIFT. Both combinations have the same scores for accuracy (91.1%), sensitivity (91.5%), specificity (87.5%) and MCC (0.62). In the case of SCN5A, SNAP and PROVEAN provided the best combination (accuracy 81.4%, sensitivity 86.9%, specificity 50.0%, and MCC 0.32). When all three LQT genes are combined, SIFT, PROVEAN and SNAP is the combination with the best performance (accuracy 82.7%, sensitivity 83.0%, specificity 80.0%, and MCC 0.44). Both metaservers performed better than the single in silico tools; however, they did not perform better than the best performing combination of in silico

  1. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.

    Science.gov (United States)

    Wang, Xiao; Zhang, Jun; Li, Guo-Zheng

    2015-01-01

    Predicting bacterial protein subcellular locations with computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, most can handle only single-location proteins. Many proteins in bacterial cells, however, reside in multiple locations, and such multi-location proteins have special biological functions that can aid the development of new drugs. New computational methods are therefore needed to predict the subcellular locations of multi-location bacterial proteins accurately. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively. The two predictors construct GO vectors from the GO terms of proteins homologous to the query protein and then apply a multi-label ensemble classifier to make the final multi-label prediction. They have the following advantages: (1) they improve prediction performance for multi-label proteins by taking the correlations among different labels into account; (2) they combine multiple CC classifiers and obtain better prediction results through ensemble learning; and (3) they construct the GO vectors from the frequency of occurrence of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins, respectively, and improve the prediction accuracy for multi-location proteins. The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible
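    The difference between frequency-based and 0/1 GO vectors mentioned in this record can be illustrated with a small sketch. The function names, the toy homolog set and the normalisation (fraction of homologs annotated with each term) are assumptions made for illustration; the abstract does not specify the predictors' exact encoding.

```python
from collections import Counter

def go_frequency_vector(homolog_go_terms, vocabulary):
    """Fraction of homologs annotated with each GO term (assumed normalisation)."""
    n = len(homolog_go_terms)
    # Count each term at most once per homolog
    counts = Counter(term for terms in homolog_go_terms for term in set(terms))
    return [counts[term] / n if n else 0.0 for term in vocabulary]

def go_binary_vector(homolog_go_terms, vocabulary):
    """0/1 encoding: 1 if any homolog carries the term, else 0."""
    seen = {term for terms in homolog_go_terms for term in terms}
    return [1 if term in seen else 0 for term in vocabulary]

# Hypothetical homolog annotations for one query protein
vocabulary = ["GO:0005737", "GO:0005886", "GO:0005576"]
homologs = [
    ["GO:0005737", "GO:0005886"],
    ["GO:0005737"],
    ["GO:0005576", "GO:0005737"],
    ["GO:0005886"],
]
freq_vec = go_frequency_vector(homologs, vocabulary)
bin_vec = go_binary_vector(homologs, vocabulary)
print(freq_vec, bin_vec)
```

    In this toy case the 0/1 encoding treats all three terms alike, whereas the frequency encoding keeps the information that one term is shared by most homologs and another by only one.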

  2. Classifying Returns as Extreme

    DEFF Research Database (Denmark)

    Christiansen, Charlotte

    2014-01-01

    I consider extreme returns for the stock and bond markets of 14 EU countries using two classification schemes: One, the univariate classification scheme from the previous literature that classifies extreme returns for each market separately, and two, a novel multivariate classification scheme tha...

  3. LCC: Light Curves Classifier

    Science.gov (United States)

    Vo, Martin

    2017-08-01

    Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This task can be accomplished by attributes of light curves or any time series, including shapes, histograms, or variograms, or by other available information about the inspected objects, such as color indices, temperatures, and abundances. After specifying features which describe the objects to be searched, the software trains on a given training sample, and can then be used for unsupervised clustering for visualizing the natural separation of the sample. The package can also be used to tune automatically the parameters of the methods used (for example, number of hidden neurons or binning ratio). Trained classifiers can be used for filtering outputs from astronomical databases or data stored locally. Light Curves Classifier can also be used for simple downloading of light curves and all available information of queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct usage of the package and command line UI, the program can be used through a web interface. Users can create jobs for "training" methods on given objects, querying databases and filtering outputs by trained filters. Preimplemented descriptors, classifiers and connectors can be picked by simple clicks and their parameters can be tuned by giving ranges of these values. All combinations are then calculated and the best one is used for creating the filter. Natural separation of the data can be visualized by unsupervised clustering.
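    The tuning procedure described in this record (give a range for each parameter, evaluate all combinations, keep the best) is an exhaustive grid search. The sketch below is a generic illustration and does not use the package's actual API. The function `grid_search`, the toy scoring function and the parameter values are all hypothetical.

```python
from itertools import product

def grid_search(param_ranges, score):
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

def toy_score(params):
    """Stand-in for a validation score; peaks at 8 neurons and ratio 0.5."""
    return -abs(params["hidden_neurons"] - 8) - abs(params["binning_ratio"] - 0.5)

# Hypothetical ranges for the two example parameters named in the abstract
ranges = {"hidden_neurons": [2, 4, 8, 16], "binning_ratio": [0.25, 0.5, 0.75]}
best, best_score = grid_search(ranges, toy_score)
print(best, best_score)
```

    The cost grows as the product of the range sizes (12 evaluations here), which is why ranges are usually kept coarse.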

  4. Intelligent Garbage Classifier

    Directory of Open Access Journals (Sweden)

    Ignacio Rodríguez Novelle

    2008-12-01

    Full Text Available IGC (Intelligent Garbage Classifier) is a system for visual classification and separation of solid waste products. Currently, an important part of the separation effort is based on manual work, from household separation to industrial waste management. Taking advantage of the technologies currently available, a system has been built that can analyze images from a camera and control a robot arm and conveyor belt to automatically separate different kinds of waste.

  5. Classifying Linear Canonical Relations

    OpenAIRE

    Lorand, Jonathan

    2015-01-01

    In this Master's thesis, we consider the problem of classifying, up to conjugation by linear symplectomorphisms, linear canonical relations (lagrangian correspondences) from a finite-dimensional symplectic vector space to itself. We give an elementary introduction to the theory of linear canonical relations and present partial results toward the classification problem. This exposition should be accessible to undergraduate students with a basic familiarity with linear algebra.

  6. Stack filter classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]

    2009-01-01

    Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification, their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification are yet to be seen. This paper is a step towards Stack Filter Classifiers and it shows that the approach is interesting from both a theoretical and a practical perspective.

  7. Fingerprint prediction using classifier ensembles

    CSIR Research Space (South Africa)

    Molale, P

    2011-11-01

    Full Text Available ); logistic discrimination (LgD), k-nearest neighbour (k-NN), artificial neural network (ANN), association rules (AR), decision tree (DT), naive Bayes classifier (NBC) and the support vector machine (SVM). The performance of several multiple classifier systems...

  8. Classified

    CERN Multimedia

    Computer Security Team

    2011-01-01

    In the last issue of the Bulletin, we discussed recent implications for privacy on the Internet. But privacy of personal data is just one facet of data protection. Confidentiality is another one. However, confidentiality and data protection are often perceived as not relevant in the academic environment of CERN. But think twice! At CERN, your personal data, e-mails, medical records, financial and contractual documents, MARS forms, group meeting minutes (and of course your password!) are all considered to be sensitive, restricted or even confidential. And this is not all. Physics results, in particular when being preliminary and pending scrutiny, are sensitive, too. Just recently, an ATLAS collaborator copy/pasted the abstract of an ATLAS note onto an external public blog, despite the fact that this document was clearly marked as an "Internal Note". Such an act was not only embarrassing to the ATLAS collaboration, and had negative impact on CERN’s reputation --- i...

  9. Classifying Sluice Occurrences in Dialogue

    DEFF Research Database (Denmark)

    Baird, Austin; Hamza, Anissa; Hardt, Daniel

    2018-01-01

    perform manual annotation with acceptable inter-coder agreement. We build classifier models with Decision Trees and Naive Bayes, with accuracy of 67%. We deploy a classifier to automatically classify sluice occurrences in OpenSubtitles, resulting in a corpus with 1.7 million occurrences. This will support....... Despite this, the corpus can be of great use in research on sluicing and development of systems, and we are making the corpus freely available on request. Furthermore, we are in the process of improving the accuracy of sluice identification and annotation for the purpose of creating a subsequent version...

  10. A systems biology-based classifier for hepatocellular carcinoma diagnosis.

    Directory of Open Access Journals (Sweden)

    Yanqiong Zhang

    Full Text Available AIM: The diagnosis of hepatocellular carcinoma (HCC) in the early stage is crucial to the application of curative treatments which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes in common and were poorly validated using independent data. Therefore, we developed a systems biology based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes that are common to different microarray datasets were chosen as the candidate markers. Then, their networks were analyzed by GeneGO Meta-Core software and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validations of diagnostic performance showed that this classifier had high predictive accuracy (85.88∼92.71%) and area under ROC curve (approximating 1.0), and that the network topological features integrated into this classifier contribute greatly to improving the predictive performance. Furthermore, it has been demonstrated that this modeling strategy is not only applicable to HCC, but also to other cancers. CONCLUSION: Our analysis suggests that the systems biology-based classifier that combines the differential gene expression and topological features of human protein interaction network may enhance the diagnostic performance of HCC classifier.

  11. Quantum ensembles of quantum classifiers.

    Science.gov (United States)

    Schuld, Maria; Petruccione, Francesco

    2018-02-09

    Quantum machine learning witnesses an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which, similar to Bayesian learning, the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.
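    The idea of an ensemble whose members are never trained but are each weighted by training-set performance can be illustrated classically. The sketch below is a toy classical analogue only, not the quantum routine from the paper. The choice of one-dimensional threshold classifiers, the data and all function names are assumptions made for illustration.

```python
def make_stumps(thresholds):
    """All threshold classifiers x -> s if x > t else -s, for s in {+1, -1}."""
    return [(t, s) for t in thresholds for s in (1, -1)]

def stump_predict(stump, x):
    t, s = stump
    return s if x > t else -s

def accuracy(stump, xs, ys):
    """Fraction of training points the (untrained) stump labels correctly."""
    return sum(stump_predict(stump, x) == y for x, y in zip(xs, ys)) / len(xs)

def ensemble_predict(stumps, weights, x):
    """Sign of the accuracy-weighted vote over every stump in the family."""
    vote = sum(w * stump_predict(stump, x) for stump, w in zip(stumps, weights))
    return 1 if vote >= 0 else -1

# Hypothetical one-dimensional training data with labels in {-1, +1}
xs = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1]

stumps = make_stumps([i / 10 for i in range(10)])
weights = [accuracy(stump, xs, ys) for stump in stumps]
predictions = [ensemble_predict(stumps, weights, x) for x in (0.05, 0.95)]
print(predictions)
```

    Because every stump appears together with its negation, a pair with accuracies a and 1 - a contributes (2a - 1) times the stump's vote, so members no better than chance cancel out and no individual training step is needed.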

  12. IAEA safeguards and classified materials

    International Nuclear Information System (INIS)

    Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

    1997-01-01

    The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspections raises the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials.

  13. Hybrid classifiers methods of data, knowledge, and classifier combination

    CERN Document Server

    Wozniak, Michal

    2014-01-01

    This book provides a concise account of how hybridization can help improve the quality of computer classification systems. To make this clear to readers, the book focuses on introducing the different levels of hybridization and on the problems encountered in such projects. It first addresses the data and knowledge incorporated in hybridization, and then considers combined classifiers, a still-growing area of classifier systems. The book covers these state-of-the-art topics together with the latest research results of the author and his team from the Department of Systems and Computer Networks, Wroclaw University of Technology, including classifiers based on feature space splitting, one-class classification, imbalanced data, and data stream classification.

  14. 3D Bayesian contextual classifiers

    DEFF Research Database (Denmark)

    Larsen, Rasmus

    2000-01-01

    We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.

  15. Knowledge Uncertainty and Composed Classifier

    Czech Academy of Sciences Publication Activity Database

    Klimešová, Dana; Ocelíková, E.

    2007-01-01

    Vol. 1, No. 2 (2007), pp. 101-105 ISSN 1998-0140 Institutional research plan: CEZ:AV0Z10750506 Keywords: Boosting architecture * contextual modelling * composed classifier * knowledge management * knowledge * uncertainty Subject RIV: IN - Informatics, Computer Science

  16. Correlation Dimension-Based Classifier

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2014-01-01

    Vol. 44, No. 12 (2014), pp. 2253-2263 ISSN 2168-2267 R&D Projects: GA MŠk(CZ) LG12020 Institutional support: RVO:67985807 Keywords: classifier * multidimensional data * correlation dimension * scaling exponent * polynomial expansion Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 3.469, year: 2014

  17. Classified facilities for environmental protection

    International Nuclear Information System (INIS)

    Anon.

    1993-02-01

    The legislation of the classified facilities governs most of the dangerous or polluting industries or fixed activities. It rests on the law of 9 July 1976 concerning facilities classified for environmental protection and its application decree of 21 September 1977. This legislation, the general texts of which appear in this volume 1, aims to prevent all the risks and the harmful effects coming from an installation (air, water or soil pollutions, wastes, even aesthetic breaches). The polluting or dangerous activities are defined in a list called nomenclature which subjects the facilities to a declaration or an authorization procedure. The authorization is delivered by the prefect at the end of an open and contradictory procedure after a public survey. In addition, the facilities can be subjected to technical regulations fixed by the Environment Minister (volume 2) or by the prefect for facilities subjected to declaration (volume 3). (A.B.)

  18. Energy-Efficient Neuromorphic Classifiers.

    Science.gov (United States)

    Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

    2016-10-01

    Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.

  19. 76 FR 34761 - Classified National Security Information

    Science.gov (United States)

    2011-06-14

    ... MARINE MAMMAL COMMISSION Classified National Security Information [Directive 11-01] AGENCY: Marine... Commission's (MMC) policy on classified information, as directed by Information Security Oversight Office... of Executive Order 13526, ``Classified National Security Information,'' and 32 CFR part 2001...

  20. Waste classifying and separation device

    International Nuclear Information System (INIS)

    Kakiuchi, Hiroki.

    1997-01-01

    A flexible plastic bag containing solid wastes of indefinite shape is broken open and the wastes are classified. The bag-cutting portion of the device has an ultrasonic-type or a heater-type cutting means, and the cutting means moves in parallel with the transferring direction of the plastic bags. A classification portion separates and discriminates the plastic bag from the contents and conducts classification while rotating a classification table. Accordingly, a plastic bag containing solids of indefinite shape can be broken open and classification can be conducted efficiently and reliably. The device of the present invention has a simple structure which requires small installation space and enables easy maintenance. (T.M.)

  1. Defining and Classifying Interest Groups

    DEFF Research Database (Denmark)

    Baroni, Laura; Carroll, Brendan; Chalmers, Adam

    2014-01-01

    The interest group concept is defined in many different ways in the existing literature and a range of different classification schemes are employed. This complicates comparisons between different studies and their findings. One of the important tasks faced by interest group scholars engaged...... in large-N studies is therefore to define the concept of an interest group and to determine which classification scheme to use for different group types. After reviewing the existing literature, this article sets out to compare different approaches to defining and classifying interest groups with a sample...... in the organizational attributes of specific interest group types. As expected, our comparison of coding schemes reveals a closer link between group attributes and group type in narrower classification schemes based on group organizational characteristics than those based on a behavioral definition of lobbying....

  2. Composite Classifiers for Automatic Target Recognition

    National Research Council Canada - National Science Library

    Wang, Lin-Cheng

    1998-01-01

    ...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...

  3. Aggregation Operator Based Fuzzy Pattern Classifier Design

    DEFF Research Database (Denmark)

    Mönks, Uwe; Larsen, Henrik Legind; Lohweg, Volker

    2009-01-01

    This paper presents a novel modular fuzzy pattern classifier design framework for intelligent automation systems, developed on the basis of the established Modified Fuzzy Pattern Classifier (MFPC), that allows designing novel classifier models which are hardware-efficiently implementable....... The performances of novel classifiers using substitutes of MFPC's geometric mean aggregator are benchmarked in the scope of an image processing application against the MFPC to reveal classification improvement potentials for obtaining higher classification rates....

  4. Gene

    Data.gov (United States)

    U.S. Department of Health & Human Services — Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes,...

  5. 15 CFR 4.8 - Classified Information.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...

  6. On the statistical assessment of classifiers using DNA microarray data

    Directory of Open Access Journals (Sweden)

    Carella M

    2006-08-01

    Full Text Available Abstract. Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate
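    The permutation test mentioned in this record asks how often a classifier would reach an error rate at least as low as the observed one if the class labels carried no information. The sketch below is a minimal illustration under stated assumptions: it uses leave-one-out cross-validation (Leave-K-Out with K = 1) and a one-dimensional nearest-class-mean classifier in place of the WVA, RLS and SVM classifiers of the study, and the data are hypothetical.

```python
import random

def loo_error(xs, ys):
    """Leave-one-out error of a 1-D nearest-class-mean classifier."""
    errors = 0
    for i in range(len(xs)):
        # Training set: every sample except the held-out one
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        means = {}
        for label in {y for _, y in train}:
            values = [x for x, y in train if y == label]
            means[label] = sum(values) / len(values)
        predicted = min(means, key=lambda label: abs(xs[i] - means[label]))
        errors += predicted != ys[i]
    return errors / len(xs)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Return (observed error, p-value) from shuffling the labels n_perm times."""
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (at_least_as_good + 1) / (n_perm + 1)

# Hypothetical expression values of one gene in two well-separated groups
xs = [1.0, 1.2, 0.9, 1.1, 1.3, 3.0, 3.2, 2.9, 3.1, 3.3]
ys = [0] * 5 + [1] * 5
observed, p_value = permutation_p_value(xs, ys)
print(observed, p_value)
```

    The smallest p-value this estimate can report is 1 / (n_perm + 1), so the number of permutations limits the resolution of the test.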

  7. Error minimizing algorithms for nearest neighbor classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Porter, Reid B [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]; Zimmer, G. Beate [Texas A&M]

    2011-01-03

    Stack Filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.

  8. Hierarchical mixtures of naive Bayes classifiers

    NARCIS (Netherlands)

    Wiering, M.A.

    2002-01-01

    Naive Bayes classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayes classifiers by using the hierarchical

  9. Comparing classifiers for pronunciation error detection

    NARCIS (Netherlands)

    Strik, H.; Truong, K.; Wet, F. de; Cucchiarini, C.

    2007-01-01

    Providing feedback on pronunciation errors in computer assisted language learning systems requires that pronunciation errors be detected automatically. In the present study we compare four types of classifiers that can be used for this purpose: two acoustic-phonetic classifiers (one of which employs

  10. Feature extraction for dynamic integration of classifiers

    NARCIS (Netherlands)

    Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Patterson, D.W.

    2007-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique

  11. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  12. Deconvolution When Classifying Noisy Data Involving Transformations.

    Science.gov (United States)

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-09-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  13. Deconvolution When Classifying Noisy Data Involving Transformations

    KAUST Repository

    Carroll, Raymond; Delaigle, Aurore; Hall, Peter

    2012-01-01

    In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.

  14. Logarithmic learning for generalized classifier neural network.

    Science.gov (United States)

    Ozyildirim, Buse Melis; Avci, Mutlu

    2014-12-01

    Generalized classifier neural network is introduced as an efficient classifier among the others. Unless the initial smoothing parameter value is close to the optimal one, generalized classifier neural network suffers from convergence problem and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses logarithmic cost function instead of squared error. Minimization of this cost function reduces the number of iterations used for reaching the minima. The proposed method is tested on 15 different data sets and performance of logarithmic learning generalized classifier neural network is compared with that of standard one. Thanks to operation range of radial basis function included by generalized classifier neural network, proposed logarithmic approach and its derivative has continuous values. This makes it possible to adopt the advantage of logarithmic fast convergence by the proposed learning method. Due to fast convergence ability of logarithmic cost function, training time is maximally decreased to 99.2%. In addition to decrease in training time, classification performance may also be improved till 60%. According to the test results, while the proposed method provides a solution for time requirement problem of generalized classifier neural network, it may also improve the classification accuracy. The proposed method can be considered as an efficient way for reducing the time requirement problem of generalized classifier neural network. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. A CLASSIFIER SYSTEM USING SMOOTH GRAPH COLORING

    Directory of Open Access Journals (Sweden)

    JORGE FLORES CRUZ

    2017-01-01

    Full Text Available Unsupervised classifiers allow clustering with little or no human intervention. It is therefore desirable to group a set of items with less data processing. This paper proposes an unsupervised classifier system using the model of soft graph coloring. The method was tested on some classic instances from the literature, and the results were compared with classifications made with human intervention. It yielded results as good as or better than supervised classifiers, sometimes providing alternative classifications that consider additional information that humans did not consider.

  16. High dimensional classifiers in the imbalanced case

    DEFF Research Database (Denmark)

    Bak, Britta Anker; Jensen, Jens Ledet

    We consider the binary classification problem in the imbalanced case where the number of samples from the two groups differ. The classification problem is considered in the high dimensional case where the number of variables is much larger than the number of samples, and where the imbalance leads...... to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...

  17. Arabic Handwriting Recognition Using Neural Network Classifier

    African Journals Online (AJOL)

    pc

    2018-03-05

    Mar 5, 2018 ... an OCR using Neural Network classifier preceded by a set of preprocessing .... Artificial Neural Networks (ANNs), which we adopt in this research, consist of ... advantage and disadvantages of each technique. In [9],. Khemiri ...

  18. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha

    2013-11-25

    Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier results: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).

  19. Classifiers based on optimal decision rules

    KAUST Repository

    Amin, Talha M.; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata

    2013-01-01

    Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers, and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier results: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).

  20. Combining multiple classifiers for age classification

    CSIR Research Space (South Africa)

    Van Heerden, C

    2009-11-01

    Full Text Available The authors compare several different classifier combination methods on a single task, namely speaker age classification. This task is well suited to combination strategies, since significantly different feature classes are employed. Support vector...

  1. Neural Network Classifiers for Local Wind Prediction.

    Science.gov (United States)

    Kretzschmar, Ralf; Eckert, Pierre; Cattani, Daniel; Eggimann, Fritz

    2004-05-01

    This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved day time, yearday, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. The neural network classifiers exhibited superior quality when compared with persistence judged on a specific performance measure, hit and false-alarm rates.
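
    The evaluation measure named above can be sketched as follows. This is a minimal illustration, assuming binary event forecasts (for example, "gust above threshold") and assuming the false-alarm rate is defined as false alarms divided by all non-events. The abstract does not give the exact definitions, and the function names are hypothetical.

```python
def hit_and_false_alarm_rates(observed, predicted):
    """Return (hit rate, false-alarm rate) for paired binary outcomes.

    hit rate         = hits / (hits + misses)
    false-alarm rate = false alarms / (false alarms + correct negatives)
    """
    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    correct_negatives = sum(1 for o, p in pairs if not o and not p)
    events = hits + misses
    non_events = false_alarms + correct_negatives
    hit_rate = hits / events if events else float("nan")
    false_alarm_rate = false_alarms / non_events if non_events else float("nan")
    return hit_rate, false_alarm_rate


def persistence_forecast(series, lead):
    """Persistence baseline: the forecast for time t + lead is the value at t.

    Returns forecasts aligned with series[lead:].
    """
    return list(series[:-lead])


if __name__ == "__main__":
    observed = [0, 1, 1, 0, 1, 0, 0, 1]
    baseline = persistence_forecast(observed, lead=1)
    print(hit_and_false_alarm_rates(observed[1:], baseline))
```

    A classifier would be judged better than persistence on this measure if it raises the hit rate without raising the false-alarm rate, or lowers the false-alarm rate without lowering the hit rate.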

  2. Consistency Analysis of Nearest Subspace Classifier

    OpenAIRE

    Wang, Yi

    2015-01-01

    The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also ...
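
    The decision rule described above can be sketched in a few lines. This is an illustration under a simplifying assumption: each class subspace is taken to be the span of that class's training vectors (orthonormalized by Gram-Schmidt), which may differ from the estimator analysed in the paper. All function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = _dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def residual_norm(x, basis):
    """Distance from x to the subspace spanned by an orthonormal basis."""
    r = list(x)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return math.sqrt(_dot(r, r))


def fit_nss(samples_by_class):
    """Estimate one subspace per class from its training vectors."""
    return {label: orthonormal_basis(vs) for label, vs in samples_by_class.items()}


def predict_nss(model, x):
    """Assign x to the class whose subspace is nearest."""
    return min(model, key=lambda label: residual_norm(x, model[label]))


if __name__ == "__main__":
    model = fit_nss({
        "a": [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],   # spans the x-axis
        "b": [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]],   # spans the y-z plane
    })
    print(predict_nss(model, [3.0, 0.1, 0.0]))
```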

  3. Reinforcement Learning Based Artificial Immune Classifier

    Directory of Open Access Journals (Sweden)

    Mehmet Karakose

    2013-01-01

    Full Text Available Artificial immune systems are among the widely used methods for classification, which is a decision-making process. Based on the natural immune system, artificial immune systems can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experimental results. Comparative results with supervised/unsupervised artificial immune systems, the negative selection classifier, and the resource limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.

  4. RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes

    Directory of Open Access Journals (Sweden)

    Ashish Saini

    2014-01-01

    Full Text Available Background. Breast cancer is the most common type of cancer among females and has a high mortality rate. It is essential to classify the estrogen receptor based breast cancer subtypes into correct subclasses, so that the right treatments can be applied to lower the mortality rate. Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, gene interaction networks usually contain many false-positive interactions that have no biological meaning. Therefore, it is a challenge to incorporate the reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches.

  5. Classifier Fusion With Contextual Reliability Evaluation.

    Science.gov (United States)

    Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

    2018-05-01

    Classifier fusion is an efficient strategy to improve classification performance for complex pattern recognition problems. In practice, the classifiers to be combined can have different reliabilities, and proper reliability evaluation plays an important role in the fusion process for getting the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the k-nearest neighbors of the object. A cautious discounting rule is developed under the belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure, which makes it possible to reduce the level of conflict between the classifiers by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of the classification reliability. The discounted classification results are combined with Dempster-Shafer's rule to support the final class decision. The performance of CF-CRE has been evaluated and compared with that of the main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to changes in the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for applications.
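
    The two classical building blocks mentioned above, evidence discounting and Dempster's rule of combination, can be sketched as follows. This is not the CF-CRE method: the reliability factors are simply given as inputs here (the paper estimates them from nearest neighbors and an incompatibility measure), and the function names are hypothetical. Mass functions are represented as dictionaries from frozensets of class labels to masses.

```python
def discount(mass, alpha, frame):
    """Classical (Shafer) discounting with reliability factor alpha in [0, 1].

    Each focal mass is scaled by alpha; the remainder goes to the whole frame.
    """
    out = {focal: alpha * m for focal, m in mass.items() if focal != frame}
    out[frame] = 1.0 - alpha + alpha * mass.get(frame, 0.0)
    return out


def dempster_combine(m1, m2):
    """Dempster's rule: conjunctive combination, renormalized by 1 - conflict."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {focal: m / (1.0 - conflict) for focal, m in combined.items()}


if __name__ == "__main__":
    frame = frozenset({"A", "B"})
    # Two classifiers give hard decisions; reliabilities 0.8 and 0.6 are assumed.
    m1 = discount({frozenset({"A"}): 1.0}, 0.8, frame)
    m2 = discount({frozenset({"B"}): 1.0}, 0.6, frame)
    fused = dempster_combine(m1, m2)
    for focal, m in sorted(fused.items(), key=lambda kv: -kv[1]):
        print(sorted(focal), round(m, 4))
```

    Discounting moves part of each classifier's belief to the full frame (ignorance), which lowers the conflict that Dempster's rule has to normalize away.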

  6. Classifying sows' activity types from acceleration patterns

    DEFF Research Database (Denmark)

    Cornou, Cecile; Lundbye-Christensen, Søren

    2008-01-01

    An automated method of classifying sow activity using acceleration measurements would allow the individual sow's behavior to be monitored throughout the reproductive cycle; applications for detecting behaviors characteristic of estrus and farrowing or to monitor illness and welfare can be foreseen....... This article suggests a method of classifying five types of activity exhibited by group-housed sows. The method involves the measurement of acceleration in three dimensions. The five activities are: feeding, walking, rooting, lying laterally and lying sternally. Four time series of acceleration (the three...

  7. Data characteristics that determine classifier performance

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2006-11-01

    Full Text Available available at [11]. The kNN uses a LinearNN nearest neighbour search algorithm with an Euclidean distance metric [8]. The optimal k value is determined by performing 10-fold cross-validation. An optimal k value between 1 and 10 is used for Experiments 1... classifiers. 10-fold cross-validation is used to evaluate and compare the performance of the classifiers on the different data sets. 3.1. Artificial data generation Multivariate Gaussian distributions are used to generate artificial data sets. We use d...
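
    The excerpt describes choosing k for a nearest-neighbour classifier by 10-fold cross-validation over k = 1 to 10 with a Euclidean metric. A minimal sketch of that selection loop follows; it uses a brute-force linear search and interleaved folds, assumes the data have already been shuffled, and uses hypothetical function names rather than the toolkit referenced in the paper.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cv_error(data, k, folds=10):
    """Cross-validated error rate; fold f holds items f, f + folds, ..."""
    errors = 0
    for f in range(folds):
        test = data[f::folds]
        train = [item for i, item in enumerate(data) if i % folds != f]
        errors += sum(1 for x, y in test if knn_predict(train, x, k) != y)
    return errors / len(data)


def select_k(data, candidates=range(1, 11), folds=10):
    """Return the candidate k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(data, k, folds))


if __name__ == "__main__":
    # Two well-separated synthetic clusters, used only to exercise the code.
    data = [((i * 0.1, 0.0), "a") for i in range(20)]
    data += [((10.0 + i * 0.1, 0.0), "b") for i in range(20)]
    print(select_k(data), cv_error(data, 3))
```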

  8. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specified to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by user. The user can also control the number of domains chosen and decide the standard with which to choose the texts based on demand and abundance of materials. The performance of the classifier varies with the user's choice.

  9. A survey of decision tree classifier methodology

    Science.gov (United States)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  10. 75 FR 37253 - Classified National Security Information

    Science.gov (United States)

    2010-06-28

    ... ``Secret.'' (3) Each interior page of a classified document shall be marked at the top and bottom either... ``(TS)'' for Top Secret, ``(S)'' for Secret, and ``(C)'' for Confidential will be used. (2) Portions... from the informational text. (1) Conspicuously place the overall classification at the top and bottom...

  11. 75 FR 707 - Classified National Security Information

    Science.gov (United States)

    2010-01-05

    ... classified at one of the following three levels: (1) ``Top Secret'' shall be applied to information, the... exercise this authority. (2) ``Top Secret'' original classification authority may be delegated only by the... official has been delegated ``Top Secret'' original classification authority by the agency head. (4) Each...

  12. Neural Network Classifier Based on Growing Hyperspheres

    Czech Academy of Sciences Publication Activity Database

    Jiřina Jr., Marcel; Jiřina, Marcel

    2000-01-01

    Roč. 10, č. 3 (2000), s. 417-428 ISSN 1210-0552. [Neural Network World 2000. Prague, 09.07.2000-12.07.2000] Grant - others:MŠMT ČR(CZ) VS96047; MPO(CZ) RP-4210 Institutional research plan: AV0Z1030915 Keywords : neural network * classifier * hyperspheres * big -dimensional data Subject RIV: BA - General Mathematics

  13. Histogram deconvolution - An aid to automated classifiers

    Science.gov (United States)

    Lorre, J. J.

    1983-01-01

    It is shown that N-dimensional histograms are convolved by the addition of noise in the picture domain. Three methods are described which provide the ability to deconvolve such noise-affected histograms. The purpose of the deconvolution is to provide automated classifiers with a higher quality N-dimensional histogram from which to obtain classification statistics.

  14. Classifying web pages with visual features

    NARCIS (Netherlands)

    de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

    2010-01-01

    To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual

  15. Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

    Science.gov (United States)

    Daniel L. Schmoldt; Jing He; A. Lynn Abbott

    1998-01-01

    Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...

  16. Disassembly and Sanitization of Classified Matter

    International Nuclear Information System (INIS)

    Stockham, Dwight J.; Saad, Max P.

    2008-01-01

    The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition by using recycling and waste minimization measures. This process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have caused a dramatic increase in the need for information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components were required to have long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units due to classification issues and they may also contain radiological and/or hazardous components. Since no disposal options exist for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic, requiring a secured storage area, monitoring, auditing, and presenting the potential for loss or theft of the material. Overall recycling rates for materials sent through the DSO process have enabled 70 to 80% of these components to be recycled. These components are made of high quality materials and once this material has been sanitized, the demand for the component metals for recycling efforts is very high. The DSO process for NGPF, classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team and the DSO is responding to these requests by expanding its scope to include Work-for-Other projects. For example ...

  17. Comparing cosmic web classifiers using information theory

    International Nuclear Information System (INIS)

    Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin; Jasche, Jens

    2016-01-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  18. Design of Robust Neural Network Classifiers

    DEFF Research Database (Denmark)

    Larsen, Jan; Andersen, Lars Nonboe; Hintz-Madsen, Mads

    1998-01-01

    This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present...... a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of network weights as well as outlier probability and regularization parameters. We...... suggest to adapt the outlier probability and regularisation parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential...

  19. Comparing cosmic web classifiers using information theory

    Energy Technology Data Exchange (ETDEWEB)

    Leclercq, Florent [Institute of Cosmology and Gravitation (ICG), University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth PO1 3FX (United Kingdom); Lavaux, Guilhem; Wandelt, Benjamin [Institut d' Astrophysique de Paris (IAP), UMR 7095, CNRS – UPMC Université Paris 6, Sorbonne Universités, 98bis boulevard Arago, F-75014 Paris (France); Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: lavaux@iap.fr, E-mail: j.jasche@tum.de, E-mail: wandelt@iap.fr [Excellence Cluster Universe, Technische Universität München, Boltzmannstrasse 2, D-85748 Garching (Germany)

    2016-08-01

    We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.

  20. Detection of Fundus Lesions Using Classifier Selection

    Science.gov (United States)

    Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

    A system for detecting fundus lesions caused by diabetic retinopathy from fundus images is being developed. The system can screen the images in advance in order to reduce the inspection workload on doctors. One of the difficulties that must be addressed in completing this system is how to remove false positives (which tend to arise near blood vessels) without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images with some lesions and 223 images without lesions. The results of the experiments confirm the effectiveness of the proposed system, namely, degrees of sensitivity and specificity of 98% and 81%, respectively.

  1. Classifying objects in LWIR imagery via CNNs

    Science.gov (United States)

    Rodger, Iain; Connor, Barry; Robertson, Neil M.

    2016-10-01

    The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid to long range detection system, utilising a Long Wave Infrared (LWIR) sensor. By exploiting high quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained and the classifier achieves an overall accuracy of > 95% for 6 object classes related to land defence. While the highly accurate CNN struggles to recognise long range target classes, due to low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the methodology presented is assessed using human ground truth information, generating classifier evaluation metrics for thermal image sequences.

  2. How large a training set is needed to develop a classifier for microarray data?

    Science.gov (United States)

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.

  3. Learning for VMM + WTA Embedded Classifiers

    Science.gov (United States)

    2016-03-31

    Learning for VMM + WTA Embedded Classifiers Jennifer Hasler and Sahil Shah Electrical and Computer Engineering Georgia Institute of Technology...enabling correct classification of each novel acoustic signal (generator, idle car, and idle truck ). The classification structure requires, after...measured on our SoC FPAA IC. The test input is composed of signals from urban environment for 3 objects (generator, idle car, and idle truck

  4. Bayes classifiers for imbalanced traffic accidents datasets.

    Science.gov (United States)

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Average One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a fatality or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
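
    The two basic balancing techniques named above can be sketched as follows. This is a simplified illustration: under-sampling drops majority instances at random, and oversampling here duplicates minority instances at random, which is an assumption, since the abstract does not say how new minority instances were created. Function names and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def _group_by_label(rows):
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append((features, label))
    return groups


def undersample(rows, seed=0):
    """Randomly drop instances so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows)
    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def oversample(rows, seed=0):
    """Randomly duplicate instances so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


if __name__ == "__main__":
    rows = [((i,), "slight") for i in range(8)] + [((i,), "severe") for i in range(2)]
    print(len(undersample(rows)), len(oversample(rows)))
```

    A mixed technique, in the sense used above, would apply both steps with an intermediate target size rather than the extremes.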

  5. A Bayesian classifier for symbol recognition

    OpenAIRE

    Barrat , Sabine; Tabbone , Salvatore; Nourrissier , Patrick

    2007-01-01

    URL : http://www.buyans.com/POL/UploadedFile/134_9977.pdf; International audience; We present in this paper an original adaptation of Bayesian networks to symbol recognition problem. More precisely, a descriptor combination method, which enables to improve significantly the recognition rate compared to the recognition rates obtained by each descriptor, is presented. In this perspective, we use a simple Bayesian classifier, called naive Bayes. In fact, probabilistic graphical models, more spec...

  6. Optimization of short amino acid sequences classifier

    Science.gov (United States)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequences classification. The data processed are 9-symbols string representations of amino acid sequences, divided into 49 data sets - each one containing samples labeled as reacting or not with given enzyme. The goal of the classification is to determine for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with best performance measures values.

  7. SVM classifier on chip for melanoma detection.

    Science.gov (United States)

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

    Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost handheld medical device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented on a recent FPGA platform using the latest design methodology, to be embedded into the proposed device for efficient online melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 watts of power consumption. Consequently, the implemented system meets the crucial embedded systems constraints of high performance and low cost, resource utilization, and power consumption, while achieving high classification accuracy.

  8. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    Directory of Open Access Journals (Sweden)

    Shehzad Khalid

    2014-01-01

    Full Text Available We present a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes whilst identifying and filtering noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for the different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector with which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes.
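
    The idea of assigning confidence to each classifier at class level can be illustrated with a weighted sum of per-class scores. The sketch below is an assumption-laden example, not the authors' method: the weights are hand-picked rather than found by a genetic algorithm, and the function name, class labels, and scores are hypothetical.

```python
def combine(scores_per_classifier, weights):
    """Fuse per-class scores using one weight per (classifier, class) pair.

    scores_per_classifier[k][c] is classifier k's score for class c;
    weights[k][c] is the confidence placed in classifier k for class c.
    """
    classes = scores_per_classifier[0].keys()
    fused = {
        c: sum(w[c] * s[c] for s, w in zip(scores_per_classifier, weights))
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical outputs of two base classifiers for one sample.
gmm_scores = {"a": 0.6, "b": 0.4}
svm_scores = {"a": 0.3, "b": 0.7}
# Hand-picked class-level weights; a search procedure would normally set these.
gmm_weights = {"a": 0.9, "b": 0.2}
svm_weights = {"a": 0.1, "b": 0.8}

label, fused = combine([gmm_scores, svm_scores], [gmm_weights, svm_weights])
print(label, fused)
```

    In this toy case the first classifier is trusted on class "a" and the second on class "b", so the fused decision can differ from either simple averaging or majority voting.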

  9. The Protection of Classified Information: The Legal Framework

    National Research Council Canada - National Science Library

    Elsea, Jennifer K

    2006-01-01

    Recent incidents involving leaks of classified information have heightened interest in the legal framework that governs security classification, access to classified information, and penalties for improper disclosure...

  10. Peretinoin, an acyclic retinoid, improves the hepatic gene signature of chronic hepatitis C following curative therapy of hepatocellular carcinoma

    International Nuclear Information System (INIS)

    Honda, Masao; Yamashita, Taro; Yamashita, Tatsuya; Arai, Kuniaki; Sakai, Yoshio; Sakai, Akito; Nakamura, Mikiko; Mizukoshi, Eishiro; Kaneko, Shuichi

    2013-01-01

    The acyclic retinoid, peretinoin, has been shown to be effective for suppressing hepatocellular carcinoma (HCC) recurrence after definitive treatment in a small-scale randomized clinical trial. However, little has been documented about the mechanism by which peretinoin exerts its inhibitory effects against recurrent HCC in humans in vivo. Twelve hepatitis C virus-positive patients whose HCC had been eradicated through curative resection or ablation underwent liver biopsy at baseline and week 8 of treatment with either a daily dose of 300 or 600 mg peretinoin. RNA isolated from biopsy samples was subjected to gene expression profile analysis. Peretinoin treatment elevated the expression levels of IGFBP6, RBP1, PRB4, CEBPA, G0S2, TGM2, GPRC5A, CYP26B1, and many other retinoid target genes. Elevated expression was also observed for interferon-, Wnt-, and tumor suppressor-related genes. By contrast, decreased expression levels were found for mTOR- and tumor progression-related genes. Interestingly, gene expression profiles for week 8 of peretinoin treatment could be classified into two groups of recurrence and non-recurrence with a prediction accuracy rate of 79.6% (P<0.05). In the liver of patients with non-recurrence, expression of PDGFC and other angiogenesis genes, cancer stem cell marker genes, and genes related to tumor progression was down-regulated, while expression of genes related to hepatocyte differentiation, tumor suppression genes, and other genes related to apoptosis induction was up-regulated. Gene expression profiling at week 8 of peretinoin treatment could successfully predict HCC recurrence within 2 years. This study is the first to show the effect of peretinoin in suppressing HCC recurrence in vivo based on gene expression profiles and provides a molecular basis for understanding the efficacy of peretinoin

  11. Classifying smoking urges via machine learning.

    Science.gov (United States)

    Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

    2016-12-01

    Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions.
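
    The four evaluation measures named in this record follow directly from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive and negative counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of real positives found
        "specificity": tn / (tn + fp),            # share of real negatives found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),              # share of predicted positives that are real
    }


# Hypothetical counts for a high-urge versus low-urge classifier.
m = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(m)
```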

  12. Classifying spaces of degenerating polarized Hodge structures

    CERN Document Server

    Kato, Kazuya

    2009-01-01

    In 1970, Phillip Griffiths envisioned that points at infinity could be added to the classifying space D of polarized Hodge structures. In this book, Kazuya Kato and Sampei Usui realize this dream by creating a logarithmic Hodge theory. They use the logarithmic structures begun by Fontaine-Illusie to revive nilpotent orbits as a logarithmic Hodge structure. The book focuses on two principal topics. First, Kato and Usui construct the fine moduli space of polarized logarithmic Hodge structures with additional structures. Even for a Hermitian symmetric domain D, the present theory is a refinem

  13. Gearbox Condition Monitoring Using Advanced Classifiers

    Directory of Open Access Journals (Sweden)

    P. Večeř

    2010-01-01

    Full Text Available New efficient and reliable methods for gearbox diagnostics are needed in the automotive industry because of growing demand for production quality. This paper presents the application of two different classifiers for gearbox diagnostics – Kohonen Neural Networks and the Adaptive-Network-based Fuzzy Inference System (ANFIS). Two different practical applications are presented. In the first application, the tested gearboxes are separated into two classes according to their condition indicators. In the second example, ANFIS is applied to label the tested gearboxes with a Quality Index according to the condition indicators. In both applications, the condition indicators were computed from the vibration of the gearbox housing.

  14. Cubical sets as a classifying topos

    DEFF Research Database (Denmark)

    Spitters, Bas

    Coquand’s cubical set model for homotopy type theory provides the basis for a computational interpretation of the univalence axiom and some higher inductive types, as implemented in the cubical proof assistant. We show that the underlying cube category is the opposite of the Lawvere theory of De Morgan algebras. The topos of cubical sets itself classifies the theory of ‘free De Morgan algebras’. This provides us with a topos with an internal ‘interval’. Using this interval we construct a model of type theory following van den Berg and Garner. We are currently investigating the precise relation...

  15. Double Ramp Loss Based Reject Option Classifier

    Science.gov (United States)

    2015-05-22

    of convex (DC) functions. To minimize it, we use DC programming approach [1]. The proposed method has following advantages: (1) the proposed loss LDR ...space constraints. We see that LDR does not put any restriction on ρ for it to be an upper bound of L0−d−1. 2.2 Risk Formulation Using LDR Let S = {(xn...classifier learnt using LDR based approach (C = 100, μ = 1, d = .2). Filled circles and triangles represent the support vectors. 4 Experimental Results We show

  16. Classifying Coding DNA with Nucleotide Statistics

    Directory of Open Access Journals (Sweden)

    Nicolas Carels

    2009-10-01

    Full Text Available In this report, we compared the success rate of classification of coding sequences (CDS) vs. introns by Codon Structure Factor (CSF) and by a method that we called Universal Feature Method (UFM). UFM is based on the scoring of purine bias (Rrr) and stop codon frequency. We show that the success rate of CDS/intron classification by UFM is higher than by CSF. UFM classifies ORFs as coding or non-coding through a score based on (i) the stop codon distribution, (ii) the product of purine probabilities in the three positions of nucleotide triplets, (iii) the product of Cytosine (C), Guanine (G), and Adenine (A) probabilities in the 1st, 2nd, and 3rd positions of triplets, respectively, (iv) the probabilities of G in 1st and 2nd position of triplets and (v) the distance of their GC3 vs. GC2 levels to the regression line of the universal correlation. More than 80% of CDSs (true positives) of Homo sapiens (>250 bp), Drosophila melanogaster (>250 bp) and Arabidopsis thaliana (>200 bp) are successfully classified with a false positive rate lower or equal to 5%. The method releases coding sequences in their coding strand and coding frame, which allows their automatic translation into protein sequences with 95% confidence. The method is a natural consequence of the compositional bias of nucleotides in coding sequences.
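
    Two of the ingredients listed in this record, the stop codon frequency and the purine probability at each triplet position, are simple to compute for a given reading frame. The sketch below shows only these raw features under assumed names; it does not reproduce the UFM score itself, whose weighting is not given in the abstract, and the example sequence is invented.

```python
PURINES = {"A", "G"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split a DNA string into complete triplets starting at `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def stop_codon_frequency(seq, frame=0):
    """Fraction of triplets in the frame that are stop codons."""
    triplets = codons(seq, frame)
    if not triplets:
        return 0.0
    return sum(t in STOP_CODONS for t in triplets) / len(triplets)


def purine_probabilities(seq, frame=0):
    """Fraction of purines (A or G) at triplet positions 1, 2 and 3."""
    triplets = codons(seq, frame)
    if not triplets:
        raise ValueError("sequence too short for the requested frame")
    return tuple(
        sum(t[pos] in PURINES for t in triplets) / len(triplets)
        for pos in range(3)
    )


# Invented toy sequence: ATG GCG AAA TAA
toy = "ATGGCGAAATAA"
stop_freq = stop_codon_frequency(toy)
purine_by_position = purine_probabilities(toy)
print(stop_freq, purine_by_position)
```

    Computing the same features for all three frames of both strands would let a scoring rule prefer the frame with few stop codons and a strong purine bias in the first position, which is the intuition the abstract describes.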

  17. A systematic comparison of supervised classifiers.

    Directory of Open Access Journals (Sweden)

    Diego Raphael Amancio

    Full Text Available Pattern recognition has been employed in a myriad of industrial, commercial and academic applications. Many techniques have been devised to tackle such a diversity of applications. Despite the long tradition of pattern recognition research, there is no technique that yields the best classification in all scenarios. Therefore, as many techniques as possible should be considered in high accuracy applications. Typical related works either focus on the performance of a given algorithm or compare various classification methods. On many occasions, however, researchers who are not experts in the field of machine learning have to deal with practical classification tasks without an in-depth knowledge about the underlying parameters. Actually, the adequate choice of classifiers and parameters in such practical circumstances constitutes a long-standing problem and is one of the subjects of the current paper. We carried out a performance study of nine well-known classifiers implemented in the Weka framework and compared the influence of the parameter configurations on the accuracy. The default configuration of parameters in Weka was found to provide near optimal performance for most cases, not including methods such as the support vector machine (SVM). In addition, the k-nearest neighbor method frequently allowed the best accuracy. In certain conditions, it was possible to improve the quality of SVM by more than 20% with respect to their default parameter configuration.

  18. STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS

    International Nuclear Information System (INIS)

    Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.

    2009-01-01

    The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ² tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ² test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ² and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
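
    As a rough illustration of the A-D test for Gaussianity discussed in this record, the sketch below computes the standard A-D statistic with mean and dispersion estimated from the sample, followed by the usual small-sample correction. It is a textbook formulation, not the authors' pipeline; the function name and the synthetic velocity samples are assumptions, and the commonly tabulated 5% critical value of about 0.752 is quoted only as a guide.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Small-sample-corrected A-D statistic for a normality test.

    Larger values indicate a stronger departure from a Gaussian whose
    mean and standard deviation are estimated from the data.
    """
    n = len(values)
    mu, sigma = mean(values), stdev(values)
    z = sorted((v - mu) / sigma for v in values)
    std_normal = NormalDist()
    total = 0.0
    for i in range(1, n + 1):
        lower = std_normal.cdf(z[i - 1])   # Phi(z_i)
        upper = std_normal.cdf(z[n - i])   # Phi(z_{n+1-i})
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    a2 = -n - total / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Synthetic line-of-sight velocities (km/s), invented for illustration.
relaxed = [NormalDist(0, 300).inv_cdf((i + 0.5) / 40) for i in range(40)]
disturbed = ([-600.0 + 5 * i for i in range(20)]
             + [600.0 + 5 * i for i in range(20)])

print(anderson_darling_normal(relaxed), anderson_darling_normal(disturbed))
```

    The first sample is built from evenly spaced Gaussian quantiles and yields a small statistic, while the second is strongly bimodal and yields a value far above the quoted critical level, which is the kind of contrast used to separate relaxed from dynamically complex groups.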

  19. Mercury⊕: An evidential reasoning image classifier

    Science.gov (United States)

    Peddle, Derek R.

    1995-12-01

    MERCURY⊕ is a multisource evidential reasoning classification software system based on the Dempster-Shafer theory of evidence. The design and implementation of this software package are described for improving the classification and analysis of multisource digital image data necessary for addressing advanced environmental and geoscience applications. In the remote-sensing context, the approach provides a more appropriate framework for classifying modern, multisource, and ancillary data sets which may contain a large number of disparate variables with different statistical properties, scales of measurement, and levels of error which cannot be handled using conventional Bayesian approaches. The software uses a nonparametric, supervised approach to classification, and provides a more objective and flexible interface to the evidential reasoning framework using a frequency-based method for computing support values from training data. The MERCURY⊕ software package has been implemented efficiently in the C programming language, with extensive use made of dynamic memory allocation procedures and compound linked list and hash-table data structures to optimize the storage and retrieval of evidence in a Knowledge Look-up Table. The software is complete with a full user interface and runs under Unix, Ultrix, VAX/VMS, MS-DOS, and Apple Macintosh operating systems. An example of classifying alpine land cover and permafrost active layer depth in northern Canada is presented to illustrate the use and application of these ideas.
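
    The core operation of Dempster-Shafer evidential reasoning is Dempster's rule of combination, which merges the mass assignments of two evidence sources. The sketch below shows that rule in general form; it is not code from the MERCURY⊕ package (which is written in C), and the class names, source names, and mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozensets of class labels."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the remaining masses sum to one.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


frame = frozenset({"forest", "tundra"})
# Hypothetical support values from two data sources for one pixel.
m_spectral = {frozenset({"forest"}): 0.6, frame: 0.4}
m_elevation = {frozenset({"tundra"}): 0.5, frame: 0.5}

fused = dempster_combine(m_spectral, m_elevation)
print(fused)
```

    Mass assigned to the whole frame expresses ignorance rather than support for any single class, which is one reason the approach can accommodate sources of differing reliability.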

  20. 36 CFR 1256.46 - National security-classified information.

    Science.gov (United States)

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...

  1. Two channel EEG thought pattern classifier.

    Science.gov (United States)

    Craig, D A; Nguyen, H T; Burchey, H A

    2006-01-01

    This paper presents a real-time electro-encephalogram (EEG) identification system with the goal of achieving hands free control. With two EEG electrodes placed on the scalp of the user, EEG signals are amplified and digitised directly using a ProComp+ encoder and transferred to the host computer through the RS232 interface. Using a real-time multilayer neural network, the actual classification for the control of a powered wheelchair has a very fast response. It can detect changes in the user's thought pattern in 1 second. Using only two EEG electrodes at positions O1 and C4 the system can classify three mental commands (forward, left and right) with an accuracy of more than 79%.

  2. Classifying Drivers' Cognitive Load Using EEG Signals.

    Science.gov (United States)

    Barua, Shaibal; Ahmed, Mobyen Uddin; Begum, Shahina

    2017-01-01

    A growing traffic safety issue is the effect of cognitive loading activities on traffic safety and driving performance. To monitor drivers' mental state, understanding cognitive load is important, since performing cognitively loading secondary tasks while driving, for example talking on the phone, can affect performance in the primary task, i.e. driving. Electroencephalography (EEG) is one of the reliable measures of cognitive load that can detect changes in instantaneous load and the effect of a cognitively loading secondary task. In this driving simulator study, a 1-back task is carried out while the driver performs three different simulated driving scenarios. This paper presents an EEG-based approach to classify a driver's level of cognitive load using Case-Based Reasoning (CBR). The results show that, for each individual scenario as well as for data combined from the different scenarios, the CBR-based system achieved a classification accuracy of approximately 70% or more.

  3. Classifying prion and prion-like phenomena.

    Science.gov (United States)

    Harbi, Djamel; Harrison, Paul M

    2014-01-01

    The universe of prion and prion-like phenomena has expanded significantly in the past several years. Here, we overview the challenges in classifying this data informatically, given that terms such as "prion-like", "prion-related" or "prion-forming" do not have a stable meaning in the scientific literature. We examine the spectrum of proteins that have been described in the literature as forming prions, and discuss how "prion" can have a range of meaning, with a strict definition being for demonstration of infection with in vitro-derived recombinant prions. We suggest that although prion/prion-like phenomena can largely be apportioned into a small number of broad groups dependent on the type of transmissibility evidence for them, as new phenomena are discovered in the coming years, a detailed ontological approach might be necessary that allows for subtle definition of different "flavors" of prion / prion-like phenomena.

  4. Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

    Directory of Open Access Journals (Sweden)

    Bogdan Gliwa

    2011-01-01

    Full Text Available The paper presents hybrid neuro-fuzzy classifier, based on NEFCLASS model, which was modified. The presented classifier was compared to popular classifiers – neural networks and k-nearest neighbours. Efficiency of modifications in classifier was compared with methods used in original model NEFCLASS (learning methods). Accuracy of classifier was tested using 3 datasets from UCI Machine Learning Repository: iris, wine and breast cancer wisconsin. Moreover, influence of ensemble classification methods on classification accuracy was presented.

  5. Classifying Transition Behaviour in Postural Activity Monitoring

    Directory of Open Access Journals (Sweden)

    James BRUSEY

    2009-10-01

    Full Text Available A few accelerometers positioned on different parts of the body can be used to accurately classify steady state behaviour, such as walking, running, or sitting. Such systems are usually built using supervised learning approaches. Transitions between postures are, however, difficult to deal with using posture classification systems proposed to date, since there is no label set for intermediary postures and also the exact point at which the transition occurs can sometimes be hard to pinpoint. The usual bypass when using supervised learning to train such systems is to discard a section of the dataset around each transition. This leads to poorer classification performance when the systems are deployed out of the laboratory and used on-line, particularly if the regimes monitored involve fast paced activity changes. Time-based filtering that takes advantage of sequential patterns is a potential mechanism to improve posture classification accuracy in such real-life applications. Also, such filtering should reduce the number of event messages needed to be sent across a wireless network to track posture remotely, hence extending the system’s life. To support time-based filtering, understanding transitions, which are the major event generators in a classification system, is a key. This work examines three approaches to post-process the output of a posture classifier using time-based filtering: a naïve voting scheme, an exponentially weighted voting scheme, and a Bayes filter. Best performance is obtained from the exponentially weighted voting scheme although it is suspected that a more sophisticated treatment of the Bayes filter might yield better results.
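
    One plausible reading of an exponentially weighted voting scheme is sketched below: every new classifier output casts a vote, and all earlier votes are discounted by a constant factor. The abstract does not give the exact weighting, so the decay value, the function name, and the label sequences here are assumptions for illustration only.

```python
def exponential_vote_filter(labels, decay=0.6):
    """Smooth a stream of per-sample posture labels.

    Each new label casts one vote; all earlier votes are multiplied by
    `decay` (0 < decay < 1), so recent samples count more than old ones.
    """
    scores = {}
    smoothed = []
    for label in labels:
        for key in scores:
            scores[key] *= decay
        scores[label] = scores.get(label, 0.0) + 1.0
        smoothed.append(max(scores, key=scores.get))
    return smoothed


raw = ["sit", "sit", "sit", "walk", "sit", "sit"]        # one spurious "walk"
moving = ["sit", "sit", "sit", "walk", "walk", "walk"]   # a genuine transition

print(exponential_vote_filter(raw))
print(exponential_vote_filter(moving))
```

    With these settings the isolated "walk" is suppressed, while a sustained change is reported one sample late. A smaller decay reacts faster but filters less, and fewer label changes also mean fewer event messages to transmit.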

  6. Just-in-time adaptive classifiers-part II: designing the classifier.

    Science.gov (United States)

    Alippi, Cesare; Roveri, Manuel

    2008-12-01

    Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with a process evolution adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming the stationary hypothesis for the process generating the data with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected for their computational-free training phase, the possibility to easily estimate the model complexity k and keep under control the computational complexity of the classifier through suitable data reduction mechanisms. A JIT classifier requires a temporal detection of a (possible) process deviation (aspect tackled in a companion paper) followed by an adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking) and in providing a suitable estimate of k. It is shown that the classification system grants consistency once the change targets the process generating the data in a new stationary state, as it is the case in many real applications.

  7. A Critical Evaluation of Network and Pathway-Based Classifiers for Outcome Prediction in Breast Cancer

    NARCIS (Netherlands)

    C. Staiger (Christine); S. Cadot; R Kooter; M. Dittrich (Marcus); T. Müller (Tobias); G.W. Klau (Gunnar); L.F.A. Wessels (Lodewyk)

    2012-01-01

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically

  8. FERAL : Network-based classifier with application to breast cancer outcome prediction

    NARCIS (Netherlands)

    Allahyar, A.; De Ridder, J.

    2015-01-01

    Motivation: Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalized patient care. To improve the performance and consistency of the markers discovered by the initial molecular classifiers, network-based outcome prediction methods (NOPs) have been proposed.

  9. The Mycoplasma hominis vaa gene displays a mosaic gene structure

    DEFF Research Database (Denmark)

    Boesen, Thomas; Emmersen, Jeppe M. G.; Jensen, Lise T.

    1998-01-01

    Mycoplasma hominis contains a variable adherence-associated (vaa) gene. To classify variants of the vaa genes, we examined 42 M. hominis isolates by PCR, DNA sequencing and immunoblotting. This uncovered the existence of five gene categories. Comparison of the gene types revealed a modular...

  10. Classifying Adverse Events in the Dental Office.

    Science.gov (United States)

    Kalenderian, Elsbeth; Obadan-Udoh, Enihomo; Maramaldi, Peter; Etolue, Jini; Yansane, Alfa; Stewart, Denice; White, Joel; Vaderhobli, Ram; Kent, Karla; Hebballi, Nutan B; Delattre, Veronique; Kahn, Maria; Tokede, Oluwabunmi; Ramoni, Rachel B; Walji, Muhammad F

    2017-06-30

    Dentists strive to provide safe and effective oral healthcare. However, some patients may encounter an adverse event (AE) defined as "unnecessary harm due to dental treatment." In this research, we propose and evaluate two systems for categorizing the type and severity of AEs encountered at the dental office. Several existing medical AE type and severity classification systems were reviewed and adapted for dentistry. Using data collected in previous work, two initial dental AE type and severity classification systems were developed. Eight independent reviewers performed focused chart reviews, and AEs identified were used to evaluate and modify these newly developed classifications. A total of 958 charts were independently reviewed. Among the reviewed charts, 118 prospective AEs were found and 101 (85.6%) were verified as AEs through a consensus process. At the end of the study, a final AE type classification comprising 12 categories, and an AE severity classification comprising 7 categories emerged. Pain and infection were the most common AE types representing 73% of the cases reviewed (56% and 17%, respectively) and 88% were found to cause temporary, moderate to severe harm to the patient. Adverse events found during the chart review process were successfully classified using the novel dental AE type and severity classifications. Understanding the type of AEs and their severity are important steps if we are to learn from and prevent patient harm in the dental office.

  11. Is it important to classify ischaemic stroke?

    LENUS (Irish Health Repository)

    Iqbal, M

    2012-02-01

    Thirty-five percent of all ischemic events remain classified as cryptogenic. This study was conducted to ascertain the accuracy of diagnosis of ischaemic stroke based on information given in the medical notes. It was tested by applying the clinical information to the TOAST criteria. One hundred and five patients presented with acute stroke between January and June 2007. Data were collected on 90 patients. The male to female ratio was 39:51, with an age range of 47-93 years. Sixty (67%) patients had total/partial anterior circulation stroke; 5 (5.6%) had a lacunar stroke and in 25 (28%) the mechanism of stroke could not be identified. Four (4.4%) patients with small vessel disease were anticoagulated; 5 (5.6%) with atrial fibrillation received antiplatelet therapy and 2 (2.2%) patients with atrial fibrillation underwent CEA. This study revealed deficiencies in the clinical assessment of patients and treatment was not tailored to the mechanism of stroke in some patients.

  12. Stress fracture development classified by bone scintigraphy

    International Nuclear Information System (INIS)

    Zwas, S.T.; Elkanovich, R.; Frank, G.; Aharonson, Z.

    1985-01-01

    There is no consensus on classifying stress fractures (SF) appearing on bone scans. The authors present a system of classification based on grading the severity and development of bone lesions by visual inspection, according to three main scintigraphic criteria: focality and size, intensity of uptake compared to adjacent bone, and local medullary extension. Four grades of development (I-IV) were ranked, ranging from ill-defined slightly increased cortical uptake to well-defined regions with markedly increased uptake extending transversely bicortically. 310 male subjects aged 19-2, suffering several weeks from leg pains occurring during intensive physical training underwent bone scans of the pelvis and lower extremities using Tc-99-m-MDP. 76% of the scans were positive with 354 lesions, of which 88% were in the mild (I-II) grades and 12% in the moderate (III) and severe (IV) grades. Post-treatment scans were obtained in 65 cases having 78 lesions during 1- to 6-month intervals. Complete resolution was found after 1-2 months in 36% of the mild lesions but in only 12% of the moderate and severe ones, and after 3-6 months in 55% of the mild lesions and 15% of the severe ones. 75% of the moderate and severe lesions showed residual uptake in various stages throughout the follow-up period. Early recognition and treatment of mild SF lesions in this study prevented protracted disability and progression of the lesions and facilitated complete healing

  13. 41 CFR 105-62.102 - Authority to originally classify.

    Science.gov (United States)

    2010-07-01

    ... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is delegable...

  14. Naive Bayesian classifiers for multinomial features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2007-11-01

    Full Text Available The authors investigate the use of naive Bayesian classifiers for multinomial feature spaces and derive error estimates for these classifiers. The error analysis is done by developing a mathematical model to estimate the probability density...

  15. Ensemble of classifiers based network intrusion detection system performance bound

    CSIR Research Space (South Africa)

    Mkuzangwe, Nenekazi NP

    2017-11-01

    Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...

  16. Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

    OpenAIRE

    Hernández Rodríguez, Selene

    2010-01-01

    The k nearest neighbor (k-NN) classifier has been extensively used in Pattern Recognition because of its simplicity and its good performance. However, in large datasets applications, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison f...

  17. Three data partitioning strategies for building local classifiers (Chapter 14)

    NARCIS (Netherlands)

    Zliobaite, I.; Okun, O.; Valentini, G.; Re, M.

    2011-01-01

    The divide-and-conquer approach has been recognized in multiple classifier systems aiming to utilize the local expertise of individual classifiers. In this study, we experimentally investigate three strategies for building local classifiers that are based on different routines of sampling data for training.

  18. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
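
    The fusion of text and image results by Bayes theory can be illustrated with the standard rule for combining two posterior estimates that are assumed conditionally independent given the class. This is a generic sketch rather than the fusion algorithm of the paper; the function name, the prior, and the probabilities are hypothetical.

```python
def fuse_posteriors(p_text, p_image, prior=0.5):
    """Combine two posterior estimates of the same binary class.

    Assumes the text and image evidence are conditionally independent
    given the class, so P(c | t, i) is proportional to
    P(c | t) * P(c | i) / P(c).
    """
    positive = p_text * p_image / prior
    negative = (1.0 - p_text) * (1.0 - p_image) / (1.0 - prior)
    if positive + negative == 0.0:
        raise ValueError("the two sources contradict each other completely")
    return positive / (positive + negative)


# Hypothetical outputs of a text classifier and an image classifier.
fused = fuse_posteriors(p_text=0.8, p_image=0.7)
print(round(fused, 4))
```

    Two moderately confident sources that agree yield a fused probability higher than either alone, which matches the reported observation that fusion outperforms the individual classifiers.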

  19. 32 CFR 2400.28 - Dissemination of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Dissemination of classified information. 2400.28... SECURITY PROGRAM Safeguarding § 2400.28 Dissemination of classified information. Heads of OSTP offices... originating official may prescribe specific restrictions on dissemination of classified information when...

  20. Peat classified as slowly renewable biomass fuel

    International Nuclear Information System (INIS)

    2001-01-01

    thousands of years. The report states also that peat should be classified as biomass fuel instead of biofuels, such as wood, or fossil fuels such as coal. According to the report peat is a renewable biomass fuel like biofuels, but due to slow accumulation it should be considered as slowly renewable fuel. The report estimates that bonding of carbon in both virgin and forest drained peatlands is so high that it can compensate the emissions formed in combustion of energy peat

  1. A Supervised Multiclass Classifier for an Autocoding System

    Directory of Open Access Journals (Sweden)

    Yukako Toko

    2017-11-01

    Full Text Available Classification is often required in various contexts, including in the field of official statistics. In the previous study, we have developed a multiclass classifier that can classify short text descriptions with high accuracy. The algorithm borrows the concept of the naïve Bayes classifier and is so simple that its structure is easily understandable. The proposed classifier has the following two advantages. First, the processing times for both learning and classifying are extremely practical. Second, the proposed classifier yields high-accuracy results for a large portion of a dataset. We have previously developed an autocoding system for the Family Income and Expenditure Survey in Japan that has a better performing classifier. While the original system was developed in Perl in order to improve the efficiency of the coding process of short Japanese texts, the proposed system is implemented in the R programming language in order to explore versatility and is modified to make the system easily applicable to English text descriptions, in consideration of the increasing number of R users in the field of official statistics. We are planning to publish the proposed classifier as an R-package. The proposed classifier would be generally applicable to other classification tasks including coding activities in the field of official statistics, and it would contribute greatly to improving their efficiency.

  2. 18 CFR 3a.12 - Authority to classify official information.

    Science.gov (United States)

    2010-04-01

    ... efficient administration. (b) The authority to classify information or material originally as Top Secret is... classify information or material originally as Secret is exercised only by: (1) Officials who have Top... information or material originally as Confidential is exercised by officials who have Top Secret or Secret...

  3. Using Neural Networks to Classify Digitized Images of Galaxies

    Science.gov (United States)

    Goderya, S. N.; McGuire, P. C.

    2000-12-01

    Automated classification of Galaxies into Hubble types is of paramount importance to study the large scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present it is not possible to find robust and efficient artificial intelligence based galaxy classifiers. In this study we will summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher order probabilistic network, the multilayer perceptron neural network and Support Vector Machine Classifier. The performance of any machine classifier is dependent on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant moment based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach reduces the dimensionality of the classifier considerably, and removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification we develop the concept of Mathews coefficients for the galaxy classification community. Mathews coefficients are single numbers that quantify classifier performance even with unequal prior probabilities of the classes.
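    The coefficient named in this abstract is presumably the Matthews correlation coefficient; that identification is an assumption, since the abstract gives no formula. The sketch below shows the standard binary form and why it stays informative when class priors are unequal. The function names and the example counts are hypothetical and do not come from the paper.

```python
import math


def matthews_coefficient(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here), because the ratio is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, fp, tn, fn):
    """Fraction of all objects that were labelled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical imbalanced sample: 95 objects of one type, 5 of another,
# and a degenerate classifier that assigns every object to the majority type.
majority_only = dict(tp=95, fp=5, tn=0, fn=0)
acc_majority = accuracy(**majority_only)              # looks good: 0.95
mcc_majority = matthews_coefficient(**majority_only)  # reveals no skill: 0.0

# A perfect classifier on a balanced sample scores 1.0.
mcc_perfect = matthews_coefficient(tp=50, fp=0, tn=50, fn=0)
```

    In this toy case accuracy rewards the degenerate classifier while the coefficient does not, which is the property the abstract highlights.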

  4. Fisher classifier and its probability of error estimation

    Science.gov (United States)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
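    As a rough illustration of the quantities this abstract discusses, the sketch below fits a two-class Fisher discriminant in two dimensions and estimates its error by brute-force leave-one-out. It is not the paper's method: the computationally efficient expressions and the optimal threshold are not given in the abstract, so the midpoint threshold, the helper names and the toy data are all assumptions.

```python
def mean(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 within-class scatter matrix around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        for a in range(2):
            for b in range(2):
                s[a][b] += (r[a] - m[a]) * (r[b] - m[b])
    return s


def fisher_fit(x0, x1):
    """Fisher direction w = Sw^{-1} (m1 - m0) and a midpoint threshold (assumed)."""
    m0, m1 = mean(x0), mean(x1)
    s0, s1 = scatter(x0, m0), scatter(x1, m1)
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    c, d = s0[1][0] + s1[1][0], s0[1][1] + s1[1][1]
    det = a * d - b * c
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = [(d * dx - b * dy) / det, (-c * dx + a * dy) / det]
    project = lambda p: w[0] * p[0] + w[1] * p[1]
    return w, 0.5 * (project(m0) + project(m1))


def fisher_predict(w, threshold, p):
    """Class 1 if the projection onto w exceeds the threshold, else class 0."""
    return int(w[0] * p[0] + w[1] * p[1] > threshold)


def leave_one_out_error(points, labels):
    """Refit without each point in turn and count how often it is misclassified."""
    errors = 0
    for i, (p, y) in enumerate(zip(points, labels)):
        rest = [(q, t) for j, (q, t) in enumerate(zip(points, labels)) if j != i]
        w, thr = fisher_fit([q for q, t in rest if t == 0],
                            [q for q, t in rest if t == 1])
        errors += fisher_predict(w, thr, p) != y
    return errors / len(points)


# Hypothetical, well-separated toy data.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2),
          (4, 4), (5, 4), (4, 5), (5, 5), (4.5, 4.8)]
labels = [0] * 5 + [1] * 5
loo_error = leave_one_out_error(points, labels)
```

    The brute-force loop refits the classifier once per sample; the appeal of closed-form leave-one-out expressions such as those the abstract mentions is that they avoid this repeated refitting.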

  5. Performance of classification confidence measures in dynamic classifier systems

    Czech Academy of Sciences Publication Activity Database

    Štefka, D.; Holeňa, Martin

    2013-01-01

    Roč. 23, č. 4 (2013), s. 299-319 ISSN 1210-0552 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords : classifier combining * dynamic classifier systems * classification confidence Subject RIV: IN - Informatics, Computer Science Impact factor: 0.412, year: 2013

  6. 32 CFR 2400.30 - Reproduction of classified information.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be strictly...

  7. Classifying spaces with virtually cyclic stabilizers for linear groups

    DEFF Research Database (Denmark)

    Degrijse, Dieter Dries; Köhl, Ralf; Petrosyan, Nansen

    2015-01-01

    We show that every discrete subgroup of GL(n, ℝ) admits a finite-dimensional classifying space with virtually cyclic stabilizers. Applying our methods to SL(3, ℤ), we obtain a four-dimensional classifying space with virtually cyclic stabilizers and a decomposition of the algebraic K-theory of its...

  8. Dynamic integration of classifiers in the space of principal components

    NARCIS (Netherlands)

    Tsymbal, A.; Pechenizkiy, M.; Puuronen, S.; Patterson, D.W.; Kalinichenko, L.A.; Manthey, R.; Thalheim, B.; Wloka, U.

    2003-01-01

    Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the

  9. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    International Nuclear Information System (INIS)

    Blanco, A; Rodriguez, R; Martinez-Maranon, I

    2014-01-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male Mackerel (fresh and thawed) to identify differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  10. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    Science.gov (United States)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male Mackerel (fresh and thawed) to identify differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.

  11. Just-in-time classifiers for recurrent concepts.

    Science.gov (United States)

    Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

    2013-04-01

    Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both inputs and classes distributions.

  12. Class-specific Error Bounds for Ensemble Classifiers

    Energy Technology Data Exchange (ETDEWEB)

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.

  13. Frog sound identification using extended k-nearest neighbor classifier

    Science.gov (United States)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization has become important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with the Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aim of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with the competing classifiers k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysian forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results have shown that the EKNN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers KNN, FKNN, KGNN and MKNN for all cases.
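    The voting idea described in this abstract (use both the test sample's nearest neighbors and the training samples that would count the test sample among their own nearest neighbors) can be sketched as follows. This is a simplified illustration under stated assumptions, not the EKNN algorithm from the paper: the plain majority vote, the tie handling and the toy 2-D points standing in for MFCC feature vectors are all hypothetical.

```python
import math
from collections import Counter


def nearest_indices(train_x, query, k):
    """Indices of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def considers_as_neighbor(train_x, i, query, k):
    """True if the query would rank among the k nearest neighbors of training point i."""
    others = sorted(math.dist(train_x[i], q) for j, q in enumerate(train_x) if j != i)
    return math.dist(train_x[i], query) < others[k - 1]


def extended_knn_predict(train_x, train_y, query, k):
    """Majority vote over both neighbor sets (requires more than k training points)."""
    voters = list(nearest_indices(train_x, query, k))
    voters += [i for i in range(len(train_x))
               if considers_as_neighbor(train_x, i, query, k)]
    votes = Counter(train_y[i] for i in voters)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two species.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["species_a"] * 3 + ["species_b"] * 3
pred_near_a = extended_knn_predict(train_x, train_y, (1, 1), k=2)
pred_near_b = extended_knn_predict(train_x, train_y, (5.5, 5.5), k=2)
```

    A training point that appears in both neighbor sets votes twice here, which is one simple way to reward mutual neighborhood; the paper may weight the two sets differently.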

  14. Pharmacokinetic Tumor Heterogeneity as a Prognostic Biomarker for Classifying Breast Cancer Recurrence Risk.

    Science.gov (United States)

    Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; McDonald, Elizabeth S; Rosen, Mark; Mies, Carolyn; Feldman, Michael; Kontos, Despina

    2015-06-01

    Heterogeneity in cancer can affect response to therapy and patient prognosis. Histologic measures have classically been used to measure heterogeneity, although a reliable noninvasive measurement is needed both to establish baseline risk of recurrence and monitor response to treatment. Here, we propose using spatiotemporal wavelet kinetic features from dynamic contrast-enhanced magnetic resonance imaging to quantify intratumor heterogeneity in breast cancer. Tumor pixels are first partitioned into homogeneous subregions using pharmacokinetic measures. Heterogeneity wavelet kinetic (HetWave) features are then extracted from these partitions to obtain spatiotemporal patterns of the wavelet coefficients and the contrast agent uptake. The HetWave features are evaluated in terms of their prognostic value using a logistic regression classifier with genetic algorithm wrapper-based feature selection to classify breast cancer recurrence risk as determined by a validated gene expression assay. Receiver operating characteristic analysis and area under the curve (AUC) are computed to assess classifier performance using leave-one-out cross validation. The HetWave features outperform other commonly used features (AUC = 0.88 HetWave versus 0.70 standard features). The combination of HetWave and standard features further increases classifier performance (AUC = 0.94). The rate of the spatial frequency pattern over the pharmacokinetic partitions can provide valuable prognostic information. HetWave could be a powerful feature extraction approach for characterizing tumor heterogeneity, providing valuable prognostic information.
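    For readers unfamiliar with the AUC figures quoted in this abstract, a minimal sketch of how an area under the ROC curve can be computed from held-out scores follows. It uses the rank (Mann-Whitney) form of the AUC; the scores and labels are hypothetical, and the leave-one-out loop that would produce such scores is assumed to have run already.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out classifier outputs, one per left-out patient.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)
```

    Seven of the nine positive-negative pairs are ranked correctly in this toy case, so the AUC is 7/9; a classifier that ranks at random would score about 0.5.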

  15. Ship localization in Santa Barbara Channel using machine learning classifiers.

    Science.gov (United States)

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  16. Splicing analysis of 14 BRCA1 missense variants classifies nine variants as pathogenic

    DEFF Research Database (Denmark)

    Ahlborn, Lise B; Dandanell, Mette; Steffensen, Ane Y

    2015-01-01

    Pathogenic germline mutations in the BRCA1 gene predispose carriers to early onset breast and ovarian cancer. Clinical genetic screening of BRCA1 often reveals variants with uncertain clinical significance, complicating patient and family management. Therefore, functional examinations are urgently needed to classify whether these uncertain variants are pathogenic or benign. In this study, we investigated 14 BRCA1 variants by in silico splicing analysis and mini-gene splicing assay. All 14 alterations were missense variants located within the BRCT domain of BRCA1 and had previously been examined by functional analysis at the protein level. Results from a validated mini-gene splicing assay indicated that nine BRCA1 variants resulted in splicing aberrations leading to truncated transcripts and thus can be considered pathogenic (c.4987A>T/p.Met1663Leu, c.4988T>A/p.Met1663Lys, c.5072C>T/p.Thr1691Ile, c...

  17. Genes and Gene Therapy

    Science.gov (United States)

    ... correctly, a child can have a genetic disorder. Gene therapy is an experimental technique that uses genes to ... or prevent disease. The most common form of gene therapy involves inserting a normal gene to replace an ...

  18. Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS

    OpenAIRE

    Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita

    2016-01-01

    This file is the dataset for the following paper "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja1, Dasapta Erwin Irawan1, Yuano Rezky2, Prana Ugiana Gio3, Anggita Agustin1

  19. Robust Combining of Disparate Classifiers Through Order Statistics

    Science.gov (United States)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the the median, the maximum and in general, the ith order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
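    The simple combiners named in this abstract (median, maximum, i-th order statistic) and a trimmed average can be sketched in a few lines. This is an illustrative sketch, not the authors' formulation: the exact definitions of their trim and spread combiners are not given in the abstract, so the symmetric trimmed mean below is an assumption, as are the function names and the example scores.

```python
from statistics import median


def order_statistic(values, i):
    """The i-th smallest value (1-based), i.e. the i-th order statistic."""
    return sorted(values)[i - 1]


def trimmed_mean(values, k):
    """Average after dropping the k smallest and k largest values (assumed trim form)."""
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def combine(per_classifier_scores, rule):
    """Apply a combining rule class by class and return (winning class, combined scores)."""
    n_classes = len(per_classifier_scores[0])
    combined = [rule([clf[c] for clf in per_classifier_scores])
                for c in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__), combined


# Hypothetical outputs of five classifiers for two classes; the fourth
# classifier disagrees strongly with the rest.
outputs = [[0.80, 0.20], [0.70, 0.30], [0.75, 0.25], [0.10, 0.90], [0.65, 0.35]]

label_median, _ = combine(outputs, median)
label_max, _ = combine(outputs, max)
label_trim, _ = combine(outputs, lambda v: trimmed_mean(v, 1))
label_second, _ = combine(outputs, lambda v: order_statistic(v, 2))
```

    In this toy example the median and trimmed mean ignore the single discordant classifier, while the maximum rule follows it, which illustrates why the choice of order statistic matters when individual classifiers perform unevenly.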

  20. Using Statistical Process Control Methods to Classify Pilot Mental Workloads

    National Research Council Canada - National Science Library

    Kudo, Terence

    2001-01-01

    .... These include cardiac, ocular, respiratory, and brain activity measures. The focus of this effort is to apply statistical process control methodology on different psychophysiological features in an attempt to classify pilot mental workload...

  1. An ensemble classifier to predict track geometry degradation

    International Nuclear Information System (INIS)

    Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

    2017-01-01

    Railway operations are inherently complex and a source of several problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach which entails the construction of an ensemble classifier to forecast the degradation of track geometry. Our classifier is constructed by solving the problem from three different perspectives: deterioration, regression and classification. We considered a different model from each perspective and our results show that using an ensemble method improves the predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.

  2. A novel statistical method for classifying habitat generalists and specialists

    DEFF Research Database (Denmark)

    Chazdon, Robin L; Chao, Anne; Colwell, Robert K

    2011-01-01

    ...: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...

  3. 6 CFR 7.23 - Emergency release of classified information.

    Science.gov (United States)

    2010-01-01

    ... Classified Information Non-disclosure Form. In emergency situations requiring immediate verbal release of... information through approved communication channels by the most secure and expeditious method possible, or by...

  4. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    International Nuclear Information System (INIS)

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-01-01

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.

  5. Drug target ontology to classify and integrate drug discovery data.

    Science.gov (United States)

    Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande; Turner, John Paul; Vidovic, Dusica; Forlin, Michele; Koleti, Amar; Nguyen, Dac-Trung; Jensen, Lars Juhl; Guha, Rajarshi; Mathias, Stephen L; Ursu, Oleg; Stathias, Vasileios; Duan, Jianbin; Nabizadeh, Nooshin; Chung, Caty; Mader, Christopher; Visser, Ubbo; Yang, Jeremy J; Bologa, Cristian G; Oprea, Tudor I; Schürer, Stephan C

    2017-11-09

    One of the most successful approaches to develop new small molecule therapeutics has been to start from a validated druggable protein target. However, only a small subset of potentially druggable targets has attracted significant research and development resources. The Illuminating the Druggable Genome (IDG) project develops resources to catalyze the development of likely targetable, yet currently understudied prospective drug targets. A central component of the IDG program is a comprehensive knowledge resource of the druggable genome. As part of that effort, we have developed a framework to integrate, navigate, and analyze drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets, the Drug Target Ontology (DTO). DTO was constructed by extensive curation and consolidation of various resources. DTO classifies the four major drug target protein families, GPCRs, kinases, ion channels and nuclear receptors, based on phylogenecity, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics, and target-family specific characteristics. The formal ontology was built using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships. DTO was built based on the need for a formal semantic

  6. Local-global classifier fusion for screening chest radiographs

    Science.gov (United States)

    Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

    2017-03-01

    Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.
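    The linear fusion step described in this abstract can be illustrated with a small sketch in which each classifier's weight grows with how far its output probability is from 0.5. The abstract does not give the weighting formula, so this confidence measure, the equal-weight fallback and the example probabilities are assumptions rather than the authors' scheme.

```python
def confidence(p):
    """Hypothetical confidence measure: distance of a probability from 0.5, scaled to [0, 1]."""
    return abs(p - 0.5) * 2


def fuse(p_local, p_global):
    """Weighted average of two abnormality probabilities, weighted by confidence."""
    c_local, c_global = confidence(p_local), confidence(p_global)
    total = c_local + c_global
    if total == 0:
        # Both classifiers are maximally unsure; fall back to equal weights.
        return 0.5 * (p_local + p_global)
    return (c_local * p_local + c_global * p_global) / total


# Hypothetical case: the local classifier is confident that the image is
# abnormal, the global classifier leans weakly towards normal.
fused = fuse(p_local=0.9, p_global=0.4)
```

    Here the confident local decision dominates and the fused probability is about 0.8, whereas a plain average of the two outputs would give 0.65.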

  7. Verification of classified fissile material using unclassified attributes

    International Nuclear Information System (INIS)

    Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

    1998-01-01

    This paper reports on the most recent efforts of US technical experts to explore verification by IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. They believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated

  8. A cardiorespiratory classifier of voluntary and involuntary electrodermal activity

    Directory of Open Access Journals (Sweden)

    Sejdic Ervin

    2010-02-01

    Full Text Available Abstract Background: Electrodermal reactions (EDRs) can be attributed to many origins, including spontaneous fluctuations of electrodermal activity (EDA) and stimuli such as deep inspirations, voluntary mental activity and startling events. In fields that use EDA as a measure of psychophysiological state, the fact that EDRs may be elicited from many different stimuli is often ignored. This study attempts to classify observed EDRs as voluntary (i.e., generated from intentional respiratory or mental activity) or involuntary (i.e., generated from startling events or spontaneous electrodermal fluctuations). Methods: Eight able-bodied participants were subjected to conditions that would cause a change in EDA: music imagery, startling noises, and deep inspirations. A user-centered cardiorespiratory classifier consisting of (1) an EDR detector, (2) a respiratory filter and (3) a cardiorespiratory filter was developed to automatically detect a participant's EDRs and to classify the origin of their stimulation as voluntary or involuntary. Results: Detected EDRs were classified with a positive predictive value of 78%, a negative predictive value of 81% and an overall accuracy of 78%. Without the classifier, EDRs could only be correctly attributed as voluntary or involuntary with an accuracy of 50%. Conclusions: The proposed classifier may enable investigators to form more accurate interpretations of electrodermal activity as a measure of an individual's psychophysiological state.
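    The three figures reported in the Results (positive predictive value, negative predictive value, overall accuracy) are all derived from a 2x2 confusion matrix. The sketch below shows the arithmetic, treating "voluntary" as the positive class; that choice and the counts are hypothetical and are not the study's data.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV, accuracy) from confusion-matrix counts.

    tp: voluntary EDRs labelled voluntary     fp: involuntary EDRs labelled voluntary
    tn: involuntary EDRs labelled involuntary fn: voluntary EDRs labelled involuntary
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, acc


# Hypothetical counts for 20 detected EDRs.
ppv, npv, acc = predictive_values(tp=7, fp=2, tn=9, fn=2)
```

    PPV answers "when the classifier says voluntary, how often is it right?", NPV answers the same question for involuntary, and accuracy pools both kinds of decision.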

  9. Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers

    NARCIS (Netherlands)

    Bolt, J.H.; van der Gaag, L.C.

    Multi-dimensional Bayesian network classifiers are Bayesian networks of restricted topological structure, which are tailored to classifying data instances into multiple dimensions. Like more traditional classifiers, multi-dimensional classifiers are typically learned from data and may include

  10. Nonparametric, Coupled, Bayesian, Dictionary, and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  11. Classifying a smoker scale in adult daily and nondaily smokers.

    Science.gov (United States)

    Pulvers, Kim; Scheuermann, Taneisha S; Romero, Devan R; Basora, Brittany; Luo, Xianghua; Ahluwalia, Jasjit S

    2014-05-01

    Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p smoker identity, stronger dependence on cigarettes, greater health risk perceptions, more smoking friends, and were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts.

  12. Representative Vector Machines: A Unified Framework for Classical Classifiers.

    Science.gov (United States)

    Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

    2016-08-01

    Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
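
    The basic rule stated in this abstract, assigning the label of the nearest representative vector, can be sketched for its simplest special case. In the snippet below the representatives are just the training samples, which corresponds to the nearest neighbor reading. The function name, the Euclidean metric and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math


def nearest_representative(x, representatives):
    """Return the label whose representative vector is closest to x (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label, vectors in representatives.items():
        for v in vectors:
            dist = math.dist(x, v)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Toy data: each class is represented by its own training samples (the NN case).
training = {
    "cat": [(0.0, 0.0), (1.0, 0.5)],
    "dog": [(5.0, 5.0), (6.0, 4.5)],
}
label = nearest_representative((0.8, 0.2), training)
print(label)
```

    Other definitions of the representative vector would replace the inner loop while keeping the same outer decision rule.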

  13. Current Directional Protection of Series Compensated Line Using Intelligent Classifier

    Directory of Open Access Journals (Sweden)

    M. Mollanezhad Heydarabadi

    2016-12-01

    Full Text Available A current inversion condition leads to incorrect operation of current-based directional relays in power systems with series compensation devices. This paper suggests applying an intelligent system to fault direction classification. A new current directional protection scheme based on an intelligent classifier is proposed for the series compensated line. The proposed classifier uses only a half cycle of pre-fault and post-fault current samples at the relay location as input. A large number of forward and backward fault simulations under different system conditions on a transmission line with a fixed series capacitor are carried out using PSCAD/EMTDC software. The applicability of decision tree (DT), probabilistic neural network (PNN) and support vector machine (SVM) classifiers is investigated using simulated data under different system conditions. The performance comparison of the classifiers indicates that the SVM is the most suitable classifier for fault direction discrimination. Backward faults can be accurately distinguished from forward faults even under current inversion, without needing to detect the current inversion condition.

  14. Neural network classifier of attacks in IP telephony

    Science.gov (United States)

    Safarik, Jakub; Voznak, Miroslav; Mehic, Miralem; Partila, Pavol; Mikulec, Martin

    2014-05-01

    Various types of monitoring mechanisms allow us to detect and monitor the behavior of attackers in VoIP networks. Analysis of detected malicious traffic is crucial for further investigation and for hardening the network. This analysis is typically based on statistical methods, and the article presents a solution based on a neural network. The proposed algorithm is used as a classifier of attacks in a distributed monitoring network of independent honeypot probes. Information about attacks on these honeypots is collected on a centralized server and then classified. This classification is based on different mechanisms. One of them is based on the multilayer perceptron neural network. The article describes the inner structure of the neural network used, as well as its implementation. The learning set for this neural network is based on real attack data collected from an IP telephony honeypot called Dionaea. We prepare the learning set from real attack data after collecting, cleaning and aggregating this information. After proper learning, the neural network is capable of classifying the 6 most commonly used types of VoIP attacks. Using a neural network classifier brings more accurate attack classification in a distributed system of honeypots. With this approach it is possible to detect malicious behavior in different parts of networks that are logically or geographically divided, and to use the information from one network to harden security in other networks. A centralized server for a distributed set of nodes serves not only as a collector and classifier of attack data, but also as a mechanism for generating precautionary steps against attacks.

  15. Maximum margin classifier working in a set of strings.

    Science.gov (United States)

    Koyano, Hitoshi; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-03-01

    Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.

  16. Use of information barriers to protect classified information

    International Nuclear Information System (INIS)

    MacArthur, D.; Johnson, M.W.; Nicholas, N.J.; Whiteson, R.

    1998-01-01

    This paper discusses the detailed requirements for an information barrier (IB) for use with verification systems that employ intrusive measurement technologies. The IB would protect classified information in a bilateral or multilateral inspection of classified fissile material. Such a barrier must strike a balance between providing the inspecting party the confidence necessary to accept the measurement while protecting the inspected party's classified information. The authors discuss the structure required of an IB as well as the implications of the IB on detector system maintenance. A defense-in-depth approach is proposed which would provide assurance to the inspected party that all sensitive information is protected and to the inspecting party that the measurements are being performed as expected. The barrier could include elements of physical protection (such as locks, surveillance systems, and tamper indicators), hardening of key hardware components, assurance of capabilities and limitations of hardware and software systems, administrative controls, validation and verification of the systems, and error detection and resolution. Finally, an unclassified interface could be used to display and, possibly, record measurement results. The introduction of an IB into an analysis system may result in many otherwise innocuous components (detectors, analyzers, etc.) becoming classified and unavailable for routine maintenance by uncleared personnel. System maintenance and updating will be significantly simplified if the classification status of as many components as possible can be made reversible (i.e. the component can become unclassified following the removal of classified objects)

  17. Detection of microaneurysms in retinal images using an ensemble classifier

    Directory of Open Access Journals (Sweden)

    M.M. Habib

    2017-01-01

    Full Text Available This paper introduces, and reports on the performance of, a novel combination of algorithms for automated microaneurysm (MA) detection in retinal images. The presence of MAs in retinal images is a pathognomonic sign of Diabetic Retinopathy (DR), which is one of the leading causes of blindness amongst the working age population. An extensive survey of the literature is presented and current techniques in the field are summarised. The proposed technique first detects an initial set of candidates using a Gaussian Matched Filter and then classifies this set to reduce the number of false positives. A Tree Ensemble classifier is used with a set of 70 features (the most common features in the literature). A new set of 32 MA groundtruth images (with a total of 256 labelled MAs) based on images from the MESSIDOR dataset is introduced as a public dataset for benchmarking MA detection algorithms. We evaluate our algorithm on this dataset as well as another public dataset (DIARETDB1 v2.1) and compare it against the best available alternative. Results show that the proposed classifier is superior in terms of eliminating false positive MA detection from the initial set of candidates. The proposed method achieves an ROC score of 0.415 compared to 0.2636 achieved by the best available technique. Furthermore, results show that the classifier model maintains consistent performance across datasets, illustrating the generalisability of the classifier and that overfitting does not occur.

  18. Generalization in the XCSF classifier system: analysis, improvement, and extension.

    Science.gov (United States)

    Lanzi, Pier Luca; Loiacono, Daniele; Wilson, Stewart W; Goldberg, David E

    2007-01-01

    We analyze generalization in XCSF and introduce three improvements. We begin by showing that the types of generalizations evolved by XCSF can be influenced by the input range. To explain these results we present a theoretical analysis of the convergence of classifier weights in XCSF which highlights a broader issue. In XCSF, because of the mathematical properties of the Widrow-Hoff update, the convergence of classifier weights in a given subspace can be slow when the spread of the eigenvalues of the autocorrelation matrix associated with each classifier is large. As a major consequence, the system's accuracy pressure may act before classifier weights are adequately updated, so that XCSF may evolve piecewise constant approximations, instead of the intended, and more efficient, piecewise linear ones. We propose three different ways to update classifier weights in XCSF so as to increase the generalization capabilities of XCSF: one based on a condition-based normalization of the inputs, one based on linear least squares, and one based on the recursive version of linear least squares. Through a series of experiments we show that while all three approaches significantly improve XCSF, least squares approaches appear to be best performing and most robust. Finally we show how XCSF can be extended to include polynomial approximations.
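
    The slow-convergence effect described in this abstract can be reproduced outside XCSF with a single linear predictor. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a normalised Widrow-Hoff step and a textbook recursive least squares step, and the learning rate `eta`, the initial matrix `P` (1000 times the identity), the target function and the input range are all hypothetical choices. Inputs are kept far from the origin so that the autocorrelation matrix has a wide eigenvalue spread.

```python
def widrow_hoff_step(w, x, y, eta=0.2):
    """One normalised Widrow-Hoff (LMS) update of weights w for input x, target y."""
    error = y - sum(wi * xi for wi, xi in zip(w, x))
    norm2 = sum(xi * xi for xi in x)
    return [wi + eta * error * xi / norm2 for wi, xi in zip(w, x)]


def rls_step(w, P, x, y):
    """One recursive least squares update; P tracks the inverse autocorrelation matrix."""
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = y - sum(wi * xi for wi, xi in zip(w, x))
    w_new = [wi + gi * error for wi, gi in zip(w, gain)]
    # P <- P - gain * (x^T P); P is symmetric, so x^T P is Px transposed.
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


def target(x1):
    """Hypothetical function to approximate: intercept 3, slope 2."""
    return 3.0 + 2.0 * x1


# x[0] is the constant input; x[1] lies in [100, 109], far from the origin.
samples = [[1.0, 100.0 + (k % 10)] for k in range(200)]

w_lms = [0.0, 0.0]
w_rls, P = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for x in samples:
    y = target(x[1])
    w_lms = widrow_hoff_step(w_lms, x, y)
    w_rls, P = rls_step(w_rls, P, x, y)

print("Widrow-Hoff weights:", w_lms)
print("RLS weights:        ", w_rls)
```

    After the same 200 updates the Widrow-Hoff intercept is still far from 3, while the least squares weights are close to the true intercept and slope. That gap is the kind of behaviour the abstract attributes to the eigenvalue spread.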

  19. Dynamic cluster generation for a fuzzy classifier with ellipsoidal regions.

    Science.gov (United States)

    Abe, S

    1998-01-01

    In this paper, we discuss a fuzzy classifier with ellipsoidal regions that dynamically generates clusters. First, for the data belonging to a class we define a fuzzy rule with an ellipsoidal region. Namely, using the training data for each class, we calculate the center and the covariance matrix of the ellipsoidal region for the class. Then we tune the fuzzy rules, i.e., the slopes of the membership functions, successively until there is no improvement in the recognition rate of the training data. Then if the number of the data belonging to a class that are misclassified into another class exceeds a prescribed number, we define a new cluster to which those data belong and the associated fuzzy rule. Then we tune the newly defined fuzzy rules in the similar way as stated above, fixing the already obtained fuzzy rules. We iterate generation of clusters and tuning of the newly generated fuzzy rules until the number of the data belonging to a class that are misclassified into another class does not exceed the prescribed number. We evaluate our method using thyroid data, Japanese Hiragana data of vehicle license plates, and blood cell data. By dynamic cluster generation, the generalization ability of the classifier is improved and the recognition rate of the fuzzy classifier for the test data is the best among the neural network classifiers and other fuzzy classifiers if there are no discrete input variables.
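
    The first step described in this abstract, one ellipsoidal region per class defined by a center and a covariance matrix, can be sketched in two dimensions. The membership function exp(-d^2 / alpha), with d the Mahalanobis distance and `alpha` a tunable slope, is an assumed form chosen for illustration. The tuning loop and the dynamic cluster generation are not shown, and the data are made up.

```python
import math


def fit_region(points):
    """Return the centre and 2x2 covariance matrix of a list of 2-D points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    return (cx, cy), ((sxx, sxy), (sxy, syy))


def membership(x, centre, cov, alpha=1.0):
    """Membership exp(-d^2 / alpha), d being the Mahalanobis distance to the centre."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - centre[0], x[1] - centre[1]
    # Quadratic form with the closed-form inverse of the 2x2 covariance matrix.
    d2 = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-d2 / alpha)


def classify(x, regions):
    """Pick the class whose ellipsoidal region gives x the highest membership."""
    return max(regions, key=lambda label: membership(x, *regions[label]))


# Made-up training data for two classes.
data = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "B": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
regions = {label: fit_region(points) for label, points in data.items()}
print(classify((1.0, 1.5), regions), classify((4.0, 3.5), regions))
```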

  20. SpectraClassifier 1.0: a user friendly, automated MRS-based classifier-development system

    Directory of Open Access Journals (Sweden)

    Julià-Sapé Margarida

    2010-02-01

    Full Text Available Abstract Background: SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward), and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping as well as Receiver Operating Characteristic (ROC) curves. Results: SC is composed of the following modules: Classifier design, Data exploration, Data visualisation, Classifier evaluation, Reports, and Classifier history. It is able to read low resolution in-vivo MRS (single-voxel and multi-voxel) and high resolution tissue MRS (HRMAS), processed with existing tools (jMRUI, INTERPRET, 3DiCSI or TopSpin). In addition, to facilitate exchanging data between applications, a standard format capable of storing all the information needed for a dataset was developed. Each functionality of SC has been specifically validated with real data with the purpose of bug-testing and methods validation. Data from the INTERPRET project was used. Conclusions: SC is user-friendly software designed to fulfil the needs of potential users in the MRS community. It accepts all kinds of pre-processed MRS data types and classifies them semi-automatically, allowing spectroscopists to concentrate on interpretation of results with the use of its visualisation tools.

  1. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction

    Directory of Open Access Journals (Sweden)

    Kiran Sree Pokkuluri

    2014-01-01

    Full Text Available Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding the genes. Many novel computational and mathematical methods are being introduced, and existing methods are being refined, for predicting each of the regions separately; still there is scope for improvement. We propose a classifier that is built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) to predict both regions with a single classifier. The proposed classifier is trained and tested with Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162. This classifier is trained and tested with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.

  2. A History of Classified Activities at Oak Ridge National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Quist, A.S.

    2001-01-30

    The facilities that became Oak Ridge National Laboratory (ORNL) were created in 1943 during the United States' super-secret World War II project to construct an atomic bomb (the Manhattan Project). During World War II and for several years thereafter, essentially all ORNL activities were classified. Now, in 2000, essentially all ORNL activities are unclassified. The major purpose of this report is to provide a brief history of ORNL's major classified activities from 1943 until the present (September 2000). This report is expected to be useful to the ORNL Classification Officer and to ORNL's Authorized Derivative Classifiers and Authorized Derivative Declassifiers in their classification review of ORNL documents, especially those documents that date from the 1940s and 1950s.

  3. COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

    Directory of Open Access Journals (Sweden)

    M. J. Baheti

    2012-01-01

    Full Text Available With the advent of the technological era, conversion of scanned documents (handwritten or printed) into machine-editable format has attracted many researchers. This paper deals with the problem of recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires performing some specific steps as a part of preprocessing. For preprocessing, digitization, segmentation, normalization and thinning are done, assuming that the image has almost no noise. Further, an affine invariant moments based model is used for feature extraction, and finally Support Vector Machine (SVM) and Fuzzy classifiers are used for numeral classification. The comparison of the SVM and Fuzzy classifiers shows that SVM produced better results than the Fuzzy classifier.

  4. Optimal threshold estimation for binary classifiers using game theory.

    Science.gov (United States)

    Sanchez, Ignacio Enrique

    2016-01-01

    Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
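
    The operating point described in this abstract, where the ROC curve meets the descending diagonal, is the threshold at which sensitivity (TPR) equals specificity (1 - FPR). On a finite sample the two rarely match exactly, so the sketch below picks the threshold that minimises the gap. The function name, the "score >= threshold means positive" convention and the toy scores are assumptions for illustration.

```python
def minimax_threshold(scores, labels):
    """Return (threshold, tpr, fpr) minimising |TPR - (1 - FPR)|.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / positives
        fpr = fp / negatives
        gap = abs(tpr - (1.0 - fpr))
        if best is None or gap < best[0]:
            best = (gap, t, tpr, fpr)
    _, threshold, tpr, fpr = best
    return threshold, tpr, fpr


# Toy classifier scores with their true labels (1 = positive).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, tpr, fpr = minimax_threshold(scores, labels)
print(threshold, tpr, fpr)
```

    On this toy sample the selected threshold is 0.6, where sensitivity and specificity are both 0.75, which is the value 1 - FPR at that point.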

  5. Statistical text classifier to detect specific type of medical incidents.

    Science.gov (United States)

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

    WHO Patient Safety has put focus on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that this is a satisfactorily good model.

  6. A Topic Model Approach to Representing and Classifying Football Plays

    KAUST Repository

    Varadarajan, Jagannadan

    2013-09-09

    We address the problem of modeling and classifying American Football offense teams’ plays in video, a challenging example of group activity analysis. Automatic play classification will allow coaches to infer patterns and tendencies of opponents more efficiently, resulting in better strategy planning in a game. We define a football play as a unique combination of player trajectories. To this end, we develop a framework that uses player trajectories as inputs to MedLDA, a supervised topic model. The joint maximization of both likelihood and inter-class margins of MedLDA in learning the topics allows us to learn semantically meaningful play type templates, as well as classify different play types with 70% average accuracy. Furthermore, this method is extended to analyze individual player roles in classifying each play type. We validate our method on a large dataset comprising 271 play clips from real-world football games, which will be made publicly available for future comparisons.

  7. Defending Malicious Script Attacks Using Machine Learning Classifiers

    Directory of Open Access Journals (Sweden)

    Nayeem Khan

    2017-01-01

    Full Text Available The web application has become a primary target for cyber criminals by injecting malware especially JavaScript to perform malicious activities for impersonation. Thus, it becomes an imperative to detect such malicious code in real time before any malicious activity is performed. This study proposes an efficient method of detecting previously unknown malicious java scripts using an interceptor at the client side by classifying the key features of the malicious code. Feature subset was obtained by using wrapper method for dimensionality reduction. Supervised machine learning classifiers were used on the dataset for achieving high accuracy. Experimental results show that our method can efficiently classify malicious code from benign code with promising results.

  8. Implications of physical symmetries in adaptive image classifiers

    DEFF Research Database (Denmark)

    Sams, Thomas; Hansen, Jonas Lundbek

    2000-01-01

    It is demonstrated that rotational invariance and reflection symmetry of image classifiers lead to a reduction in the number of free parameters in the classifier. When used in adaptive detectors, e.g. neural networks, this may be used to decrease the number of training samples necessary to learn...... a given classification task, or to improve generalization of the neural network. Notably, the symmetrization of the detector does not compromise the ability to distinguish objects that break the symmetry. (C) 2000 Elsevier Science Ltd. All rights reserved....
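
    The parameter reduction mentioned in this abstract can be made concrete for a small case. The sketch below assumes a square n x n weight kernel that must be invariant under 90-degree rotations and mirror reflections, which is a discrete stand-in chosen for illustration and not necessarily the symmetry group used in the paper. Positions that map onto each other must share one weight, so the number of free parameters equals the number of such groups of positions.

```python
def symmetric_parameter_count(n):
    """Count independent weights of an n x n kernel that is invariant
    under 90-degree rotations and mirror reflections."""
    seen = set()
    groups = 0
    for r in range(n):
        for c in range(n):
            if (r, c) in seen:
                continue
            groups += 1
            pr, pc = r, c
            for _ in range(4):
                pr, pc = pc, n - 1 - pr        # rotate the position by 90 degrees
                seen.add((pr, pc))
                seen.add((pr, n - 1 - pc))     # mirror image of the rotated position
    return groups


for size in (3, 5):
    print(size * size, "weights reduce to", symmetric_parameter_count(size))
```

    For a 3 x 3 kernel the nine weights collapse to three (center, edges, corners), which illustrates why fewer training samples may be needed.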

  9. Silicon nanowire arrays as learning chemical vapour classifiers

    International Nuclear Information System (INIS)

    Niskanen, A O; Colli, A; White, R; Li, H W; Spigone, E; Kivioja, J M

    2011-01-01

    Nanowire field-effect transistors are a promising class of devices for various sensing applications. Apart from detecting individual chemical or biological analytes, it is especially interesting to use multiple selective sensors to look at their collective response in order to perform classification into predetermined categories. We show that non-functionalised silicon nanowire arrays can be used to robustly classify different chemical vapours using simple statistical machine learning methods. We were able to distinguish between acetone, ethanol and water with 100% accuracy while methanol, ethanol and 2-propanol were classified with 96% accuracy in ambient conditions.

  10. SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-01-01

    Motivation Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with RBF kernel of SVM. Conclusion We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at . PMID:17217518

  11. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    Science.gov (United States)

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  12. NMD Classifier: A reliable and systematic classification tool for nonsense-mediated decay events.

    Directory of Open Access Journals (Sweden)

    Min-Kung Hsu

    Full Text Available Nonsense-mediated decay (NMD) degrades mRNAs that include premature termination codons to avoid the translation and accumulation of truncated proteins. This mechanism has been found to participate in gene regulation and a wide spectrum of biological processes. However, the evolutionary and regulatory origins of NMD-targeted transcripts (NMDTs) have been less studied, partly because of the complexity in analyzing NMD events. Here we report NMD Classifier, a tool for systematic classification of NMD events for either annotated or de novo assembled transcripts. This tool is based on the assumption of minimal evolution/regulation: an event that leads to the least change is the most likely to occur. Our simulation results indicate that NMD Classifier can correctly identify an average of 99.3% of the NMD-causing transcript structural changes, particularly exon inclusions/exclusions and exon boundary alterations. Researchers can apply NMD Classifier to evolutionary and regulatory studies by comparing NMD events of different biological conditions or in different organisms.

  13. 18 CFR 367.18 - Criteria for classifying leases.

    Science.gov (United States)

    2010-04-01

    ... the lessee) must not give rise to a new classification of a lease for accounting purposes. ... classifying the lease. (4) The present value at the beginning of the lease term of the minimum lease payments... taxes to be paid by the lessor, including any related profit, equals or exceeds 90 percent of the excess...

  14. Discrimination-Aware Classifiers for Student Performance Prediction

    Science.gov (United States)

    Luo, Ling; Koprinska, Irena; Liu, Wei

    2015-01-01

    In this paper we consider discrimination-aware classification of educational data. Mining and using rules that distinguish groups of students based on sensitive attributes such as gender and nationality may lead to discrimination. It is desirable to keep the sensitive attributes during the training of a classifier to avoid information loss but…

  15. 29 CFR 1910.307 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ... equipment at the location. (c) Electrical installations. Equipment, wiring methods, and installations of... covers the requirements for electric equipment and wiring in locations that are classified depending on... provisions of this section. (4) Division and zone classification. In Class I locations, an installation must...

  16. 29 CFR 1926.407 - Hazardous (classified) locations.

    Science.gov (United States)

    2010-07-01

    ...) locations, unless modified by provisions of this section. (b) Electrical installations. Equipment, wiring..., DEPARTMENT OF LABOR (CONTINUED) SAFETY AND HEALTH REGULATIONS FOR CONSTRUCTION Electrical Installation Safety... electric equipment and wiring in locations which are classified depending on the properties of the...

  17. 18 CFR 3a.71 - Accountability for classified material.

    Science.gov (United States)

    2010-04-01

    ... numbers assigned to top secret material will be separate from the sequence for other classified material... central control registry in calendar year 1969. TS 1006—Sixth Top Secret document controlled by the... control registry when the document is transferred. (e) For Top Secret documents only, an access register...

  18. Classifier fusion for VoIP attacks classification

    Science.gov (United States)

    Safarik, Jakub; Rezac, Filip

    2017-05-01

    SIP is one of the most successful protocols in the field of IP telephony communication. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a higher number of attacks on the communication system in the near future. This work aims at malicious SIP traffic classification. A number of machine learning algorithms have been developed for attack classification. The paper presents a comparison of current research and the use of a classifier fusion method leading to a potential decrease in classification error rate. Use of classifier combination makes a more robust solution without the difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve the overall performance. All classifiers have been trained on real malicious traffic. The concept of traffic monitoring depends on the network of honeypot nodes. These honeypots run in several networks spread in different locations. Separation of honeypots allows us to gain independent and trustworthy attack information.
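
    The abstract above mentions voting schemes for classifier fusion without giving details. As a minimal sketch only, plain majority voting over the labels produced by several classifiers could look like the following; the function name, the tie-breaking rule, and the attack labels are hypothetical and are not taken from the paper.

    ```python
    from collections import Counter


    def majority_vote(predictions):
        """Return the label predicted by the most classifiers.

        Assumption (not from the paper): ties are broken in favour of the
        label that appears first in the input list.
        """
        counts = Counter(predictions)
        top = max(counts.values())
        for label in predictions:
            if counts[label] == top:
                return label


    # Hypothetical outputs of three classifiers for one captured SIP session.
    votes = ["register_flood", "scan", "register_flood"]
    fused = majority_vote(votes)
    ```

    Weighted voting or other combination rules would replace the plain count with per-classifier weights; this sketch does not attempt to reproduce the schemes compared in the paper.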

  19. Bayesian Classifier for Medical Data from Doppler Unit

    Directory of Open Access Journals (Sweden)

    J. Málek

    2006-01-01

    Full Text Available Nowadays, hand-held ultrasonic Doppler units (probes) are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow in time and blood pressures are measured at several positions on each lower limb. By listening to the acoustic signal generated by the device or by reading the signal displayed on screen, a specialist can detect peripheral arterial disease (PAD). This project aims to design software that will be able to analyze data from such a device and classify it into several diagnostic classes. At the Department of Functional Diagnostics at the Regional Hospital in Liberec a database of several hundred signals was collected. In cooperation with the specialist, the signals were manually classified into four classes. For each class, selected signal features were extracted and then used for training a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifier. A rate slightly above 84% of successfully recognized diagnostic states was recently achieved on the test data.

  20. An Investigation to Improve Classifier Accuracy for Myo Collected Data

    Science.gov (United States)

    2017-02-01

    Bad Samples Effect on Classification Accuracy; 5.1 Naïve Bayes (NB) Classifier Accuracy; 5.2 Logistic Model Tree (LMT); 5.3 K-Nearest Neighbor ... gesture, pitch feature, user 06. All samples exhibit reversed movement ... Fig. A-2 Come gesture, pitch feature, user 14. All samples exhibit reversed movement

  1. Diagnosis of Broiler Livers by Classifying Image Patches

    DEFF Research Database (Denmark)

    Jørgensen, Anders; Fagertun, Jens; Moeslund, Thomas B.

    2017-01-01

    Manual health inspections are becoming the bottleneck at poultry processing plants. We present a computer vision method for automatic diagnosis of broiler livers. The non-rigid livers, of varying shapes and sizes, are classified in patches by a convolutional neural network, outputting maps...

  2. Support vector machines classifiers of physical activities in preschoolers

    Science.gov (United States)

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children's physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  3. A Linguistic Image of Nature: The Burmese Numerative Classifier System

    Science.gov (United States)

    Becker, Alton L.

    1975-01-01

    The Burmese classifier system is coherent because it is based upon a single elementary semantic dimension: deixis. On that dimension, four distances are distinguished, distances which metaphorically substitute for other conceptual relations between people and other living beings, people and things, and people and concepts. (Author/RM)

  4. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever-increasing generation of data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories, which are both supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.

  5. Building an automated SOAP classifier for emergency department reports.

    Science.gov (United States)

    Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

    2012-02-01

    Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F(1) scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.
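
    The agreement figure quoted above is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. As an illustrative sketch only, it could be computed as below; the sentence labels are made up for the example and are not data from the study.

    ```python
    from collections import Counter


    def cohens_kappa(labels_a, labels_b):
        """Cohen's kappa for two annotators labelling the same items.

        kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
        and p_e the agreement expected from each annotator's label frequencies.
        Undefined (division by zero) when p_e equals 1.
        """
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a = Counter(labels_a)
        freq_b = Counter(labels_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)


    # Hypothetical SOAP labels from two annotators for six sentences.
    annotator_1 = ["S", "O", "A", "P", "S", "O"]
    annotator_2 = ["S", "O", "A", "P", "O", "O"]
    kappa = cohens_kappa(annotator_1, annotator_2)
    ```

    In this toy example the annotators agree on five of six sentences, and the chance-corrected value works out to 10/13, roughly 0.77.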

  6. Learning to classify wakes from local sensory information

    Science.gov (United States)

    Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team

    2017-11-01

    Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.

  7. The Closing of the Classified Catalog at Boston University

    Science.gov (United States)

    Hazen, Margaret Hindle

    1974-01-01

    Although the classified catalog at Boston University libraries has been a useful research tool, it has proven too expensive to keep current. The library has converted to a traditional alphabetic subject catalog and will receive catalog cards from the Ohio College Library Center through the New England Library Network. (Author/LS)

  8. Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

    Directory of Open Access Journals (Sweden)

    M. Al-Rousan

    2005-08-01

    Full Text Available Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of the Arabic sign language (ArSL) alphabet. Polynomial classifiers have several advantages over other classifiers in that they do not require iterative training, and that they are highly computationally scalable with the number of classes. Based on polynomial classifiers, we have built an ArSL system and measured its performance using real ArSL data collected from deaf people. We show that the proposed system provides superior recognition results when compared with previously published results using ANFIS-based classification on the same dataset and feature extraction methodology. The comparison is shown in terms of the number of misclassified test patterns. The reduction in the rate of misclassified patterns was very significant. In particular, we have achieved a 36% reduction of misclassifications on the training data and 57% on the test data.

  9. Reconfigurable support vector machine classifier with approximate computing

    NARCIS (Netherlands)

    van Leussen, M.J.; Huisken, J.; Wang, L.; Jiao, H.; De Gyvez, J.P.

    2017-01-01

    Support Vector Machine (SVM) is one of the most popular machine learning algorithms. An energy-efficient SVM classifier is proposed in this paper, where approximate computing is utilized to reduce energy consumption and silicon area. A hardware architecture with reconfigurable kernels and

  10. Classifying regularized sensor covariance matrices: An alternative to CSP

    NARCIS (Netherlands)

    Roijendijk, L.M.M.; Gielen, C.C.A.M.; Farquhar, J.D.R.

    2016-01-01

    Common spatial patterns (CSP) is a commonly used technique for classifying imagined movement type brain-computer interface (BCI) datasets. It has been very successful with many extensions and improvements on the basic technique. However, a drawback of CSP is that the signal processing pipeline

  11. Classifying regularised sensor covariance matrices: An alternative to CSP

    NARCIS (Netherlands)

    Roijendijk, L.M.M.; Gielen, C.C.A.M.; Farquhar, J.D.R.

    2016-01-01

    Common spatial patterns (CSP) is a commonly used technique for classifying imagined movement type brain computer interface (BCI) datasets. It has been very successful with many extensions and improvements on the basic technique. However, a drawback of CSP is that the signal processing pipeline

  12. Two-categorical bundles and their classifying spaces

    DEFF Research Database (Denmark)

    Baas, Nils A.; Bökstedt, M.; Kro, T.A.

    2012-01-01

    -category is a classifying space for the associated principal 2-bundles. In the process of proving this we develop a lot of powerful machinery which may be useful in further studies of 2-categorical topology. As a corollary we get a new proof of the classification of principal bundles. A calculation based...

  13. 3 CFR - Classified Information and Controlled Unclassified Information

    Science.gov (United States)

    2010-01-01

    ... on Transparency and Open Government and on the Freedom of Information Act, my Administration is... memoranda of January 21, 2009, on Transparency and Open Government and on the Freedom of Information Act; (B... 3 The President 1 2010-01-01 2010-01-01 false Classified Information and Controlled Unclassified...

  14. Comparison of Classifier Architectures for Online Neural Spike Sorting.

    Science.gov (United States)

    Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood

    2017-04-01

    High-density, intracranial recordings from micro-electrode arrays need to undergo Spike Sorting in order to associate the recorded neuronal spikes to particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike-sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and Cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirement. We establish that the neural networks based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier requires only 7.83% of the computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.

  15. Cascaded lexicalised classifiers for second-person reference resolution

    NARCIS (Netherlands)

    Purver, M.; Fernández, R.; Frampton, M.; Peters, S.; Healey, P.; Pieraccini, R.; Byron, D.; Young, S.; Purver, M.

    2009-01-01

    This paper examines the resolution of the second person English pronoun you in multi-party dialogue. Following previous work, we attempt to classify instances as generic or referential, and in the latter case identify the singular or plural addressee. We show that accuracy and robustness can be

  16. Human Activity Recognition by Combining a Small Number of Classifiers.

    Science.gov (United States)

    Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin

    2016-09-01

    We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the output of the sensors. The models are based on a soft outputs combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifiers combination models against a single classifier that employs all the signals from the sensors. Our models consistently exhibit a reduction of the error rate and an increase of robustness against sensor failures. Our models also outperform other classifiers combination models that do not consider soft outputs and a Markovian structure of the human activities.

  17. Evaluation of three classifiers in mapping forest stand types using ...

    African Journals Online (AJOL)

    EJIRO

    applied for classification of the image. Supervised classification technique using maximum likelihood algorithm is the most commonly and widely used method for land cover classification (Jia and Richards, 2006). In Australia, the maximum likelihood classifier was effectively used to map different forest stand types with high.

  18. Classifying patients' complaints for regulatory purposes : A Pilot Study

    NARCIS (Netherlands)

    Bouwman, R.J.R.; Bomhoff, Manja; Robben, Paul; Friele, R.D.

    2018-01-01

    Objectives: It is assumed that classifying and aggregated reporting of patients' complaints by regulators helps to identify problem areas, to respond better to patients and increase public accountability. This pilot study addresses what a classification of complaints in a regulatory setting

  19. An ensemble self-training protein interaction article classifier.

    Science.gov (United States)

    Chen, Yifei; Hou, Ping; Manderick, Bernard

    2014-01-01

    Protein-protein interaction (PPI) is essential to understand the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence it is desirable to automatically classify articles as PPI-related or not. In order to build interaction article classification systems, an annotated corpus is needed. However, it is usually the case that only a small number of labeled articles can be obtained manually. Meanwhile, a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A biological background based feature weighting strategy is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from unlabeled data to improve the performance further. Experimental results show that the EST_IACer can classify the PPI-related articles effectively and efficiently.

  20. Classifying Your Food as Acid, Low-Acid, or Acidified

    OpenAIRE

    Bacon, Karleigh

    2012-01-01

    As a food entrepreneur, you should be aware of how ingredients in your product make the food look, feel, and taste, as well as how the ingredients create environments for microorganisms like bacteria, yeast, and molds to survive and grow. This guide will help you classify your food as acid, low-acid, or acidified.

  1. Abbreviations: Their Effects on Comprehension of Classified Advertisements.

    Science.gov (United States)

    Sokol, Kirstin R.

    Two experimental designs were used to test the hypothesis that abbreviations in classified advertisements decrease the reader's comprehension of such ads. In the first experimental design, 73 high school students read four ads (for employment, used cars, apartments for rent, and articles for sale) either with abbreviations or with all…

  2. Genome-Wide Comparative Gene Family Classification

    Science.gov (United States)

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221

  3. Deep Feature Learning and Cascaded Classifier for Large Scale Data

    DEFF Research Database (Denmark)

    Prasoon, Adhish

    from data rather than having a predefined feature set. We explore deep learning approach of convolutional neural network (CNN) for segmenting three dimensional medical images. We propose a novel system integrating three 2D CNNs, which have a one-to-one association with the xy, yz and zx planes of 3D......This thesis focuses on voxel/pixel classification based approaches for image segmentation. The main application is segmentation of articular cartilage in knee MRIs. The first major contribution of the thesis deals with large scale machine learning problems. Many medical imaging problems need huge...... amount of training data to cover sufficient biological variability. Learning methods scaling badly with number of training data points cannot be used in such scenarios. This may restrict the usage of many powerful classifiers having excellent generalization ability. We propose a cascaded classifier which...

  4. Scoring and Classifying Examinees Using Measurement Decision Theory

    Directory of Open Access Journals (Sweden)

    Lawrence M. Rudner

    2009-04-01

    Full Text Available This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the classification accuracy of tests scored using decision theory; (2) the effectiveness of different sequential testing procedures; and (3) the number of items needed to make a classification. A large percentage of examinees can be classified accurately with very few items using decision theory. A Java Applet for self instruction and software for generating, calibrating and scoring MDT data are provided.
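
    The framework described above starts from the probability that an examinee in each mastery state answers each item correctly. A minimal sketch of how a response pattern could then be turned into a classification with Bayes' rule is shown below. It assumes that item responses are independent given the mastery state, and all priors and probabilities are invented for the illustration; it is not the software distributed with the paper.

    ```python
    from math import prod


    def mdt_posterior(priors, p_correct, responses):
        """Posterior probability of each mastery state given 0/1 item responses.

        priors:    prior probability of each state
        p_correct: p_correct[k][i] = P(item i answered correctly | state k)
        responses: observed responses, 1 = correct, 0 = incorrect
        Assumes responses are conditionally independent given the state.
        """
        joint = []
        for prior, probs in zip(priors, p_correct):
            likelihood = prod(p if r == 1 else 1.0 - p
                              for p, r in zip(probs, responses))
            joint.append(prior * likelihood)
        total = sum(joint)
        return [j / total for j in joint]


    # Hypothetical two-state example: index 0 = non-master, index 1 = master.
    priors = [0.5, 0.5]
    p_correct = [[0.3, 0.4, 0.2],   # non-master
                 [0.8, 0.9, 0.7]]   # master
    responses = [1, 1, 0]           # first two items correct, third wrong
    posterior = mdt_posterior(priors, p_correct, responses)
    predicted_state = posterior.index(max(posterior))
    ```

    Picking the state with the largest posterior is only one possible decision rule; a sequential procedure would recompute the posterior after each item and stop once one state is sufficiently probable.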

  5. MAMMOGRAMS ANALYSIS USING SVM CLASSIFIER IN COMBINED TRANSFORMS DOMAIN

    Directory of Open Access Journals (Sweden)

    B.N. Prathibha

    2011-02-01

    Full Text Available Breast cancer is a primary cause of mortality and morbidity in women. Reports reveal that the earlier abnormalities are detected, the greater the improvement in survival. Digital mammograms are one of the most effective means for detecting possible breast anomalies at early stages. Digital mammograms supported with Computer Aided Diagnostic (CAD) systems help radiologists in taking reliable decisions. The proposed CAD system extracts wavelet features and spectral features for the better classification of mammograms. The Support Vector Machines classifier is used to analyze 206 mammogram images from the MIAS database pertaining to the severity of abnormality, i.e., benign and malign. The proposed system gives 93.14% accuracy for discrimination between normal-malign samples, 87.25% accuracy for normal-benign samples and 89.22% accuracy for benign-malign samples. The study reveals that features extracted in the hybrid transform domain with an SVM classifier prove to be a promising tool for analysis of mammograms.

  6. Evaluation of LDA Ensembles Classifiers for Brain Computer Interface

    International Nuclear Information System (INIS)

    Arjona, Cristian; Pentácolo, José; Gareis, Iván; Atum, Yanina; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

    2011-01-01

    The Brain Computer Interface (BCI) translates brain activity into computer commands. To increase the performance of the BCI in decoding the user's intentions, it is necessary to improve the feature extraction and classification techniques. In this article the performance of an ensemble of three linear discriminant analysis (LDA) classifiers is studied. An ensemble-based system can theoretically achieve better classification results than its individual counterpart, depending on the algorithm used to generate the individual classifiers and the procedure for combining their outputs. Classic ensemble-based algorithms such as bagging and boosting are discussed here. For the application to BCI, it was concluded that the results generated using ER and AUC as performance indices do not give enough information to establish which configuration is better.

  7. Security Enrichment in Intrusion Detection System Using Classifier Ensemble

    Directory of Open Access Journals (Sweden)

    Uma R. Salunkhe

    2017-01-01

    Full Text Available In the era of the Internet, with an increasing number of people as its end users, a large number of attack categories are introduced daily. Hence, effective detection of various attacks with the help of Intrusion Detection Systems is an emerging trend in research these days. Existing studies show the effectiveness of machine learning approaches in handling Intrusion Detection Systems. In this work, we aim to enhance the detection rate of an Intrusion Detection System by using machine learning techniques. We propose a novel classifier ensemble based IDS that is constructed using a hybrid approach which combines data-level and feature-level approaches. Classifier ensembles combine the opinions of different experts and improve the intrusion detection rate. Experimental results show the improved detection rates of our system compared to the reference technique.

  8. The three-dimensional origin of the classifying algebra

    International Nuclear Information System (INIS)

    Fuchs, Juergen; Schweigert, Christoph; Stigner, Carl

    2010-01-01

    It is known that reflection coefficients for bulk fields of a rational conformal field theory in the presence of an elementary boundary condition can be obtained as representation matrices of irreducible representations of the classifying algebra, a semisimple commutative associative complex algebra. We show how this algebra arises naturally from the three-dimensional geometry of factorization of correlators of bulk fields on the disk. This allows us to derive explicit expressions for the structure constants of the classifying algebra as invariants of ribbon graphs in the three-manifold $S^2 \times S^1$. Our result unravels a precise relation between intertwiners of the action of the mapping class group on spaces of conformal blocks and boundary conditions in rational conformal field theories.

  9. Machine learning classifiers and fMRI: a tutorial overview.

    Science.gov (United States)

    Pereira, Francisco; Mitchell, Tom; Botvinick, Matthew

    2009-03-01

    Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviours and other variables of interest from fMRI data and thereby show the data contain information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of 'is there information about a variable of interest' (pattern discrimination), classifiers can be used to tackle other classes of question, namely 'where is the information' (pattern localization) and 'how is that information encoded' (pattern characterization).

  10. Lung Nodule Detection in CT Images using Neuro Fuzzy Classifier

    Directory of Open Access Journals (Sweden)

    M. Usman Akram

    2013-07-01

    Automated lung cancer detection using computer aided diagnosis (CAD) is an important area in clinical applications. Because manual nodule detection is very time consuming and costly, computerized systems can be helpful for this purpose. In this paper, we propose a computerized system for lung nodule detection in CT scan images. The automated system consists of two stages, i.e. lung segmentation and enhancement, and feature extraction and classification. The segmentation process separates lung tissue from the rest of the image, and only the lung tissues under examination are considered as candidate regions for detecting malignant nodules in the lung. A feature vector for possible abnormal regions is calculated and the regions are classified using a neuro fuzzy classifier. It is a fully automatic system that does not require any manual intervention, and experimental results show the validity of our system.

  11. A Bayesian Classifier for X-Ray Pulsars Recognition

    Directory of Open Access Journals (Sweden)

    Hao Liang

    2016-01-01

    Recognition of X-ray pulsars is important for the problem of spacecraft attitude determination by X-ray Pulsar Navigation (XPNAV). Using the nonhomogeneous Poisson model of the received photons and the minimum recognition error criterion, a classifier based on Bayes' theorem is proposed. For X-ray pulsar recognition with unknown Doppler frequency and initial phase, the features of every X-ray pulsar are extracted and the unknown parameters are estimated using the Maximum Likelihood (ML) method. In addition, a method to recognize unknown X-ray pulsars or X-ray disturbances is proposed. Simulation results verify the validity of the proposed Bayesian classifier.

  12. Wavelet classifier used for diagnosing shock absorbers in cars

    Directory of Open Access Journals (Sweden)

    Janusz GARDULSKI

    2007-01-01

    The paper discusses some commonly used methods of hydraulic absorber testing. Disadvantages of the methods are described. A vibro-acoustic method is presented and recommended for practical use on existing test rigs. The method is based on continuous wavelet analysis combined with a neural classifier and a 25-neuron, one-way, three-layer back propagation network. The analysis satisfies the intended aim.

  13. Classified installations for environmental protection subject to declaration. Tome 2

    International Nuclear Information System (INIS)

    Anon.

    1992-01-01

    Legislation concerning classified installations governs most industries and dangerous or polluting activities. This legislation aims to prevent risks and harmful effects arising from an installation: air pollution, water pollution, noise, wastes produced by installations, and even adverse aesthetic effects. Polluting or dangerous activities are defined in a list called the nomenclature, which subjects installations to a declaration or authorization regime. Technical regulations ordered by the secretary of state for the environment are listed in tome 2.

  14. Classified study and clinical value of the phase imaging features

    International Nuclear Information System (INIS)

    Dang Yaping; Ma Aiqun; Zheng Xiaopu; Yang Aimin; Xiao Jiang; Gao Xinyao

    2000-01-01

    445 patients with various heart diseases were examined by gated cardiac blood pool imaging, and the phase images were classified. The relationship between the seven types and left ventricular function index, clinical heart function, different heart diseases, and electrocardiograph findings was studied. The results showed that the phase image classification matched the clinical heart function. It can visually, directly and accurately indicate clinical heart function and can be used in the diagnosis of heart disease.

  15. Evaluating Classifiers in Detecting 419 Scams in Bilingual Cybercriminal Communities

    OpenAIRE

    Mbaziira, Alex V.; Abozinadah, Ehab; Jones Jr, James H.

    2015-01-01

    Incidents of organized cybercrime are rising because criminals are reaping high financial rewards while incurring low costs to commit crime. As the digital landscape broadens to accommodate more internet-enabled devices and technologies like social media, more cybercriminals who are not native English speakers are invading cyberspace to cash in on quick exploits. In this paper we evaluate the performance of three machine learning classifiers in detecting 419 scams in a bilingual Nigerian c...

  16. Classifying Radio Galaxies with the Convolutional Neural Network

    International Nuclear Information System (INIS)

    Aniyan, A. K.; Thorat, K.

    2017-01-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  17. Classifying Radio Galaxies with the Convolutional Neural Network

    Energy Technology Data Exchange (ETDEWEB)

    Aniyan, A. K.; Thorat, K. [Department of Physics and Electronics, Rhodes University, Grahamstown (South Africa)]

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  18. Efficient Multi-Concept Visual Classifier Adaptation in Changing Environments

    Science.gov (United States)

    2016-09-01

    sets of images, hand annotated by humans with region boundary outlines followed by label assignment. This annotation is time consuming, and...performed as a necessary but time-consuming step to train supervised classifiers. Unsupervised or self-supervised approaches have been used to...time-consuming labeling process. However, the lack of human supervision has limited most of this work to binary classification (e.g., traversability

  19. Classifying apples by the means of fluorescence imaging

    OpenAIRE

    Codrea, Marius C.; Nevalainen, Olli S.; Tyystjärvi, Esa; VAN DE VEN, Martin; VALCKE, Roland

    2004-01-01

    Classification of harvested apples when predicting their storage potential is an important task. This paper describes how chlorophyll a fluorescence images taken in blue light through a red filter can be used to classify apples. In such an image, fluorescence appears as a relatively homogeneous area broken by a number of small nonfluorescing spots, corresponding to normal corky tissue patches, lenticels, and to damaged areas that lower the quality of the apple. The damaged regions appear mor...

  20. Building Road-Sign Classifiers Using a Trainable Similarity Measure

    Czech Academy of Sciences Publication Activity Database

    Paclík, P.; Novovičová, Jana; Duin, R.P.W.

    2006-01-01

    Roč. 7, č. 3 (2006), s. 309-321 ISSN 1524-9050 R&D Projects: GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : classifier system design * road-sign classification * similarity data representation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.434, year: 2006 http://www.ewh.ieee.org/tc/its/trans.html

  1. Classifying Radio Galaxies with the Convolutional Neural Network

    Science.gov (United States)

    Aniyan, A. K.; Thorat, K.

    2017-06-01

    We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ˜200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of the FRI and FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results show that our results are comparable to that of manual classification, while being much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.

  2. Classifying Floating Potential Measurement Unit Data Products as Science Data

    Science.gov (United States)

    Coffey, Victoria; Minow, Joseph

    2015-01-01

    We are Co-Investigators for the Floating Potential Measurement Unit (FPMU) on the International Space Station (ISS) and members of the FPMU operations and data analysis team. We are providing this memo for the purpose of classifying raw and processed FPMU data products and ancillary data as NASA science data with unrestricted, public availability in order to best support science uses of the data.

  3. Snoring classified: The Munich-Passau Snore Sound Corpus.

    Science.gov (United States)

    Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn

    2018-03-01

    Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds is developed, labelled with the location of sound excitation. Video and audio recordings taken during drug induced sleep endoscopy (DISE) examinations from three medical centres have been semi-automatically screened for snore events, which subsequently have been classified by ENT experts into four classes based on the VOTE classification. The resulting dataset containing 828 snore events from 219 subjects has been split into Train, Development, and Test sets. An SVM classifier has been trained using low level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% could be achieved using the full set of LLDs including formants. The best performing subset is the MFCC-related set of LLDs. A strong difference in performance could be observed between the permutations of train, development, and test partition, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented which are classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation locations of snoring sounds in the upper airway based on acoustic parameters.
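
    The abstract above reports unweighted average recall (UAR), which matters because the corpus is strongly unbalanced. The following is a minimal sketch of how UAR differs from plain accuracy; the function name and the toy labels are invented for illustration and are not taken from the corpus or the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy labels (hypothetical): four classes with unequal sizes.
y_true = ["V", "V", "V", "V", "O", "O", "T", "E"]
y_pred = ["V", "V", "V", "O", "O", "V", "T", "V"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

    In this toy case the small class "E" is never recognised, which pulls UAR below accuracy; with a dominant majority class the gap can become much larger.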

  4. Young module multiplicities and classifying the indecomposable Young permutation modules

    OpenAIRE

    Gill, Christopher C.

    2012-01-01

    We study the multiplicities of Young modules as direct summands of permutation modules on cosets of Young subgroups. Such multiplicities have become known as the p-Kostka numbers. We classify the indecomposable Young permutation modules, and, applying the Brauer construction for p-permutation modules, we give some new reductions for p-Kostka numbers. In particular we prove that p-Kostka numbers are preserved under multiplying partitions by p, and strengthen a known reduction given by Henke, c...

  5. BIOPHARMACEUTICS CLASSIFICATION SYSTEM: A STRATEGIC TOOL FOR CLASSIFYING DRUG SUBSTANCES

    OpenAIRE

    Rohilla Seema; Rohilla Ankur; Marwaha RK; Nanda Arun

    2011-01-01

    The biopharmaceutical classification system (BCS) is a scientific approach for classifying drug substances based on their dose/solubility ratio and intestinal permeability. The BCS has been developed to allow prediction of in vivo pharmacokinetic performance of drug products from measurements of permeability and solubility. Moreover, the drugs can be categorized into four classes of BCS on the basis of permeability and solubility namely; high permeability high solubility, high permeability lo...

  6. Self-organizing map classifier for stressed speech recognition

    Science.gov (United States)

    Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav

    2016-05-01

    This paper presents a method for detecting speech under stress using Self-Organizing Maps. Most people who are exposed to stressful situations cannot adequately respond to stimuli. The army, police, and fire departments account for the largest share of environments characterized by an increased number of stressful situations. Personnel in action are directed by a control center, and control commands should be adapted to the psychological state of the person in action. It is known that psychological changes in the human body are also reflected physiologically, which means that stress affects speech. A system for recognizing stress in speech is therefore needed in the security forces. One possible classifier, popular for its flexibility, is the self-organizing map (SOM), a type of artificial neural network. Flexibility here means that the classifier is independent of the character of the input data, a feature that suits speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosody features were selected as input data for their sensitivity to emotional changes. The parameters were calculated on speech recordings divided into two classes, namely stress state recordings and normal state recordings. The contribution of the experiment is a method that uses an SOM classifier for stressed speech detection. Results showed the advantage of this method, which is input data flexibility.

  7. Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

    Directory of Open Access Journals (Sweden)

    Daniel Ramos

    2018-03-01

    In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way; and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.
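
    As a rough illustration of why cross-entropy is sensitive to calibration as well as discrimination, the sketch below computes the plain average binary cross-entropy in bits. This is an assumption-laden toy and not the paper's prior-weighted ECE or its decomposition; the function name and the probability values are made up for the example.

```python
import math

def cross_entropy_bits(labels, probs):
    """Average binary cross-entropy in bits; probs[i] is the predicted P(label = 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= math.log2(p if y == 1 else 1.0 - p)
    return total / len(labels)

labels = [1, 1, 0, 0]

# A classifier that always outputs 0.5 carries no information: exactly 1 bit.
uninformative = cross_entropy_bits(labels, [0.5, 0.5, 0.5, 0.5])

# Well-separated and reasonably calibrated probabilities score below 1 bit.
confident = cross_entropy_bits(labels, [0.9, 0.8, 0.2, 0.1])

# One badly overconfident error (0.01 for a true positive) is penalised heavily.
overconfident_wrong = cross_entropy_bits(labels, [0.9, 0.01, 0.2, 0.1])
```

    The third case ranks only one item wrongly, yet it scores worse than the uninformative baseline, which hints at why a calibration component is worth measuring separately.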

  8. General and Local: Averaged k-Dependence Bayesian Classifiers

    Directory of Open Access Journals (Sweden)

    Limin Wang

    2015-06-01

    The inference of a general Bayesian network has been shown to be an NP-hard problem, even for approximate solutions. Although k-dependence Bayesian (KDB) classifier can construct at arbitrary points (values of k) along the attribute dependence spectrum, it cannot identify the changes of interdependencies when attributes take different values. Local KDB, which learns in the framework of KDB, is proposed in this study to describe the local dependencies implicated in each test instance. Based on the analysis of functional dependencies, substitution-elimination resolution, a new type of semi-naive Bayesian operation, is proposed to substitute or eliminate generalization to achieve accurate estimation of conditional probability distribution while reducing computational complexity. The final classifier, averaged k-dependence Bayesian (AKDB) classifiers, will average the output of KDB and local KDB. Experimental results on the repository of machine learning databases from the University of California Irvine (UCI) showed that AKDB has significant advantages in zero-one loss and bias relative to naive Bayes (NB), tree augmented naive Bayes (TAN), averaged one-dependence estimators (AODE), and KDB. Moreover, KDB and local KDB show mutually complementary characteristics with respect to variance.

  9. Evaluation of Polarimetric SAR Decomposition for Classifying Wetland Vegetation Types

    Directory of Open Access Journals (Sweden)

    Sang-Hoon Hong

    2015-07-01

    The Florida Everglades is the largest subtropical wetland system in the United States and, as with subtropical and tropical wetlands elsewhere, has been threatened by severe environmental stresses. It is very important to monitor such wetlands to inform management on the status of these fragile ecosystems. This study aims to examine the applicability of TerraSAR-X quadruple polarimetric (quad-pol) synthetic aperture radar (PolSAR) data for classifying wetland vegetation in the Everglades. We processed quad-pol data using the Hong & Wdowinski four-component decomposition, which accounts for double bounce scattering in the cross-polarization signal. The calculated decomposition images consist of four scattering mechanisms (single, co- and cross-pol double, and volume scattering). We applied an object-oriented image analysis approach to classify vegetation types with the decomposition results. We also used a high-resolution multispectral optical RapidEye image to compare statistics and classification results with Synthetic Aperture Radar (SAR) observations. The calculated classification accuracy was higher than 85%, suggesting that the TerraSAR-X quad-pol SAR signal had a high potential for distinguishing different vegetation types. Scattering components from SAR acquisition were particularly advantageous for classifying mangroves along tidal channels. We conclude that the typical scattering behaviors from model-based decomposition are useful for discriminating among different wetland vegetation types.

  10. A Novel Cascade Classifier for Automatic Microcalcification Detection.

    Directory of Open Access Journals (Sweden)

    Seung Yeon Shin

    In this paper, we present a novel cascaded classification framework for automatic detection of individual and clusters of microcalcifications (μC). Our framework comprises three classification stages: i) a random forest (RF) classifier for simple features capturing the second order local structure of individual μCs, where non-μC pixels in the target mammogram are efficiently eliminated; ii) a more complex discriminative restricted Boltzmann machine (DRBM) classifier for μC candidates determined in the RF stage, which automatically learns the detailed morphology of μC appearances for improved discriminative power; and iii) a detector to detect clusters of μCs from the individual μC detection results, using two different criteria. From the two-stage RF-DRBM classifier, we are able to distinguish μCs using explicitly computed features, as well as learn implicit features that are able to further discriminate between confusing cases. Experimental evaluation is conducted on the original Mammographic Image Analysis Society (MIAS) and mini-MIAS databases, as well as our own Seoul National University Bundang Hospital digital mammographic database. It is shown that the proposed method outperforms comparable methods in terms of receiver operating characteristic (ROC) and precision-recall curves for detection of individual μCs and free-response receiver operating characteristic (FROC) curve for detection of clustered μCs.

  11. Patients on weaning trials classified with support vector machines

    International Nuclear Information System (INIS)

    Garde, Ainara; Caminal, Pere; Giraldo, Beatriz F; Schroeder, Rico; Voss, Andreas; Benito, Salvador

    2010-01-01

    The process of discontinuing mechanical ventilation is called weaning and is one of the most challenging problems in intensive care. An unnecessary delay in the discontinuation process and an early weaning trial are undesirable. This study aims to characterize the respiratory pattern through features that permit the identification of patients' conditions in weaning trials. Three groups of patients have been considered: 94 patients with successful weaning trials, who could maintain spontaneous breathing after 48 h (GSucc); 39 patients who failed the weaning trial (GFail) and 21 patients who had successful weaning trials, but required reintubation in less than 48 h (GRein). Patients are characterized by their cardiorespiratory interactions, which are described by joint symbolic dynamics (JSD) applied to the cardiac interbeat and breath durations. The most discriminating features in the classification of the different groups of patients (GSucc, GFail and GRein) are identified by support vector machines (SVMs). The SVM-based feature selection algorithm has an accuracy of 81% in classifying GSucc versus the rest of the patients, 83% in classifying GRein versus GSucc patients and 81% in classifying GRein versus the rest of the patients. Moreover, a good balance between sensitivity and specificity is achieved in all classifications

  12. Comparison of artificial intelligence classifiers for SIP attack data

    Science.gov (United States)

    Safarik, Jakub; Slachta, Jiri

    2016-05-01

    Honeypot applications are a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of a neural network classifier that lower the classification error. The article compares two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second uses further optimizations such as input normalization or a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.

  13. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    Science.gov (United States)

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.

  14. Classifier-Guided Sampling for Complex Energy System Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Backlund, Peter B. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Eddy, John P. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of the CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.

  15. Application of the Naive Bayesian Classifier to optimize treatment decisions

    International Nuclear Information System (INIS)

    Kazmierska, Joanna; Malicki, Julian

    2008-01-01

    Background and purpose: To study the accuracy, specificity and sensitivity of the Naive Bayesian Classifier (NBC) in the assessment of individual risk of cancer relapse or progression after radiotherapy (RT). Materials and methods: Data of 142 brain tumour patients irradiated from 2000 to 2005 were analyzed. Ninety-six attributes related to disease, patient and treatment were chosen. Attributes in binary form consisted of the training set for NBC learning. NBC calculated an individual conditional probability of being assigned to: relapse or progression (1), or no relapse or progression (0) group. Accuracy, attribute selection and quality of classifier were determined by comparison with actual treatment results, leave-one-out and cross validation methods, respectively. Clinical setting test utilized data of 35 patients. Treatment results at classification were unknown and were compared with classification results after 3 months. Results: High classification accuracy (84%), specificity (0.87) and sensitivity (0.80) were achieved, both for classifier training and in progressive clinical evaluation. Conclusions: NBC is a useful tool to support the assessment of individual risk of relapse or progression in patients diagnosed with brain tumour undergoing RT postoperatively
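
    The abstract above describes a naive Bayesian classifier trained on binary attributes that outputs a per-patient probability of relapse or progression. A minimal sketch of that idea follows, assuming Bernoulli likelihoods with Laplace smoothing (the abstract does not state how probabilities were estimated); the function names and the tiny data set are hypothetical and purely illustrative.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-attribute P(x_j = 1 | class) with Laplace smoothing."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        theta = [(sum(col) + alpha) / (len(rows) + 2 * alpha) for col in zip(*rows)]
        model[c] = (prior, theta)
    return model

def posterior(model, x):
    """Return P(class | x) for each class under the naive independence assumption."""
    log_joint = {}
    for c, (prior, theta) in model.items():
        lp = math.log(prior)
        for xj, tj in zip(x, theta):
            lp += math.log(tj if xj == 1 else 1.0 - tj)
        log_joint[c] = lp
    m = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(lp - m) for c, lp in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Toy data (invented): 3 binary attributes, label 1 = relapse or progression.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X, y)
p = posterior(model, [1, 1, 0])
```

    With real data, the reported accuracy, sensitivity and specificity would come from comparing thresholded posteriors with observed outcomes under leave-one-out or cross validation, as the abstract describes.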

  16. A support vector machine (SVM) based voltage stability classifier

    Energy Technology Data Exchange (ETDEWEB)

    Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of); Lee, B. [Korea Univ., Seoul (Korea, Republic of)

    2007-07-01

    Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to completely employ existing transmission and infrastructure. The economic pressure on electricity markets forces the operation of power systems and components to their limit of capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day-to-day system operations and an increase in the number of potential components for system disturbances, potentially resulting in voltage instability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; moving window for feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer an excellent performance in classification with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.
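
    The "moving window for feature vector" step can be pictured with a short sketch. The abstract does not give the window length or the feature layout, so the layout below (the last few voltage samples followed by the matching active power samples) and the function name moving_window_features are assumptions; the SVM training itself is left out.

```python
def moving_window_features(voltage, power, window):
    """Build one feature vector per time step from the last `window` samples."""
    if len(voltage) != len(power):
        raise ValueError("voltage and power must have the same length")
    features = []
    for end in range(window, len(voltage) + 1):
        start = end - window
        # Assumed layout: voltages first, then the matching active powers.
        features.append(list(voltage[start:end]) + list(power[start:end]))
    return features

# Hypothetical per-unit measurements at a single load bus.
voltage = [1.00, 0.99, 0.97, 0.94]
power = [0.50, 0.55, 0.60, 0.66]
vectors = moving_window_features(voltage, power, window=3)
```

    Each vector in this sketch would be one learning sample for the classifier, paired with a stability label for the same time step.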

  17. Entropy based classifier for cross-domain opinion mining

    Directory of Open Access Journals (Sweden)

    Jyoti S. Deshmukh

    2018-01-01

    Full Text Available In recent years, the growth of social networks has increased the interest of people in analyzing reviews and opinions for products before they buy them. Consequently, this has given rise to domain adaptation as a prominent area of research in sentiment analysis. A classifier trained from one domain often gives poor results on data from another domain. Expression of sentiment is different in every domain. Labeling each domain separately is costly and time consuming. Therefore, this study has proposed an approach that extracts and classifies opinion words from one domain, called the source domain, and predicts opinion words of another domain, called the target domain, using a semi-supervised approach, which combines modified maximum entropy and bipartite graph clustering. A comparison of opinion classification on reviews on four different product domains is presented. The results demonstrate that the proposed method performs relatively well in comparison to the other methods. Comparison of SentiWordNet of domain-specific and domain-independent words reveals that on average 72.6% and 88.4% words, respectively, are correctly classified.

  18. A predictive toxicogenomics signature to classify genotoxic versus non-genotoxic chemicals in human TK6 cells

    Directory of Open Access Journals (Sweden)

    Andrew Williams

    2015-12-01

    Full Text Available Genotoxicity testing is a critical component of chemical assessment. The use of integrated approaches in genetic toxicology, including the incorporation of gene expression data to determine the DNA damage response pathways involved in response, is becoming more common. In companion papers previously published in Environmental and Molecular Mutagenesis, Li et al. (2015) [6] developed a dose optimization protocol that was based on evaluating expression changes in several well-characterized stress-response genes using quantitative real-time PCR in human lymphoblastoid TK6 cells in culture. This optimization approach was applied to the analysis of TK6 cells exposed to one of 14 genotoxic or 14 non-genotoxic agents, with sampling 4 h post-exposure. Microarray-based transcriptomic analyses were then used to develop a classifier for genotoxicity using the nearest shrunken centroids method. A panel of 65 genes was identified that could accurately classify toxicants as genotoxic or non-genotoxic. In Buick et al. (2015) [1], the utility of the biomarker for chemicals that require metabolic activation was evaluated. In this study, TK6 cells were exposed to increasing doses of four chemicals (two genotoxic that require metabolic activation and two non-genotoxic chemicals) in the presence of rat liver S9 to demonstrate that S9 does not impair the ability to classify genotoxicity using this genomic biomarker in TK6 cells.

  19. Prediction of cardiac arrest recurrence using ensemble classifiers

    Indian Academy of Sciences (India)

    Nachiket Tapas

    ECG dataset from PhysioNet, Pima Indian Diabetes dataset from UCI Machine Learning Repository and gene expression ... electrical activity, medically the condition is known as cardiac arrest ... ing, (5) lack of physical exercise, etc. [9]. Using ...

  20. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba; Bajic, Vladimir B.

    2016-01-01

    decrease the computational time and cost, but also improve the classification performance. Among different approaches of feature selection methods, however most of them suffer from several problems such as lack of robustness, validation issues etc. Here, we

  1. Can scientific journals be classified based on their citation profiles?

    Directory of Open Access Journals (Sweden)

    Sayed-Amir Marashi

    2015-03-01

    Full Text Available Classification of scientific publications is of great importance in biomedical research evaluation. However, accurate classification of research publications is challenging and normally is performed in a rather subjective way. In the present paper, we propose to classify biomedical publications into superfamilies, by analysing their citation profiles, i.e. the location of citations in the structure of citing articles. Such a classification may help authors to find the appropriate biomedical journal for publication, may make journal comparisons more rational, and may even help planners to better track the consequences of their policies on biomedical research.

  2. Classifying the future of universes with dark energy

    International Nuclear Information System (INIS)

    Chiba, Takeshi; Takahashi, Ryuichi; Sugiyama, Naoshi

    2005-01-01

    We classify the future of the universe for general cosmological models including matter and dark energy. If the equation of state of dark energy is less than -1, the age of the universe becomes finite. We compute the rest of the age of the universe for such universe models. The behaviour of the future growth of matter density perturbation is also studied. We find that the collapse of the spherical overdensity region is greatly changed if the equation of state of dark energy is less than -1.

  3. DFRFT: A Classified Review of Recent Methods with Its Application

    Directory of Open Access Journals (Sweden)

    Ashutosh Kumar Singh

    2013-01-01

    Full Text Available In the literature, there are various algorithms available for computing the discrete fractional Fourier transform (DFRFT). In this paper, all the existing methods are reviewed, classified into four categories, and subsequently compared to find out the best alternative from the view point of minimal computational error, computational complexity, transform features, and additional features like security. Subsequently, the correlation theorem of FRFT has been utilized to remove significantly the Doppler shift caused due to motion of receiver in the DSB-SC AM signal. Finally, the role of DFRFT has been investigated in the area of steganography.

  4. Application of a naive Bayesians classifiers in assessing the supplier

    Directory of Open Access Journals (Sweden)

    Mijailović Snežana

    2017-01-01

    Full Text Available The paper considers the class of interactive knowledge-based systems whose main purpose is making proposals and assisting customers in making decisions. The mathematical model provides a set of learning examples about the delivered series of outflows from three suppliers, as well as an analysis of an illustrative example for assessing the supplier using a naive Bayesian classifier. The model was developed on the basis of the analysis of subjective probabilities, which are later revised with the help of new empirical information and Bayes' theorem on posterior probability, i.e. by combining subjective and objective conditional probabilities in the choice of a reliable supplier.

  5. Interface Prostheses With Classifier-Feedback-Based User Training.

    Science.gov (United States)

    Fang, Yinfeng; Zhou, Dalin; Li, Kairu; Liu, Honghai

    2017-11-01

    It is evident that user training significantly affects performance of pattern-recognition-based myoelectric prosthetic device control. Despite plausible classification accuracy on offline datasets, online accuracy usually suffers from the changes in physiological conditions and electrode displacement. The user ability in generating consistent electromyographic (EMG) patterns can be enhanced via proper user training strategies in order to improve online performance. This study proposes a clustering-feedback strategy that provides real-time feedback to users by means of a visualized online EMG signal input as well as the centroids of the training samples, whose dimensionality is reduced to minimal number by dimension reduction. Clustering feedback provides a criterion that guides users to adjust motion gestures and muscle contraction forces intentionally. The experiment results have demonstrated that hand motion recognition accuracy increases steadily along the progress of the clustering-feedback-based user training, while conventional classifier-feedback methods, i.e., label feedback, hardly achieve any improvement. The result concludes that the use of proper classifier feedback can accelerate the process of user training, and implies prosperous future for the amputees with limited or no experience in pattern-recognition-based prosthetic device manipulation.

  6. Nonlinear Knowledge in Kernel-Based Multiple Criteria Programming Classifier

    Science.gov (United States)

    Zhang, Dongling; Tian, Yingjie; Shi, Yong

    The Kernel-based Multiple Criteria Linear Programming (KMCLP) model is used as a classification method that can learn from training examples, whereas, in the traditional machine learning area, data sets are classified only by prior knowledge. Some works combine the above two classification principles to overcome the drawbacks of each approach. In this paper, we propose a model that incorporates nonlinear knowledge into KMCLP in order to solve the problem in which the input consists not only of training examples but also of nonlinear prior knowledge. On the real-world case of breast cancer diagnosis, the model performs better than the model based solely on training data.

  7. On-line computing in a classified environment

    International Nuclear Information System (INIS)

    O'Callaghan, P.B.

    1982-01-01

    Westinghouse Hanford Company (WHC) recently developed a Department of Energy (DOE) approved real-time, on-line computer system to control nuclear material. The system simultaneously processes both classified and unclassified information. Implementation of this system required application of many security techniques. The system has a secure, but user friendly interface. Many software applications protect the integrity of the data base from malevolent or accidental errors. Programming practices ensure the integrity of the computer system software. The audit trail and the reports generation capability record user actions and status of the nuclear material inventory

  8. A Handbook for Derivative Classifiers at Los Alamos National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Sinkula, Barbara Jean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2018-02-23

    The Los Alamos Classification Office (within the SAFE-IP group) prepared this handbook as a resource for the Laboratory’s derivative classifiers (DCs). It contains information about United States Government (USG) classification policy, principles, and authorities as they relate to the LANL Classification Program in general, and to the LANL DC program specifically. At a working level, DCs review Laboratory documents and material that are subject to classification review requirements, while the Classification Office provides the training and resources for DCs to perform that vital function.

  9. Classifying BCI signals from novice users with extreme learning machine

    Directory of Open Access Journals (Sweden)

    Rodríguez-Bermúdez Germán

    2017-07-01

    Full Text Available A brain computer interface (BCI) allows external devices to be controlled using only the electrical activity of the brain. In order to improve the system, several approaches have been proposed. However, it is usual to test algorithms with standard BCI signals from expert users or from repositories available on the Internet. In this work, an extreme learning machine (ELM) has been tested with signals from 5 novice users and compared with standard classification algorithms. Experimental results show that ELM is a suitable method to classify electroencephalogram signals from novice users.

  10. A support vector machine and a random forest classifier indicates a 15-miRNA set related to osteosarcoma recurrence

    Directory of Open Access Journals (Sweden)

    He Y

    2018-01-01

    Full Text Available Yunfei He,1,2,* Jun Ma,1,* An Wang,1,3,* Weiheng Wang,1 Shengchang Luo,1 Yaoming Liu,2 Xiaojian Ye1 1Department of Orthopaedics, Changzheng Hospital Affiliated with Second Military Medical University, Shanghai, 2Department of Orthopaedics, Lanzhou General Hospital of Lanzhou Military Command Region, Lanzhou, 3Department of Orthopaedics, Shanghai Armed Police Force Hospital, Shanghai, People’s Republic of China *These authors contributed equally to this work Background: Osteosarcoma, which originates in the mesenchymal tissue, is the prevalent primary solid malignancy of the bone. It is of great importance to explore the mechanisms of metastasis and recurrence, which are two primary reasons accounting for the high death rate in osteosarcoma. Data and methods: Three miRNA expression profiles related to osteosarcoma were downloaded from GEO DataSets. Differentially expressed miRNAs (DEmiRs) were screened using MetaDE.ES of the MetaDE package. A support vector machine (SVM) classifier was constructed using optimal miRNAs, and its prediction efficiency for recurrence was detected in independent datasets. Finally, a co-expression network was constructed based on the DEmiRs and their target genes. Results: In total, 78 significantly DEmiRs were screened. The SVM classifier constructed by 15 miRNAs could accurately classify 58 samples in 65 samples (89.2%) in the GSE39040 database, which was validated in another two databases, GSE39052 (84.62%, 22/26) and GSE79181 (91.3%, 21/23). Cox regression showed that four miRNAs, including hsa-miR-10b, hsa-miR-1227, hsa-miR-146b-3p, and hsa-miR-873, significantly correlated with tumor recurrence time. There were 137, 147, 145, and 77 target genes of the above four miRNAs, respectively, which were assigned to 17 gene ontology functionally annotated terms and 14 Kyoto Encyclopedia of Genes and Genomes pathways. Among them, the “Osteoclast differentiation” pathway contained a total of seven target genes and was

  11. nRC: non-coding RNA Classifier based on structural features.

    Science.gov (United States)

    Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

    2017-01-01

    Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.

  12. Classifying and mapping wetlands and peat resources using digital cartography

    Science.gov (United States)

    Cameron, Cornelia C.; Emery, David A.

    1992-01-01

    Digital cartography allows the portrayal of spatial associations among diverse data types and is ideally suited for land use and resource analysis. We have developed methodology that uses digital cartography for the classification of wetlands and their associated peat resources and applied it to a 1:24 000 scale map area in New Hampshire. Classifying and mapping wetlands involves integrating the spatial distribution of wetlands types with depth variations in associated peat quality and character. A hierarchically structured classification that integrates the spatial distribution of variations in (1) vegetation, (2) soil type, (3) hydrology, (4) geologic aspects, and (5) peat characteristics has been developed and can be used to build digital cartographic files for resource and land use analysis. The first three parameters are the bases used by the National Wetlands Inventory to classify wetlands and deepwater habitats of the United States. The fourth parameter, geological aspects, includes slope, relief, depth of wetland (from surface to underlying rock or substrate), wetland stratigraphy, and the type and structure of solid and unconsolidated rock surrounding and underlying the wetland. The fifth parameter, peat characteristics, includes the subsurface variation in ash, acidity, moisture, heating value (Btu), sulfur content, and other chemical properties as shown in specimens obtained from core holes. These parameters can be shown as a series of map data overlays with tables that can be integrated for resource or land use analysis.

  13. Efficacy of MRI in classifying proximal focal femoral deficiency

    International Nuclear Information System (INIS)

    Maldjian, C.; Patel, T.Y.; Klein, R.M.; Smith, R.C.

    2007-01-01

    To evaluate the efficacy of MRI in classifying PFFD and to compare MRI to radiographic classification of PFFD. Radiographic and MRI classification of the cases was performed utilizing the Amstutz classification system. Retrospective evaluation of radiographs and MRI exams in nine hips of eight patients with proximal focal femoral deficiency was performed by two radiologists. The cases were classified by radiographs as Amstutz 1: n=3, Amstutz 3: n=3, Amstutz 4: n=1 and Amstutz 5: n=2. The classifications based on MRI were Amstutz 1: n=6, Amstutz 2: n=1, Amstutz 3: n=0, Amstutz 4: n=2 and Amstutz 5: n=0. Three hips demonstrated complete agreement. There were six discordant hips. In two of the discordant cases, follow-up radiographs of 6 months or greater intervals were available and helped to confirm MRI findings. Errors in radiographic evaluation consisted of overestimating the degree of deficiency. MRI is more accurate than radiographic evaluation for the classification of PFFD, particularly early on, prior to the ossification of cartilaginous components in the femurs. Since radiographic evaluation tends to overestimate the degree of deficiency, MRI is a more definitive modality for evaluation of PFFD. (orig.)

  14. REPTREE CLASSIFIER FOR IDENTIFYING LINK SPAM IN WEB SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2013-01-01

    Full Text Available Search engines are used for retrieving information from the web. Most of the time, attention falls on the top 10 results, and sometimes only the top 5, because of time constraints and reliance on the search engines. Users believe that the top 10 or 5 results are the most relevant. Here comes the problem of spamdexing, a method of manipulating search result quality. Falsified metrics, such as inserting an enormous number of keywords or links in a website, may take that website to the top 10 or 5 positions. This paper proposes a classifier based on the Reptree (Regression tree representative). As an initial step, link-based features such as neighbors, pagerank, truncated pagerank, trustrank and assortativity-related attributes are inferred. Based on these features, a tree is constructed. The tree uses the feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as a base. It is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD and FEATE. Only link-based features are taken for experiments. This paper focuses on link spam alone. Finally, a representative tree is created which classifies the web spam entries more precisely. Results are given. Regression tree classification seems to perform well, as shown through experiments.

  15. Deposition of Nanostructured Thin Film from Size-Classified Nanoparticles

    Science.gov (United States)

    Camata, Renato P.; Cunningham, Nicholas C.; Seol, Kwang Soo; Okada, Yoshiki; Takeuchi, Kazuo

    2003-01-01

    Materials comprising nanometer-sized grains (approximately 1-50 nm) exhibit properties dramatically different from those of their homogeneous and uniform counterparts. These properties vary with size, shape, and composition of nanoscale grains. Thus, nanoparticles may be used as building blocks to engineer tailor-made artificial materials with desired properties, such as non-linear optical absorption, tunable light emission, charge-storage behavior, selective catalytic activity, and countless other characteristics. This bottom-up engineering approach requires exquisite control over nanoparticle size, shape, and composition. We describe the design and characterization of an aerosol system conceived for the deposition of size-classified nanoparticles whose performance is consistent with these strict demands. A nanoparticle aerosol is generated by laser ablation and sorted according to size using a differential mobility analyzer. Nanoparticles within a chosen window of sizes (e.g., (8.0 ± 0.6) nm) are deposited electrostatically on a surface forming a film of the desired material. The system allows the assembly and engineering of thin films using size-classified nanoparticles as building blocks.

  16. Speaker gender identification based on majority vote classifiers

    Science.gov (United States)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naive Bayes, support vector machine and k-nearest neighbor, were tested. The three best performing classifiers among the five contribute by majority voting between their scores. Experiments were performed on three different datasets spoken in three languages (English, German and Arabic) in order to validate the language independence of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
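
    The combination rule described in this abstract (keep the three best of five classifiers, then take a majority vote) can be sketched as below. The classifier names, validation accuracies and predicted labels are hypothetical, and the sketch votes on hard labels, which is one possible reading of "majority voting between their scores".

```python
from collections import Counter

def top_k(accuracies, k=3):
    """Return the names of the k classifiers with the highest accuracy."""
    return sorted(accuracies, key=accuracies.get, reverse=True)[:k]

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical validation accuracies for the five models.
accuracies = {"tree": 0.90, "lda": 0.88, "nb": 0.85, "svm": 0.93, "knn": 0.91}
selected = top_k(accuracies)

# Hypothetical labels predicted by the three selected models for three clips.
labels_by_model = {
    "svm": ["M", "F", "F"],
    "knn": ["M", "M", "F"],
    "tree": ["F", "M", "F"],
}
final = majority_vote([labels_by_model[name] for name in selected])
```

    With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of classifiers.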

  17. Spread-sheet application to classify radioactive material for shipment

    International Nuclear Information System (INIS)

    Brown, A.N.

    1998-01-01

    A spread-sheet application has been developed at the Idaho National Engineering and Environmental Laboratory to aid the shipper when classifying nuclide mixtures of normal form, radioactive materials. The results generated by this spread-sheet are used to confirm the proper US DOT classification when offering radioactive material packages for transport. The user must input to the spread-sheet the mass of the material being classified, the physical form (liquid or not) and the activity of each regulated nuclide. The spread-sheet uses these inputs to calculate two general values: (1) the specific activity of the material and (2) a summation calculation of the nuclide content. The specific activity is used to determine if the material exceeds the DOT minimal threshold for a radioactive material. If the material is calculated to be radioactive, the specific activity is also used to determine if the material meets the activity requirement for one of the three low specific activity designations (LSA-I, LSA-II, LSA-III, or not LSA). Again, if the material is calculated to be radioactive, the summation calculation is then used to determine which activity category the material will meet (Limited Quantity, Type A, Type B, or Highway Route Controlled Quantity). This spread-sheet has proven to be an invaluable aid for shippers of radioactive materials at the Idaho National Engineering and Environmental Laboratory. (authors)

  18. Identifying aggressive prostate cancer foci using a DNA methylation classifier.

    Science.gov (United States)

    Mundbjerg, Kamilla; Chopra, Sameer; Alemozaffar, Mehrdad; Duymich, Christopher; Lakshminarasimhan, Ranjani; Nichols, Peter W; Aron, Manju; Siegmund, Kimberly D; Ukimura, Osamu; Aron, Monish; Stern, Mariana; Gill, Parkash; Carpten, John D; Ørntoft, Torben F; Sørensen, Karina D; Weisenberger, Daniel J; Jones, Peter A; Duddalwar, Vinay; Gill, Inderbir; Liang, Gangning

    2017-01-12

    Slow-growing prostate cancer (PC) can be aggressive in a subset of cases. Therefore, prognostic tools to guide clinical decision-making and avoid overtreatment of indolent PC and undertreatment of aggressive disease are urgently needed. PC has a propensity to be multifocal with several different cancerous foci per gland. Here, we have taken advantage of the multifocal propensity of PC and categorized aggressiveness of individual PC foci based on DNA methylation patterns in primary PC foci and matched lymph node metastases. In a set of 14 patients, we demonstrate that over half of the cases have multiple epigenetically distinct subclones and determine the primary subclone from which the metastatic lesion(s) originated. Furthermore, we develop an aggressiveness classifier consisting of 25 DNA methylation probes to determine aggressive and non-aggressive subclones. Upon validation of the classifier in an independent cohort, the predicted aggressive tumors are significantly associated with the presence of lymph node metastases and invasive tumor stages. Overall, this study provides molecular-based support for determining PC aggressiveness with the potential to impact clinical decision-making, such as targeted biopsy approaches for early diagnosis and active surveillance, in addition to focal therapy.

  19. Spreadsheet application to classify radioactive material for shipment

    International Nuclear Information System (INIS)

    Brown, A.N.

    1997-12-01

    A spreadsheet application has been developed at the Idaho National Engineering and Environmental Laboratory to aid the shipper when classifying nuclide mixtures of normal form, radioactive materials. The results generated by this spreadsheet are used to confirm the proper US Department of Transportation (DOT) classification when offering radioactive material packages for transport. The user must input to the spreadsheet the mass of the material being classified, the physical form (liquid or not), and the activity of each regulated nuclide. The spreadsheet uses these inputs to calculate two general values: (1) the specific activity of the material, and (2) a summation calculation of the nuclide content. The specific activity is used to determine if the material exceeds the DOT minimal threshold for a radioactive material (Yes or No). If the material is calculated to be radioactive, the specific activity is also used to determine if the material meets the activity requirement for one of the three Low Specific Activity designations (LSA-I, LSA-II, LSA-III, or Not LSA). Again, if the material is calculated to be radioactive, the summation calculation is then used to determine which activity category the material will meet (Limited Quantity, Type A, Type B, or Highway Route Controlled Quantity)
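
    The two values named in this abstract can be sketched in a few lines. This is an assumption-laden illustration, not the spreadsheet's logic: the summation is taken here to be a sum of fractions (each nuclide's activity divided by a per-nuclide limit), and the nuclide names, limit values and units are placeholders that do not reproduce any DOT figures.

```python
def specific_activity(activities_bq, mass_g):
    """Total activity per unit mass (Bq/g) of the material."""
    return sum(activities_bq.values()) / mass_g

def sum_of_fractions(activities_bq, limits_bq):
    """Sum over nuclides of activity divided by that nuclide's limit."""
    return sum(a / limits_bq[name] for name, a in activities_bq.items())

def within_limit(activities_bq, limits_bq):
    """True when the mixture as a whole stays at or below the limit set."""
    return sum_of_fractions(activities_bq, limits_bq) <= 1.0

# Placeholder inputs: these are NOT regulatory values.
activities = {"nuclide_X": 2.0e6, "nuclide_Y": 1.0e6}   # Bq per nuclide
limits = {"nuclide_X": 1.0e7, "nuclide_Y": 4.0e6}       # hypothetical limits, Bq
mass = 1000.0                                           # grams

sa = specific_activity(activities, mass)
fraction = sum_of_fractions(activities, limits)
```

    In a sketch of this kind, the specific activity would be compared against the threshold and LSA requirements, and the fraction sum would be recomputed against each category's limit set to pick the activity category.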

  20. Classifying decommissioning wastes for allocation to appropriate final repositories

    International Nuclear Information System (INIS)

    Alder, J.C.; Tunaboylu, K.

    1982-01-01

    For the safe disposal of radioactive wastes in different repositories, it is of advantage to classify them in well-defined conditioned categories, appropriate for final disposal. These categories, the so-called waste sorts, are characterized by similar radionuclide distribution, similar nuclide-specific activity concentrations and similar waste matrix. A methodology is presented for classifying decommissioning wastes and is applied to the decommissioning wastes arising from a Swiss program of 6 GWe. The amounts and nuclide-specific activity inventories of the decommissioning waste sorts have been estimated. A first allocation into two different repository types has been performed. Such a classification enables one to define the source parameters for repository safety analysis and allows one to allocate the different waste categories into appropriate final repositories. This work presents a first iteration to determine which waste sorts belong to which repository type. The characteristics of waste sorts have to be better defined and the protective strength of the repository barriers has to be optimized. 7 references, 2 figures, 4 tables

  1. Classifying magnetic resonance image modalities with convolutional neural networks

    Science.gov (United States)

    Remedios, Samuel; Pham, Dzung L.; Butman, John A.; Roy, Snehashis

    2018-02-01

    Magnetic Resonance (MR) imaging allows the acquisition of images with different contrast properties depending on the acquisition protocol and the magnetic properties of tissues. Many MR brain image processing techniques, such as tissue segmentation, require multiple MR contrasts as inputs, and each contrast is treated differently. Thus it is advantageous to automate the identification of image contrasts for various purposes, such as facilitating image processing pipelines, and managing and maintaining large databases via content-based image retrieval (CBIR). Most automated CBIR techniques focus on a two-step process: extracting features from data and classifying the image based on these features. We present a novel 3D deep convolutional neural network (CNN)-based method for MR image contrast classification. The proposed CNN automatically identifies the MR contrast of an input brain image volume. Specifically, we explored three classification problems: (1) identify T1-weighted (T1-w), T2-weighted (T2-w), and fluid-attenuated inversion recovery (FLAIR) contrasts, (2) identify pre- vs post-contrast T1, (3) identify pre- vs post-contrast FLAIR. A total of 3418 image volumes acquired from multiple sites and multiple scanners were used. To evaluate each task, the proposed model was trained on 2137 images and tested on the remaining 1281 images. Results showed that image volumes were correctly classified with 97.57% accuracy.

  2. Classifying next-generation sequencing data using a zero-inflated Poisson model.

    Science.gov (United States)

    Zhou, Yan; Wan, Xiang; Zhang, Baoxue; Tong, Tiejun

    2018-04-15

    With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which type of disease a new patient belongs to with RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (i.e. when the sequencing depth is insufficient or for small RNAs with a length of 18-30 nucleotides). Therefore, it is desirable to develop a new model to analyze RNA-seq data with an excess of zeros. In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets, including a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations in that our proposed method outperforms the existing competitors. The software is available at http://www.math.hkbu.edu.hk/~tongt. Contact: xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary data are available at Bioinformatics online.
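
    The two-component mixture described above (a point mass at zero plus a Poisson distribution) has a simple probability mass function, sketched here. The function name and the parameters `pi` (zero-inflation weight) and `lam` (Poisson mean) are illustrative assumptions; the logistic link between `pi`, the gene mean and the sequencing depth used in ZIPLDA is not reproduced.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(X = k) for a zero-inflated Poisson with zero weight pi and Poisson mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the point mass or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Illustrative values only: 30% structural zeros, Poisson mean of 2 reads.
p_zero = zip_pmf(0, pi=0.3, lam=2.0)
total = sum(zip_pmf(k, pi=0.3, lam=2.0) for k in range(60))
```

    With `pi = 0` the function reduces to the ordinary Poisson mass function, which is the model underlying PLDA.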

  3. A deep learning method for classifying mammographic breast density categories.

    Science.gov (United States)

    Mohamed, Aly A; Berg, Wendie A; Peng, Hong; Luo, Yahong; Jankowitz, Rachel C; Wu, Shandong

    2018-01-01

    Mammographic breast density is an established risk marker for breast cancer and is visually assessed by radiologists in routine mammogram image reading, using four qualitative Breast Imaging and Reporting Data System (BI-RADS) breast density categories. It is particularly difficult for radiologists to consistently distinguish the two most common and most variably assigned BI-RADS categories, i.e., "scattered density" and "heterogeneously dense". The aim of this work was to investigate a deep learning-based breast density classifier to consistently distinguish these two categories, aiming at providing a potential computerized tool to assist radiologists in assigning a BI-RADS category in current clinical workflow. In this study, we constructed a convolutional neural network (CNN)-based model coupled with a large (i.e., 22,000 images) digital mammogram imaging dataset to evaluate the classification performance between the two aforementioned breast density categories. All images were collected from a cohort of 1,427 women who underwent standard digital mammography screening from 2005 to 2016 at our institution. The truths of the density categories were based on standard clinical assessment made by board-certified breast imaging radiologists. Effects of direct training from scratch solely using digital mammogram images and transfer learning of a pretrained model on a large nonmedical imaging dataset were evaluated for the specific task of breast density classification. In order to measure the classification performance, the CNN classifier was also tested on a refined version of the mammogram image dataset by removing some potentially inaccurately labeled images. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to measure the accuracy of the classifier. The AUC was 0.9421 when the CNN model was trained from scratch on our own mammogram images, and the accuracy increased gradually along with an increased size of training samples

  4. Least Square Support Vector Machine Classifier vs a Logistic Regression Classifier on the Recognition of Numeric Digits

    Directory of Open Access Journals (Sweden)

    Danilo A. López-Sarmiento

    2013-11-01

    Full Text Available This paper compares the performance of a multi-class least squares support vector machine (LSSVM mc) with that of a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a data set of 5000 images of handwritten digits (500 images for each digit from 0 to 9), each image measuring 20 x 20 pixels. The inputs to each system were 400-dimensional vectors corresponding to each image (no feature extraction was performed). Both classifiers used a OneVsAll strategy to enable multi-class classification and a random cross-validation function in the process of minimizing the cost function. The comparison metrics were precision and training time under the same computational conditions. Both techniques showed a precision above 95 %, with LS-SVM slightly more accurate. The computational cost, however, differed markedly: LS-SVM training required 16.42 % less time than the logistic regression model under the same low computational conditions.
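
    The OneVsAll strategy mentioned above can be sketched for the logistic regression case: one binary model per class scores the input, and the class with the highest score wins. The weights below are hypothetical hand-picked values on a 2-dimensional toy input, not trained parameters from the paper (which used 400-dimensional vectors and ten classes).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_all(weights, biases, x):
    """Return the index of the class whose binary logistic model gives the highest score."""
    scores = [
        sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical parameters for three classes on 2-dimensional inputs.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

label_a = predict_one_vs_all(weights, biases, [2.0, 1.0])
label_b = predict_one_vs_all(weights, biases, [-1.0, -2.0])
```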

  5. Higher School Marketing Strategy Formation: Classifying the Factors

    Directory of Open Access Journals (Sweden)

    N. K. Shemetova

    2012-01-01

    Full Text Available The paper deals with the main trends of higher school management strategy formation. The author specifies the educational changes in the modern information society determining the strategy options. For each professional training level the author denotes the set of strategic factors affecting the educational service consumers and, therefore, the effectiveness of the higher school marketing. The given factors are classified from the standpoints of the providers and consumers of educational services (enrollees, students, graduates and postgraduates). The research methods include statistical analysis and general methods of scientific analysis, synthesis, induction, deduction, comparison, and classification. The author is convinced that the university management should develop the necessary prerequisites for raising the graduates’ competitiveness in the labor market, and stimulate the active marketing policies of the relevant subdivisions and departments. In the author’s opinion, the above classification of marketing strategy factors can be used as the system of values for educational service providers.

  6. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
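
    One way to read the "global probability of correct classification" under independent decision rules is sketched below: each class reaches its leaf through a sequence of nodes, the per-node probabilities of a correct decision multiply along that path, and the result is averaged over the class priors. The class names, priors and per-node probabilities are hypothetical, and this formulation is an assumption rather than the authors' published algorithm.

```python
from math import prod


def global_correct_probability(priors: dict, path_probs: dict) -> float:
    """Prior-weighted sum over classes of the product of per-node correct-decision probabilities."""
    return sum(priors[c] * prod(path_probs[c]) for c in priors)


# Hypothetical three-class tree: "water" is separated at the root,
# "forest" and "urban" need a second decision node.
priors = {"water": 0.5, "forest": 0.3, "urban": 0.2}
path_probs = {
    "water": [0.98],
    "forest": [0.95, 0.90],
    "urban": [0.95, 0.80],
}

p_correct = global_correct_probability(priors, path_probs)
```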

  7. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.
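
    For reference, the sensitivity and specificity scores quoted above are computed from the confusion counts as in this small sketch. The labels are made-up toy data (1 = abnormal, 0 = normal), not the PCG recordings from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: four abnormal and four normal recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```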

  8. Business process modeling for processing classified documents using RFID technology

    Directory of Open Access Journals (Sweden)

    Koszela Jarosław

    2016-01-01

    Full Text Available The article outlines the application of the process approach to the functional description of the designed IT system supporting the operations of the secret office, which processes classified documents. The article describes the application of the method of incremental modeling of business processes according to the BPMN model to the description of the processes currently implemented manually (“as is”) and the target processes (“to be”), which use RFID technology for their automation. Additionally, examples of applying the method of structural and dynamic analysis of the processes (process simulation) to verify their correctness and efficiency are presented. The process analysis method can be extended by applying a warehouse of processes and process mining methods.

  9. The Motivation of Betrayal by Leaking of Classified Information

    Directory of Open Access Journals (Sweden)

    Lăzăroiu Laurențiu-Leonard

    2017-03-01

    Full Text Available Trying to forecast human behavior involves actions and knowledge of motivational theories, applicable to the profile of each organization and in particular to each individual’s style. The anticipation of personal attitudes is aimed not only at passive monitoring of professional activity, but also at improving risk avoidance, in accordance with a specific organizational environment. The emergence and development of motivational forms and values, whose projections determine social crimes, are risk factors affecting the professional activity of the person, as well as the performance and stability of the institution. Moreover, if the motivation determines attitudes aimed at compromising classified information, the resulting actions may be considered threats to national security. Such threats can only be prevented by understanding the motivational mechanisms and external conditions that make it possible for personnel to transform intentions into real actions.

  10. Using point-set compression to classify folk songs

    DEFF Research Database (Denmark)

    Meredith, David

    2014-01-01

    -neighbour algorithm and leave-one-out cross-validation to classify the 360 melodies into tune families. The classifications produced by the algorithms were compared with a ground-truth classification prepared by expert musicologists. Twelve of the thirteen compressors used in the experiment were based...... compared. The highest classification success rate of 77–84% was achieved by COSIATEC, followed by 60–64% for Forth’s algorithm and then 52–58% for SIATECCompress. When the NCDs were calculated using bzip2, the success rate was only 12.5%. The results demonstrate that the effectiveness of NCD for measuring...... similarity between folk-songs for classification purposes is highly dependent upon the actual compressor chosen. Furthermore, it seems that compressors based on finding maximal repeated patterns in point-set representations of music show more promise for NCD-based music classification than general...
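
    The normalized compression distance (NCD) used in this record can be sketched with a general-purpose compressor. The example uses Python's `bz2` module as the bzip2 stand-in; the byte strings are made-up inputs, not the melodies from the study, and the point-set compressors (COSIATEC, SIATECCompress, Forth's algorithm) are not reproduced.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up inputs: a highly repetitive "melody" string and unrelated pseudo-random bytes.
rng = random.Random(0)
melody = b"C4 E4 G4 E4 " * 100
noise = bytes(rng.randrange(256) for _ in range(1200))

d_same = ncd(melody, melody)
d_diff = ncd(melody, noise)
```

    A nearest-neighbour classifier would assign each melody the tune family of the labelled melody with the smallest NCD to it.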

  11. Sex Bias in Classifying Borderline and Narcissistic Personality Disorder.

    Science.gov (United States)

    Braamhorst, Wouter; Lobbestael, Jill; Emons, Wilco H M; Arntz, Arnoud; Witteman, Cilia L M; Bekker, Marrie H J

    2015-10-01

    This study investigated sex bias in the classification of borderline and narcissistic personality disorders. A sample of psychologists in training for a post-master degree (N = 180) read brief case histories (male or female version) and made DSM classification. To differentiate sex bias due to sex stereotyping or to base rate variation, we used different case histories, respectively: (1) non-ambiguous case histories with enough criteria of either borderline or narcissistic personality disorder to meet the threshold for classification, and (2) an ambiguous case with subthreshold features of both borderline and narcissistic personality disorder. Results showed significant differences due to sex of the patient in the ambiguous condition. Thus, when the diagnosis is not straightforward, as in the case of mixed subthreshold features, sex bias is present and is influenced by base-rate variation. These findings emphasize the need for caution in classifying personality disorders, especially borderline or narcissistic traits.

  12. Fisher information metrics for binary classifier evaluation and training

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Different evaluation metrics for binary classifiers are appropriate to different scientific domains and even to different problems within the same domain. This presentation focuses on the optimisation of event selection to minimise statistical errors in HEP parameter estimation, a problem that is best analysed in terms of the maximisation of Fisher information about the measured parameters. After describing a general formalism to derive evaluation metrics based on Fisher information, three more specific metrics are introduced for the measurements of signal cross sections in counting experiments (FIP1) or distribution fits (FIP2) and for the measurements of other parameters from distribution fits (FIP3). The FIP2 metric is particularly interesting because it can be derived from any ROC curve, provided that prevalence is also known. In addition to its relation to measurement errors when used as an evaluation criterion (which makes it more interesting than the ROC AUC), a further advantage of the FIP2 metric is ...

  13. Multivariate analysis of quantitative traits can effectively classify rapeseed germplasm

    Directory of Open Access Journals (Sweden)

    Jankulovska Mirjana

    2014-01-01

    Full Text Available In this study, the use of different multivariate approaches to classify rapeseed genotypes based on quantitative traits has been presented. Tree regression analysis, PCA and two-way cluster analysis were applied in order to describe and understand the extent of genetic variability in spring rapeseed genotype by trait data. The traits which highly influenced seed and oil yield in rapeseed were successfully identified by the tree regression analysis. The principal predictor for both response variables was the number of pods per plant (NP). NP and 1000 seed weight could help in the selection of high yielding genotypes. High values for both traits and oil content could lead to high oil yielding genotypes. These traits may serve as indirect selection criteria and can lead to improvement of seed and oil yield in rapeseed. Quantitative traits that explained most of the variability in the studied germplasm were classified using principal component analysis. In this data set, five PCs were identified, out of which the first three PCs explained 63% of the total variance. It helped in facilitating the choice of variables based on which the genotypes’ clustering could be performed. The two-way cluster analysis simultaneously clustered genotypes and quantitative traits. The final number of clusters was determined using a bootstrapping technique. This approach provided a clear overview of the variability of the analyzed genotypes. The genotypes that have similar performance regarding the traits included in this study can be easily detected on the heatmap. Genotypes grouped in clusters 1 and 8 had high values for seed and oil yield and a relatively short vegetative growth duration period, and those in cluster 9 combined moderate to low values for vegetative growth duration with moderate to high seed and oil yield. These genotypes should be further exploited and implemented in the rapeseed breeding program. The combined application of these multivariate methods

  14. Classifying and Visualising Roman Pottery using Computer-scanned Typologies

    Directory of Open Access Journals (Sweden)

    Jacqueline Christmas

    2018-05-01

    Full Text Available For many archaeological assemblages and type-series, accurate drawings of standardised pottery vessels have been recorded in consistent styles. This provides the opportunity to extract individual pot drawings and derive from them data that can be used for analysis and visualisation. Starting from PDF scans of the original pages of pot drawings, we have automated much of the process for locating, defining the boundaries, extracting and orientating each individual pot drawing. From these processed images, basic features such as width and height, the volume of the interior, the edges, and the shape of the cross-section outline are extracted and are then used to construct more complex features such as a measure of a pot's 'circularity'. Capturing these traits opens up new possibilities for (a) classifying vessel form in a way that is sensitive to the physical characteristics of pots relative to other vessels in an assemblage, and (b) visualising the results of quantifying assemblages using standard typologies. A frequently encountered problem when trying to compare pottery from different archaeological sites is that the pottery is classified into forms and labels using different standards. With a set of data from early Roman urban centres and related sites that has been labelled both with forms (e.g. 'platter' and 'bowl') and shape identifiers (based on the Camulodunum type-series), we use the extracted features from images to look both at how the pottery forms cluster for a given set of features, and at how the features may be used to compare finds from different sites.
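
    The abstract does not say how 'circularity' is measured. One common choice, used here purely as an assumption, is the isoperimetric quotient 4πA/P² of the outline, which is 1 for a circle and smaller for less compact shapes. The sketch below computes it for an outline given as a closed polygon; the function names and test outlines are illustrative.

```python
import math


def area_and_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) vertices in order."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def circularity(points) -> float:
    """Isoperimetric quotient 4*pi*A / P**2 (assumed definition of circularity)."""
    area, perimeter = area_and_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2


# Illustrative outlines: a unit square and a 360-sided approximation of a circle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)) for k in range(360)]
```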

  15. Deep Learning to Classify Radiology Free-Text Reports.

    Science.gov (United States)

    Chen, Matthew C; Ball, Robyn L; Yang, Lingyao; Moradzadeh, Nathaniel; Chapman, Brian E; Larson, David B; Langlotz, Curtis P; Amrhein, Timothy J; Lungren, Matthew P

    2018-03-01

    Purpose: To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions. Materials and Methods: Contrast material-enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Annotations by two human radiologists were made for three categories: the presence, chronicity, and location of PE. Classification performance of a CNN model with an unsupervised learning algorithm for obtaining vector representations of words was compared with the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores for both the CNN model and PeFinder in the internal and external validation sets were determined. Results: The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a statistically significantly larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistical difference between the performance of the CNN model and PeFinder was found. Conclusion: A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or beyond that of an existing traditional NLP model. © RSNA, 2017. Online supplemental material is available for this article.
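
    The F1 score compared above is the harmonic mean of precision and recall. A minimal sketch from confusion counts follows; the counts in the example are invented for illustration and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Invented counts: 9 PE-positive reports found, 1 false alarm, 3 missed.
example_f1 = f1_score(tp=9, fp=1, fn=3)
```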

  16. Immunohistochemical analysis of breast tissue microarray images using contextual classifiers

    Directory of Open Access Journals (Sweden)

    Stephen J McKenna

    2013-01-01

    Full Text Available Background: Tissue microarrays (TMAs) are an important tool in translational research for examining multiple cancers for molecular and protein markers. Automatic immunohistochemical (IHC) scoring of breast TMA images remains a challenging problem. Methods: A two-stage approach that involves localization of regions of invasive and in-situ carcinoma followed by ordinal IHC scoring of nuclei in these regions is proposed. The localization stage classifies locations on a grid as tumor or non-tumor based on local image features. These classifications are then refined using an auto-context algorithm called spin-context. Spin-context uses a series of classifiers to integrate image feature information with spatial context information in the form of estimated class probabilities. This is achieved in a rotationally-invariant manner. The second stage estimates ordinal IHC scores in terms of the strength of staining and the proportion of nuclei stained. These estimates take the form of posterior probabilities, enabling images with uncertain scores to be referred for pathologist review. Results: The method was validated against manual pathologist scoring on two nuclear markers, progesterone receptor (PR) and estrogen receptor (ER). Errors for PR data were consistently lower than those achieved with ER data. Scoring was in terms of estimated proportion of cells that were positively stained (scored on an ordinal scale of 0-6) and perceived strength of staining (scored on an ordinal scale of 0-3). Average absolute differences between predicted scores and pathologist-assigned scores were 0.74 for proportion of cells and 0.35 for strength of staining (PR). Conclusions: The use of context information via spin-context improved the precision and recall of tumor localization. The combination of the spin-context localization method with the automated scoring method resulted in reduced IHC scoring errors.

  17. Novel gene sets improve set-level classification of prokaryotic gene expression data.

    Science.gov (United States)

    Holec, Matěj; Kuželka, Ondřej; Železný, Filip

    2015-10-28

    Set-level classification of gene expression data has received significant attention recently. In this setting, high-dimensional vectors of features corresponding to genes are converted into lower-dimensional vectors of features corresponding to biologically interpretable gene sets. The dimensionality reduction brings the promise of a decreased risk of overfitting, potentially resulting in improved accuracy of the learned classifiers. However, recent empirical research has not confirmed this expectation. Here we hypothesize that the reported unfavorable classification results in the set-level framework were due to the adoption of unsuitable gene sets defined typically on the basis of the Gene ontology and the KEGG database of metabolic networks. We explore an alternative approach to defining gene sets, based on regulatory interactions, which we expect to collect genes with more correlated expression. We hypothesize that such more correlated gene sets will enable the learning of more accurate classifiers. We define two families of gene sets using information on regulatory interactions, and evaluate them on phenotype-classification tasks using public prokaryotic gene expression data sets. From each of the two gene-set families, we first select the best-performing subtype. The two selected subtypes are then evaluated on independent (testing) data sets against state-of-the-art gene sets and against the conventional gene-level approach. The novel gene sets are indeed more correlated than the conventional ones, and lead to significantly more accurate classifiers. Novel gene sets defined on the basis of regulatory interactions improve set-level classification of gene expression data. The experimental scripts and other material needed to reproduce the experiments are available at http://ida.felk.cvut.cz/novelgenesets.tar.gz.

  18. Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition

    International Nuclear Information System (INIS)

    Shen Hongbin; Chou Kuochen

    2005-01-01

    The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For in-depth understanding of the biochemical process of the nucleus, the knowledge of localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desired to develop an automated method for fast annotating the subnuclear locations for numerous newly found nuclear protein sequences so as to be able to timely utilize them for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen

  19. Classified model and characteristics of strategies at tourist companies

    Directory of Open Access Journals (Sweden)

    I.V. Saukh

    2017-12-01

    Full Text Available The research assesses the scientific approaches to identifying the classification features of strategies and the strategy types distinguished according to those features. The research object is the activity of tourist companies, which determines the choice of strategies typical of the tourism field. It is substantiated that the scientific approaches to the classification of strategies vary in the specialized literature because of ambiguity in the definition of strategy and the vagueness and plurality of its classification features. On the basis of this research the authors have improved the classification model of strategies for tourist companies, which should support effective management decisions aimed at developing enterprise potential under an unstable and unpredictable external environment. The paper singles out the peculiarities of the functioning of the tourism sector, which are the following: high sensitivity to changes in the external environment; a high level of competition in the field; dynamism and the lack of necessity for «far-seeing» strategies; insufficient information for the application of traditional western models and matrix methods of strategy development; a time gap between obtaining the service and its consumption; a great number of intermediaries; seasonal swings in demand; and sudden shifts in the external environment caused by cyclicity, globalization, political decisions of individual countries, etc. The article shows essential differences in the development of financial strategies of small-scale enterprises and stock companies in the tourist business. It is substantiated that small-scale enterprises develop strategies directed to a higher level of personal services, occupational competence, ability and experience in designing, the best knowledge of regional conditions and flexible decisions caused by the peculiarities of the received orders. Taking into

  20. Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction

    Directory of Open Access Journals (Sweden)

    Boulesteix Anne-Laure

    2009-12-01

    Full Text Available Abstract Background: In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this kind of bias. Methods: In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. Results: We assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly. Conclusions: The median minimal error rate over the investigated classifiers was as low as 31% and 41% based on permuted uninformative predictors from studies on colon cancer and prostate cancer, respectively. We conclude that the strategy to present only the optimal result is not acceptable because it yields a substantial bias in error rate estimation, and suggest alternative approaches for properly reporting classification accuracy.
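
    The optimistic bias described above can be illustrated with a toy simulation: when many classifiers are tried on non-informative data and only the best one is reported, the reported error falls well below the 50% chance level. The sketch replaces the paper's 124 real classifier variants with 124 random guessers, and the sample size of 60 is an arbitrary assumption, so it shows the mechanism only and does not reproduce the reported 31% and 41% figures.

```python
import random


def selection_bias_demo(n_samples: int = 60, n_classifiers: int = 124, seed: int = 0):
    """Return (minimal error, mean error) over random guessers on balanced binary labels."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # balanced, non-informative setting
    errors = []
    for _ in range(n_classifiers):
        predictions = [rng.randint(0, 1) for _ in range(n_samples)]
        wrong = sum(p != y for p, y in zip(predictions, labels))
        errors.append(wrong / n_samples)
    return min(errors), sum(errors) / len(errors)


best_error, mean_error = selection_bias_demo()
```

    The mean error stays near 0.5, as expected for guessing, while the minimum is noticeably lower; reporting only the minimum is the a posteriori selection the abstract warns against.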

  1. The decision tree classifier - Design and potential. [for Landsat-1 data]

    Science.gov (United States)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.

  2. Classifying images using restricted Boltzmann machines and convolutional neural networks

    Science.gov (United States)

    Zhao, Zhijun; Xu, Tongde; Dai, Chenyu

    2017-07-01

    To improve the feature recognition ability of deep model transfer learning, we propose a hybrid deep transfer learning method for image classification based on restricted Boltzmann machines (RBM) and convolutional neural networks (CNNs). It integrates the learning abilities of the two models and conducts subject classification by extracting structural higher-order statistical features of images. When the method transfers the trained convolutional neural networks to the target datasets, fully-connected layers can be replaced by restricted Boltzmann machine layers; then the restricted Boltzmann machine layers and Softmax classifier are retrained, and a BP neural network can be used to fine-tune the hybrid model. The restricted Boltzmann machine layers not only fully integrate the whole feature maps, but also learn the statistical features of the target datasets in the sense of maximum log-likelihood, thus removing the effects caused by the content differences between datasets. The experimental results show that the proposed method has improved the accuracy of image classification, outperforming other methods on Pascal VOC2007 and Caltech101 datasets.

  3. Classifying and explaining democracy in the Muslim world

    Directory of Open Access Journals (Sweden)

    Rohaizan Baharuddin

    2012-12-01

    Full Text Available The purpose of this study is to classify and explain democracies in the 47 Muslim countries between the years 1998 and 2008 by using liberties and elections as independent variables. Specifically focusing on the context of the Muslim world, this study examines the performance of civil liberties and elections, variation of democracy practised the most, the elections, civil liberties and democratic transitions and patterns that followed. Based on the quantitative data primarily collected from Freedom House, this study demonstrates the following aggregate findings: first, the “not free not fair” elections, the “limited” civil liberties and the “Illiberal Partial Democracy” were the dominant feature of elections, civil liberties and democracy practised in the Muslim world; second, a total of 413 Muslim regimes out of 470 (47 regimes x 10 years) remained the same as their democratic origin points, without any transitions to a better or worse level of democracy, throughout these 10 years; and third, a slow, yet steady positive transition of both elections and civil liberties occurred in the Muslim world with changes in the nature of elections becoming much more progressive compared to the civil liberties’ transitions.

  4. An automatic classifier of emotions built from entropy of noise.

    Science.gov (United States)

    Ferreira, Jacqueline; Brás, Susana; Silva, Carlos F; Soares, Sandra C

    2017-04-01

    The electrocardiogram (ECG) signal has been widely used to study the physiological substrates of emotion. However, searching for better filtering techniques in order to obtain a signal with better quality and with the maximum relevant information remains an important issue for researchers in this field. Signal processing is largely performed for ECG analysis and interpretation, but this process can be susceptible to error in the delineation phase. In addition, it can lead to the loss of important information that is usually considered as noise and, consequently, discarded from the analysis. The goal of this study was to evaluate if the ECG noise allows for the classification of emotions, while using its entropy as an input in a decision tree classifier. We collected the ECG signal from 25 healthy participants while they were presented with videos eliciting negative (fear and disgust) and neutral emotions. The results indicated that the neutral condition showed a perfect identification (100%), whereas the classification of negative emotions indicated good identification performances (60% of sensitivity and 80% of specificity). These results suggest that the entropy of noise contains relevant information that can be useful to improve the analysis of the physiological correlates of emotion. © 2016 Society for Psychophysiological Research.

  5. Classifying the Optical Morphology of Shocked POststarburst Galaxies

    Science.gov (United States)

    Stewart, Tess; SPOGs Team

    2018-01-01

    The Shocked POststarburst Galaxy Survey (SPOGS) is a sample of galaxies in transition from blue, star forming spirals to red, inactive ellipticals. These galaxies are earlier in the transition than classical poststarburst samples. We have classified the physical characteristics of the full sample of 1067 SPOGs in 7 categories, covering (1) their shape; (2) the relative prominence of their nuclei; (3) the uniformity of their optical color; (4) whether the outskirts of the galaxy were indicative of on-going star formation; (5) whether they are engaged in interactions with other galaxies, and if so, (6) the kinds of galaxies with which they are interacting; and (7) the presence of asymmetrical features, possibly indicative of recent interactions. We determined that a plurality of SPOGs are in elliptical galaxies, indicating morphological transformations may tend to conclude before other indicators of transitions have faded. Further, early-type SPOGs also tend to have the brightest optical nuclei. Most galaxies do not show signs of current or recent interactions. We used these classifications to search for correlations between qualitative and quantitative characteristics of SPOGs using Sloan Digital Sky Survey and Wide-field Infrared Survey Explorer magnitudes. We find that relative optical nuclear brightness is not a good indicator of the presence of an active galactic nucleus and that galaxies with visible indications of active star formation also cluster in optical color and diagnostic line ratios.

  6. Asymptotic performance of regularized quadratic discriminant analysis based classifiers

    KAUST Repository

    Elkhalil, Khalil

    2017-12-13

    This paper carries out a large dimensional analysis of the standard regularized quadratic discriminant analysis (QDA) classifier designed on the assumption that data arise from a Gaussian mixture model. The analysis relies on fundamental results from random matrix theory (RMT) when both the number of features and the cardinality of the training data within each class grow large at the same pace. Under some mild assumptions, we show that the asymptotic classification error converges to a deterministic quantity that depends only on the covariances and means associated with each class as well as the problem dimensions. Such a result permits a better understanding of the performance of regularized QDA and can be used to determine the optimal regularization parameter that minimizes the misclassification error probability. Despite being valid only for Gaussian data, our theoretical findings are shown to yield a high accuracy in predicting the performances achieved with real data sets drawn from popular real data bases, thereby making an interesting connection between theory and practice.

  7. Molecular Characteristics in MRI-classified Group 1 Glioblastoma Multiforme

    Directory of Open Access Journals (Sweden)

    William E Haskins

    2013-07-01

    Full Text Available Glioblastoma multiforme (GBM) is a clinically and pathologically heterogeneous brain tumor. A previous study of MRI-classified GBM has revealed a spatial relationship between Group 1 GBM (GBM1) and the subventricular zone (SVZ). The SVZ is an adult neural stem cell niche and is also suspected to be the origin of a subtype of brain tumor. The intimate contact between GBM1 and the SVZ raises the possibility that tumor cells in GBM1 may be most related to SVZ cells. In support of this notion, we found that neural stem cell and neuroblast markers are highly expressed in GBM1. Additionally, we identified molecular characteristics in this type of GBM that include up-regulation of metabolic enzymes, ribosomal proteins, heat shock proteins, and c-Myc oncoprotein. As GBM1 often recurs at great distances from the initial lesion, the rewiring of metabolism and ribosomal biogenesis may facilitate cancer cells’ growth and survival during tumor migration. Taken together, combining our findings with MRI-based classification of GBM1 would offer better prediction and treatment for this multifocal GBM.

  8. The Complete Gabor-Fisher Classifier for Robust Face Recognition

    Directory of Open Access Journals (Sweden)

    Štruc Vitomir

    2010-01-01

    Full Text Available Abstract This paper develops a novel face recognition technique called Complete Gabor Fisher Classifier (CGFC). Different from existing techniques that use Gabor filters for deriving the Gabor face representation, the proposed approach does not rely solely on Gabor magnitude information but effectively uses features computed based on Gabor phase information as well. It represents one of the few successful attempts found in the literature of combining Gabor magnitude and phase information for robust face recognition. The novelty of the proposed CGFC technique comes from (1) the introduction of a Gabor phase-based face representation and (2) the combination of the recognition technique using the proposed representation with classical Gabor magnitude-based methods into a unified framework. The proposed face recognition framework is assessed in a series of face verification and identification experiments performed on the XM2VTS, Extended YaleB, FERET, and AR databases. The results of the assessment suggest that the proposed technique clearly outperforms state-of-the-art face recognition techniques from the literature and that its performance is almost unaffected by the presence of partial occlusions of the facial area, changes in facial expression, or severe illumination changes.

  9. Stability of halophilic proteins: from dipeptide attributes to discrimination classifier.

    Science.gov (United States)

    Zhang, Guangya; Huihua, Ge; Yi, Lin

    2013-02-01

    To investigate the molecular features responsible for protein halophilicity is of great significance for understanding the structure basis of protein halo-stability and would help to develop a practical strategy for designing halophilic proteins. In this work, we have systematically analyzed the dipeptide composition of the halophilic and non-halophilic protein sequences. We observed the halophilic proteins contained more DA, RA, AD, RR, AP, DD, PD, EA, VG and DV at the expense of LK, IL, II, IA, KK, IS, KA, GK, RK and AI. We identified some macromolecular signatures of halo-adaptation, and thought the dipeptide composition might contain more information than amino acid composition. Based on the dipeptide composition, we have developed a machine learning method for classifying halophilic and non-halophilic proteins for the first time. The accuracy of our method for the training dataset was 100.0%, and for the 10-fold cross-validation was 93.1%. We also discussed the influence of some specific dipeptides on prediction accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
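
    The dipeptide composition used as the feature representation in this abstract can be sketched as follows. This is a minimal illustration, assuming overlapping adjacent residue pairs normalized by the number of pairs; the function name and the normalization are assumptions, since the abstract does not spell them out.

```python
from collections import Counter


def dipeptide_composition(sequence):
    """Return the relative frequency of each overlapping dipeptide.

    Assumption: a dipeptide is any pair of adjacent residues, and
    frequencies are normalized by the total number of pairs.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: count / len(pairs) for pair, count in counts.items()}


if __name__ == "__main__":
    # Toy sequence, not taken from the study.
    print(dipeptide_composition("DADA"))
```

    For a real protein the resulting dictionary has at most 400 entries (20 x 20 amino acids), which could then be passed to a classifier as a fixed-length feature vector.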

  10. A Large Dimensional Analysis of Regularized Discriminant Analysis Classifiers

    KAUST Repository

    Elkhalil, Khalil

    2017-11-01

    This article carries out a large dimensional analysis of standard regularized discriminant analysis classifiers designed on the assumption that data arise from a Gaussian mixture model with different means and covariances. The analysis relies on fundamental results from random matrix theory (RMT) when both the number of features and the cardinality of the training data within each class grow large at the same pace. Under mild assumptions, we show that the asymptotic classification error approaches a deterministic quantity that depends only on the means and covariances associated with each class as well as the problem dimensions. Such a result permits a better understanding of the performance of regularized discriminant analysis, in practical large but finite dimensions, and can be used to determine and pre-estimate the optimal regularization parameter that minimizes the misclassification error probability. Despite being theoretically valid only for Gaussian data, our findings are shown to yield a high accuracy in predicting the performances achieved with real data sets drawn from the popular USPS data base, thereby making an interesting connection between theory and practice.

  11. Classifying Taiwan Lianas with Radiating Plates of Xylem

    Directory of Open Access Journals (Sweden)

    Sheng-Zehn Yang

    2015-12-01

    Full Text Available Radiating plates of xylem are a cambial variant of lianas that occurs in 22 families. This study investigates 15 liana species representing nine families with radiating plates of xylem structures. The features of the transverse section and epidermis in fresh liana samples are documented, including shapes and colors of xylem and phloem, ray width and numbers, and skin morphology. Experimental results indicated that the shape of phloem fibers in Ampelopsis brevipedunculata var. hancei is gradually tapered and flame-like, which is in contrast with the other characteristics of this type, including those classified as rays. Both inner and outer cylinders of vascular bundles are found in Piper kwashoense, and the irregular inner cylinder persists yet gradually diminishes. Red crystals are numerous in the cortex of Celastrus kusanoi. Aristolochia shimadai and A. zollingeriana develop a combination of two cambium variants, radiating plates of xylem and a lobed xylem. The shape of phloem in Stauntonia obovatifoliola is square or truncate, and its rays are numerous. Meanwhile, that of Neoalsomitra integrifolia is blunt and its rays are fewer. As for the features of the stem surface within the same family, Cyclea ochiaiana is brownish in color and has a deep vertical depression with lenticels, while Pericampylus glaucus is greenish in color with a shallow vertical depression. Within the same genus, Aristolochia shimadai develops lenticels, which are absent in A. zollingeriana; although the periderm developed in Clematis grata is a ring bark and tears easily, that of Clematis tamura is thick and soft.

  12. Using biological indices to classify schizophrenia and other psychotic patients.

    Science.gov (United States)

    Sponheim, S R; Iacono, W G; Thuras, P D; Beiser, M

    2001-07-01

    Although classification of mental disorders using more than clinical description would be desirable, there is scant evidence that available laboratory tests (i.e. biological indices) would provide more valid classifications than current diagnostic systems (e.g. DSM-IV). We used cluster analysis of four biological variables to classify 163 psychotic patients and 83 nonpsychiatric comparison subjects. Analyses revealed a three-cluster solution with the first cluster reflecting electrodermal deviance, the second cluster representing nondeviant biological function, and the third cluster reflecting increased nailfold plexus visibility and ocular motor dysfunction. To assess the construct validity of proband clusters we examined ocular motor performance in 156 first-degree relatives as a function of proband cluster membership. First-degree relatives of third cluster probands exhibited worse ocular motor performance than relatives of other cluster probands. Additionally, better classification sensitivity and specificity were obtained for the relatives when they were grouped by proband cluster than by proband DSM-IV diagnosis. When a single proband characteristic (i.e. eyetracking performance) was used to group relatives, classification sensitivity and specificity failed to significantly increase over grouping by proband DSM-IV diagnosis. Multivariate biologically defined clusters may offer an advantage over DSM-IV classification when examining nosology and etiology of psychotic disorders.

  13. Passive Sonar Target Detection Using Statistical Classifier and Adaptive Threshold

    Directory of Open Access Journals (Sweden)

    Hamed Komari Alaie

    2018-01-01

    Full Text Available This paper presents the results of an experimental investigation of target detection with passive sonar in the Persian Gulf. Detecting sounds propagated in water is one of the basic challenges for researchers in the sonar field. This challenge becomes more complex in shallow water (such as the Persian Gulf) and with low-noise vessels. Generally, in passive sonar, targets are detected with the sonar equation (with a constant threshold), which increases the detection error in shallow water. The purpose of this study is to propose a new method for detecting targets in passive sonar using an adaptive threshold. In this method, the target signal (sound) is processed in the time and frequency domains. For classification, Bayesian classification is used, and the posterior distribution is estimated by the Maximum Likelihood Estimation algorithm. Finally, the target is detected by combining the detection points in both domains using a Least Mean Square (LMS) adaptive filter. The results show that the proposed method improved the true detection rate by about 24% compared with the best other detection method.

  14. Addressing the Challenge of Defining Valid Proteomic Biomarkers and Classifiers

    LENUS (Irish Health Repository)

    Dakna, Mohammed

    2010-12-10

    Abstract Background The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example we define the measurable proteomic differences between apparently healthy adult males and females. We choose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology, allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. Results We found that the Wilcoxon-test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test-set is essential. Conclusions Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
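
    The abstract states that adjustment for multiple testing is necessary but does not name a procedure here. As one common choice, and purely as an assumption, the Benjamini-Hochberg step-up adjustment of a list of p-values (for example from per-peptide Wilcoxon tests) can be sketched like this; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used as the multiple-testing correction; the source
    abstract does not specify which correction the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

    Candidate biomarkers would then be those whose adjusted p-value falls below the chosen false discovery rate.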

  15. Quantum Hooke's Law to Classify Pulse Laser Induced Ultrafast Melting

    Science.gov (United States)

    Hu, Hao; Ding, Hepeng; Liu, Feng

    2015-02-01

    Ultrafast crystal-to-liquid phase transition induced by femtosecond pulse laser excitation is an interesting material's behavior manifesting the complexity of light-matter interaction. There exist two types of such phase transitions: one occurs at a time scale shorter than a picosecond via a nonthermal process mediated by electron-hole plasma formation; the other at a longer time scale via a thermal melting process mediated by electron-phonon interaction. However, it remains unclear what material would undergo which process and why? Here, by exploiting the property of quantum electronic stress (QES) governed by quantum Hooke's law, we classify the transitions by two distinct classes of materials: the faster nonthermal process can only occur in materials like ice having an anomalous phase diagram characterized with dTm/dP < 0, where Tm is the melting temperature and P is pressure, above a high threshold laser fluence; while the slower thermal process may occur in all materials. Especially, the nonthermal transition is shown to be induced by the QES, acting like a negative internal pressure, which drives the crystal into a "super pressing" state to spontaneously transform into a higher-density liquid phase. Our findings significantly advance fundamental understanding of ultrafast crystal-to-liquid phase transitions, enabling quantitative a priori predictions.

  16. Classifying and Analyzing 3d Cell Motion in Jammed Microgels

    Science.gov (United States)

    Bhattacharjee, Tapomoy; Sawyer, W. Gregory; Angelini, Thomas

    Soft granular polyelectrolyte microgels swell in liquid cell growth media to form a continuous elastic solid that can easily transition between solid to fluid state under a low shear stress. Such Liquid-like solids (LLS) have recently been used to create 3D cellular constructs as well as to support, culture and harvest cells in 3D. Current understanding of cell migration mechanics in 3D was established from experiments performed in natural and synthetic polymer networks. Spatial variation in network structure and the transience of degradable gels limit their usefulness in quantitative cell mechanics studies. By contrast, LLS growth media approximates a homogeneous continuum, enabling tractable cell mechanics measurements to be performed in 3D. Here, we introduce a process to understand and classify cytotoxic T cell motion in 3D by studying cellular motility in LLS media. General classification of T cell motion can be achieved with a very traditional statistical approach: the cell's mean squared displacement (MSD) as a function of delay time. We will also use Langevin approaches combined with the constitutive equations of the LLS medium to predict the statistics of T cell motion. National Science Foundation under Grant No. DMR-1352043.
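
    The mean squared displacement (MSD) as a function of delay time, mentioned in this abstract, can be computed from a sampled 3D track as sketched below. The track format (a list of equally spaced (x, y, z) positions) and the function name are assumptions made for illustration.

```python
def mean_squared_displacement(track, lag):
    """Time-averaged MSD of one track at a delay of `lag` frames.

    Assumption: `track` is a list of (x, y, z) positions sampled at a
    constant frame interval, so the delay time equals lag * interval.
    """
    if lag <= 0 or lag >= len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    squared = [
        sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Toy straight-line track, not experimental data.
    track = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    for lag in (1, 2, 3):
        print(lag, mean_squared_displacement(track, lag))
```

    For this straight-line toy track the MSD grows with the square of the lag, the signature of ballistic motion, whereas purely diffusive motion would give linear growth; the scaling exponent is one simple way to separate motion classes.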

  17. Transforming Musical Signals through a Genre Classifying Convolutional Neural Network

    Science.gov (United States)

    Geng, S.; Ren, G.; Ogihara, M.

    2017-05-01

    Convolutional neural networks (CNNs) have been successfully applied on both discriminative and generative modeling for music-related tasks. For a particular task, the trained CNN contains information representing the decision making or the abstracting process. One can hope to manipulate existing music based on this 'informed' network and create music with new features corresponding to the knowledge obtained by the network. In this paper, we propose a method to utilize the stored information from a CNN trained on musical genre classification task. The network was composed of three convolutional layers, and was trained to classify five-second song clips into five different genres. After training, randomly selected clips were modified by maximizing the sum of outputs from the network layers. In addition to the potential of such CNNs to produce interesting audio transformation, more information about the network and the original music could be obtained from the analysis of the generated features since these features indicate how the network 'understands' the music.

  18. An Informed Framework for Training Classifiers from Social Media

    Directory of Open Access Journals (Sweden)

    Dong Seon Cheng

    2016-04-01

    Full Text Available Extracting information from social media has become a major focus of companies and researchers in recent years. Aside from the study of the social aspects, it has also been found feasible to exploit the collaborative strength of crowds to help solve classical machine learning problems like object recognition. In this work, we focus on the generally underappreciated problem of building effective datasets for training classifiers by automatically assembling data from social media. We detail some of the challenges of this approach and outline a framework that uses expanded search queries to retrieve more qualified data. In particular, we concentrate on collaboratively tagged media on the social platform Flickr, and on the problem of image classification to evaluate our approach. Finally, we describe a novel entropy-based method to incorporate an information-theoretic principle to guide our framework. Experimental validation against well-known public datasets shows the viability of this approach and marks an improvement over the state of the art in terms of simplicity and performance.

  19. Classifying the evolutionary and ecological features of neoplasms

    Science.gov (United States)

    Maley, Carlo C.; Aktipis, Athena; Graham, Trevor A.; Sottoriva, Andrea; Boddy, Amy M.; Janiszewska, Michalina; Silva, Ariosto S.; Gerlinger, Marco; Yuan, Yinyin; Pienta, Kenneth J.; Anderson, Karen S.; Gatenby, Robert; Swanton, Charles; Posada, David; Wu, Chung-I; Schiffman, Joshua D.; Hwang, E. Shelley; Polyak, Kornelia; Anderson, Alexander R. A.; Brown, Joel S.; Greaves, Mel; Shibata, Darryl

    2018-01-01

    Neoplasms change over time through a process of cell-level evolution, driven by genetic and epigenetic alterations. However, the ecology of the microenvironment of a neoplastic cell determines which changes provide adaptive benefits. There is widespread recognition of the importance of these evolutionary and ecological processes in cancer, but to date, no system has been proposed for drawing clinically relevant distinctions between how different tumours are evolving. On the basis of a consensus conference of experts in the fields of cancer evolution and cancer ecology, we propose a framework for classifying tumours that is based on four relevant components. These are the diversity of neoplastic cells (intratumoural heterogeneity) and changes over time in that diversity, which make up an evolutionary index (Evo-index), as well as the hazards to neoplastic cell survival and the resources available to neoplastic cells, which make up an ecological index (Eco-index). We review evidence demonstrating the importance of each of these factors and describe multiple methods that can be used to measure them. Development of this classification system holds promise for enabling clinicians to personalize optimal interventions based on the evolvability of the patient’s tumour. The Evo- and Eco-indices provide a common lexicon for communicating about how neoplasms change in response to interventions, with potential implications for clinical trials, personalized medicine and basic cancer research. PMID:28912577

  20. Learning multiscale and deep representations for classifying remotely sensed imagery

    Science.gov (United States)

    Zhao, Wenzhi; Du, Shihong

    2016-03-01

    It is widely agreed that spatial features can be combined with spectral properties for improving interpretation performances on very-high-resolution (VHR) images in urban areas. However, many existing methods for extracting spatial features can only generate low-level features and consider limited scales, leading to unsatisfactory classification results. In this study, a multiscale convolutional neural network (MCNN) algorithm is presented to learn spatial-related deep features for hyperspectral remote imagery classification. Unlike traditional methods for extracting spatial features, the MCNN first transforms the original data sets into a pyramid structure containing spatial information at multiple scales, and then automatically extracts high-level spatial features using multiscale training data sets. Specifically, the MCNN has two merits: (1) high-level spatial features can be effectively learned by using the hierarchical learning structure and (2) the multiscale learning scheme can capture contextual information at different scales. To evaluate the effectiveness of the proposed approach, the MCNN was applied to classify well-known hyperspectral data sets and compared with traditional methods. The experimental results showed a significant increase in classification accuracies, especially for urban areas.

  1. Executed Movement Using EEG Signals through a Naive Bayes Classifier

    Directory of Open Access Journals (Sweden)

    Juliano Machado

    2014-11-01

    Full Text Available Recent years have witnessed a rapid development of brain-computer interface (BCI) technology. An independent BCI is a communication system for controlling a device, e.g., a computer, a wheelchair or a neuroprosthesis, by human intention, depending not on the brain’s normal output pathways of peripheral nerves and muscles, but on detectable signals that represent responsive or intentional brain activities. This paper presents a comparative study of the usage of the linear discriminant analysis (LDA) and the naive Bayes (NB) classifiers on describing both right- and left-hand movement through electroencephalographic signal (EEG) acquisition. For the analysis, we considered the following input features: the energy of the segments of a band pass-filtered signal with the frequency band in sensorimotor rhythms and the components of the spectral energy obtained through the Welch method. We also used the common spatial pattern (CSP) filter, so as to increase the discriminatory activity among movement classes. By using the database generated by this experiment, we obtained hit rates up to 70%. The results are compatible with previous studies.

  2. Online Feature Selection for Classifying Emphysema in HRCT Images

    Directory of Open Access Journals (Sweden)

    M. Prasad

    2008-06-01

    Full Text Available Feature subset selection, applied as a pre-processing step to machine learning, is valuable in dimensionality reduction, eliminating irrelevant data and improving classifier performance. In the classic formulation of the feature selection problem, it is assumed that all the features are available at the beginning. However, in many real world problems, there are scenarios where not all features are present initially and must be integrated as they become available. In such scenarios, online feature selection provides an efficient way to sort through a large space of features. It is in this context that we introduce online feature selection for the classification of emphysema, a smoking related disease that appears as low attenuation regions in High Resolution Computer Tomography (HRCT) images. The technique was successfully evaluated on 61 HRCT scans and compared with different online feature selection approaches, including hill climbing, best first search, grafting, and correlation-based feature selection. The results were also compared against “density mask”, a standard approach used for emphysema detection in medical image analysis.

  3. Hyperspectral image classifier based on beach spectral feature

    International Nuclear Information System (INIS)

    Liang, Zhang; Lianru, Gao; Bing, Zhang

    2014-01-01

    The seashore, especially coral bank, is sensitive to human activities and environmental changes. A multispectral image, with its coarse spectral resolution, is ill-suited to identifying subtle spectral distinctions between various beaches. By contrast, a hyperspectral image with narrow and consecutive channels increases our capability to retrieve minor spectral features, which makes it suitable for the identification and classification of surface materials on the shore. This paper used airborne hyperspectral data, in addition to ground spectral data, to study the beaches in Qingdao. The image data first went through image pretreatment to deal with the disturbance of noise, radiation inconsistency and distortion. Next, the reflection spectrum, the derivative spectrum and the spectral absorption features of the beach surface were inspected in search of diagnostic features. From these, spectral indices specific to the unique environment of the seashore were developed. According to expert decisions based on image spectra, the beaches are ultimately classified into sand beach, rock beach, vegetation beach, mud beach, bare land and water. In situ reflection spectra surveyed with a GER1500 field spectrometer validated the classification product. In conclusion, the classification approach under expert decision based on feature spectra is shown to be feasible for beaches.

  4. A dimensionless parameter for classifying hemodynamics in intracranial

    Science.gov (United States)

    Asgharzadeh, Hafez; Borazjani, Iman

    2015-11-01

    Rupture of an intracranial aneurysm (IA) is a disease with high rates of mortality. Given the risk associated with aneurysm surgery, quantifying the likelihood of aneurysm rupture is essential. There are many risk factors that could be implicated in the rupture of an aneurysm. However, the most important factors correlated to IA rupture are hemodynamic factors such as wall shear stress (WSS) and oscillatory shear index (OSI), which are affected by the IA flows. Here, we carry out three-dimensional high resolution simulations on representative IA models with simple geometries to test a dimensionless number (first proposed by Le et al., ASME J Biomech Eng, 2010), denoted as An number, to classify the flow mode. An number is defined as the ratio of the time the parent artery flow takes to transport across the IA neck to the time required for vortex ring formation. Based on the definition, the flow mode is vortex if An > 1 and cavity if An < 1 [...] OSI on the human subject IA. This work was supported partly by the NIH grant R03EB014860, and the computational resources were partly provided by CCR at UB. We thank Prof. Hui Meng and Dr. Jianping Xiang for providing us the database of aneurysms and helpful discussions.
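
    The definition of the An number above can be sketched as a small calculation. This is only an illustration: the function names and the two time inputs are hypothetical, and the cavity threshold (An below 1) is an assumption, since the source text is incomplete at that point. The case An = 1 is reported separately because the abstract does not say how it is treated.

```python
def an_number(transport_time: float, vortex_formation_time: float) -> float:
    """Ratio of the time the parent-artery flow takes to cross the IA neck
    to the time required for vortex ring formation (both in the same unit)."""
    if transport_time < 0 or vortex_formation_time <= 0:
        raise ValueError("times must be non-negative, formation time positive")
    return transport_time / vortex_formation_time


def flow_mode(an: float) -> str:
    """Classify the flow mode from the An number.

    An > 1 -> vortex (stated in the abstract)
    An < 1 -> cavity (assumed complement; the source text is garbled here)
    """
    if an > 1:
        return "vortex"
    if an < 1:
        return "cavity"
    return "threshold"  # An == 1: not specified in the abstract


if __name__ == "__main__":
    # Hypothetical times in seconds, chosen only to exercise both branches.
    print(flow_mode(an_number(0.30, 0.20)))
    print(flow_mode(an_number(0.10, 0.20)))
```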

  5. Instance Selection for Classifier Performance Estimation in Meta Learning

    Directory of Open Access Journals (Sweden)

    Marcin Blachnik

    2017-11-01

    Full Text Available Building an accurate prediction model is challenging and requires appropriate model selection. This process is very time consuming but can be accelerated with meta-learning: automatic model recommendation by estimating the performances of given prediction models without training them. Meta-learning utilizes metadata extracted from the dataset to effectively estimate the accuracy of the model in question. To achieve that goal, metadata descriptors must be gathered efficiently and must be informative to allow the precise estimation of prediction accuracy. In this paper, a new type of metadata descriptors is analyzed. These descriptors are based on the compression level obtained from the instance selection methods at the data-preprocessing stage. To verify their suitability, two types of experiments on real-world datasets have been conducted. In the first one, 11 instance selection methods were examined in order to validate the compression–accuracy relation for three classifiers: k-nearest neighbors (kNN), support vector machine (SVM), and random forest. From this analysis, two methods are recommended, instance-based learning type 2 (IB2) and edited nearest neighbor (ENN), which are then compared with the state-of-the-art metaset descriptors. The obtained results confirm that the two suggested compression-based meta-features help to predict accuracy of the base model much more accurately than the state-of-the-art solution.

  6. Implementation of a classifier didactical machine for learning mechatronic processes

    Directory of Open Access Journals (Sweden)

    Alex De La Cruz

    2017-06-01

    Full Text Available The present article shows the design and construction of a didactic classifier machine based on artificial vision. The machine is intended to be used as a learning module for mechatronic processes. The project describes the theoretical aspects that relate concepts of mechanical design, electronic design and software management, which together constitute the popular field of science and technology known as mechatronics. The design of the machine was developed based on the requirements of the user, through the concurrent design methodology, to define and materialize the appropriate hardware and software solutions. LabVIEW 2015 was used for high-speed image acquisition and analysis, as well as for establishing data communication with a programmable logic controller (PLC) via Ethernet and an open communications platform known as Open Platform Communications (OPC). In addition, the Arduino MEGA 2560 platform was used to control the movement of the stepper motor and the servo motors of the module. Finally, we assessed whether the equipment meets the technical specifications by running specific test protocols.

  7. Salient Region Detection via Feature Combination and Discriminative Classifier

    Directory of Open Access Journals (Sweden)

    Deming Kong

    2015-01-01

    Full Text Available We introduce a novel approach to detect salient regions of an image via feature combination and a discriminative classifier. Our method, which is based on hierarchical image abstraction, uses the logistic regression approach to map the regional feature vector to a saliency score. Four saliency cues are used in our approach: color contrast in a global context, center-boundary priors, spatially compact color distribution, and objectness, which serves as an atomic feature of each segmented region in the image. By mapping a four-dimensional regional feature to a fifteen-dimensional feature vector, we can linearly separate the salient regions from the cluttered background by finding an optimal linear combination of feature coefficients in the fifteen-dimensional feature space and finally fuse the saliency maps across multiple levels. Furthermore, we introduce the weighted salient image center into our saliency analysis task. Extensive experiments on two large benchmark datasets show that the proposed approach achieves the best performance over several state-of-the-art approaches.
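
    The abstract does not say how the four regional cues are lifted to fifteen dimensions. One mapping that yields exactly fifteen terms is a degree-2 polynomial expansion with a constant term (1 + 4 + 10 = 15), so the sketch below assumes that expansion. The function names, cue values and weights are hypothetical, and the logistic function stands in for the logistic regression step that maps the expanded feature to a saliency score.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence


def quadratic_expand(cues: Sequence[float]) -> List[float]:
    """Degree-2 polynomial expansion: bias, linear terms, all pairwise products.

    For four cues this gives 1 + 4 + 10 = 15 features (assumed mapping).
    """
    features = [1.0]                                   # bias term
    features.extend(float(c) for c in cues)            # linear terms
    features.extend(a * b for a, b in combinations_with_replacement(cues, 2))
    return features


def saliency_score(cues: Sequence[float], weights: Sequence[float]) -> float:
    """Logistic model applied to the expanded feature vector."""
    features = quadratic_expand(cues)
    if len(weights) != len(features):
        raise ValueError("one weight per expanded feature is required")
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical cue values: contrast, center prior, compactness, objectness.
    region = [0.8, 0.6, 0.7, 0.9]
    print(len(quadratic_expand(region)))               # number of expanded features
    print(saliency_score(region, [0.0] * 15))          # untrained (all-zero) weights
```

    A linear decision boundary in the expanded space corresponds to a quadratic boundary in the original four cues, which is one way to read the claim about linear separability.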

  8. Gene Therapy

    Science.gov (United States)

    Gene therapy Overview Gene therapy involves altering the genes inside your body's cells in an effort to treat or stop disease. Genes contain your ... that don't work properly can cause disease. Gene therapy replaces a faulty gene or adds a new ...

  9. A GIS semiautomatic tool for classifying and mapping wetland soils

    Science.gov (United States)

    Moreno-Ramón, Héctor; Marqués-Mateu, Angel; Ibáñez-Asensio, Sara

    2016-04-01

    Wetlands are one of the most productive and biodiverse ecosystems in the world. Water is the main resource and controls the relationships between agents and factors that determine the quality of the wetland. However, vegetation, wildlife and soils are also essential factors to understand these environments. Soils have possibly been the least studied resource because they are difficult to sample. As a result, wetland soils have sometimes been classified only broadly. The traditional methodology states that homogeneous soil units should be based on the five soil-forming factors. A problem can appear when the variation of one soil-forming factor is too small to differentiate a change in soil units, or when there is another factor that is not taken into account (e.g. a fluctuating water table). This is the case of the Albufera of Valencia, a coastal wetland located in the middle east of the Iberian Peninsula (Spain). The saline water table fluctuates throughout the year and generates differences in soils. To solve this problem, the objectives of this study were to establish a reliable methodology that avoids these problems, and to develop a GIS tool that would allow us to define homogeneous soil units in wetlands. This step is essential for the soil scientist, who has to decide the number of soil profiles in a study. The research was conducted with data from 133 soil pits of a previous study in the wetland. In that study, soil parameters of 401 samples (organic carbon, salinity, carbonates, n-value, etc.) were analysed. In a first stage, GIS layers were generated according to depth. The method employed was Bayesian Maximum Entropy. Subsequently, a program based on decision tree algorithms was designed in a GIS environment. The goal of this tool was to create a single layer, for each soil variable, according to the different diagnostic criteria of Soil Taxonomy (properties, horizons and diagnostic epipedons). At the end, the program

  10. Locating and classifying defects using an hybrid data base

    Energy Technology Data Exchange (ETDEWEB)

    Luna-Aviles, A; Diaz Pineda, A [Tecnologico de Estudios Superiores de Coacalco. Av. 16 de Septiembre 54, Col. Cabecera Municipal. C.P. 55700 (Mexico); Hernandez-Gomez, L H; Urriolagoitia-Calderon, G; Urriolagoitia-Sosa, G [Instituto Politecnico Nacional. ESIME-SEPI. Unidad Profesional ' Adolfo Lopez Mateos' Edificio 5, 30 Piso, Colonia Lindavista. Gustavo A. Madero. 07738 Mexico D.F. (Mexico); Durodola, J F [School of Technology, Oxford Brookes University, Headington Campus, Gipsy Lane, Oxford OX3 0BP (United Kingdom); Beltran Fernandez, J A, E-mail: alelunaav@hotmail.com, E-mail: luishector56@hotmail.com, E-mail: jdurodola@brookes.ac.uk

    2011-07-19

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200x20x5 mm. One half of them were plain and the other half had a notch (3 mm x 4 mm) close to the defect area (19 mm x 16 mm). This analysis was done with an Artificial Neural Network (ANN) and its optimization was done with an Adaptive Neuro Fuzzy Procedure (ANFIS). A hybrid data base was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in such bars was taken into account and the first five natural frequencies were calculated. 460 cases were evaluated. Half of them were plain and the other half had a notch. All the input data was classified in two groups. Each one has 230 cases and corresponds to one of the two sorts of voids mentioned above. In addition, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases were obtained with one void. The other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced in the synthetic data base. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid data base was used.
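
    The case counts in this abstract are consistent with pooling the 460 synthetic and 40 experimental cases into 500, then drawing 400 at random for training and holding out the remaining 100. The sketch below illustrates only that bookkeeping. The pooling is inferred from the numbers rather than stated outright, and the function name, case labels and seed are hypothetical.

```python
import random
from typing import List, Tuple

Case = Tuple[str, int]  # (origin, index within that origin)


def split_hybrid_database(
    n_synthetic: int = 460,
    n_experimental: int = 40,
    n_train: int = 400,
    seed: int = 0,
) -> Tuple[List[Case], List[Case]]:
    """Pool synthetic and experimental cases, shuffle, and split train/test."""
    cases: List[Case] = [("synthetic", i) for i in range(n_synthetic)]
    cases += [("experimental", i) for i in range(n_experimental)]
    if n_train > len(cases):
        raise ValueError("n_train exceeds the number of available cases")
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(cases)
    return cases[:n_train], cases[n_train:]


if __name__ == "__main__":
    train, test = split_hybrid_database()
    print(len(train), len(test))
```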

  11. CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

    International Nuclear Information System (INIS)

    Gopalan, Giri; Bornn, Luke; Vrtilek, Saeqa Dil

    2015-01-01

    In X-ray binary star systems consisting of a compact object that accretes material from an orbiting secondary star, there is no straightforward means to decide whether the compact object is a black hole or a neutron star. To assist in this process, we develop a Bayesian statistical model that makes use of the fact that X-ray binary systems appear to cluster based on their compact object type when viewed from a three-dimensional coordinate system derived from X-ray spectral data where the first coordinate is the ratio of counts in the mid- to low-energy band (color 1), the second coordinate is the ratio of counts in the high- to low-energy band (color 2), and the third coordinate is the sum of counts in all three bands. We use this model to estimate the probabilities of an X-ray binary system containing a black hole, non-pulsing neutron star, or pulsing neutron star. In particular, we utilize a latent variable model in which the latent variables follow a Gaussian process prior distribution, and hence we are able to induce the spatial correlation which we believe exists between systems of the same type. The utility of this approach is demonstrated by the accurate prediction of system types using Rossi X-ray Timing Explorer All Sky Monitor data, but it is not flawless. In particular, non-pulsing neutron systems containing “bursters” that are close to the boundary demarcating systems containing black holes tend to be classified as black hole systems. As a byproduct of our analyses, we provide the astronomer with the public R code which can be used to predict the compact object type of XRBs given training data
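
    The three-dimensional coordinate system described above follows directly from the counts in the three energy bands. A minimal sketch, assuming the band counts are already available as plain numbers (the function and argument names are hypothetical, and this is not the authors' R code):

```python
from typing import Tuple


def xrb_coordinates(low: float, mid: float, high: float) -> Tuple[float, float, float]:
    """Map band counts to (color 1, color 2, total counts).

    color 1 = mid / low, color 2 = high / low, total = low + mid + high.
    """
    if low <= 0:
        raise ValueError("low-energy band counts must be positive")
    return mid / low, high / low, low + mid + high


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(xrb_coordinates(low=2.0, mid=4.0, high=1.0))
```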

  12. Development of multicriteria models to classify energy efficiency alternatives

    International Nuclear Information System (INIS)

    Neves, Luis Pires; Antunes, Carlos Henggeler; Dias, Luis Candido; Martins, Antonio Gomes

    2005-01-01

    This paper aims at describing a novel constructive approach to develop decision support models to classify energy efficiency initiatives, including traditional Demand-Side Management and Market Transformation initiatives, overcoming the limitations and drawbacks of Cost-Benefit Analysis. A multicriteria approach based on the ELECTRE-TRI method is used, focusing on four perspectives: - an independent Agency with the aim of promoting energy efficiency; - Distribution-only utilities under a regulated framework; - the Regulator; - Supply companies in a competitive liberalized market. These perspectives were chosen after a system analysis of the decision situation regarding the implementation of energy efficiency initiatives, looking for the main roles and power relations, with the purpose of structuring the decision problem by identifying the actors, the decision makers, the decision paradigm, and the relevant criteria. The multicriteria models developed allow considering different kinds of impacts, but avoiding difficult measurements and unit conversions due to the nature of the multicriteria method chosen. The decision is then based on all the significant effects of the initiative, both positive and negative ones, including ancillary effects often forgotten in cost-benefit analysis. The ELECTRE-TRI, as most multicriteria methods, provides to the Decision Maker the ability of controlling the relevance each impact can have on the final decision. The decision support process encompasses a robustness analysis, which, together with a good documentation of the parameters supplied into the model, should support sound decisions. The models were tested with a set of real-world initiatives and compared with possible decisions based on Cost-Benefit analysis

  13. Intermediate depth burial of classified transuranic wastes in arid alluvium

    International Nuclear Information System (INIS)

    Cochran, J.R.; Crowe, B.M.; Di Sanza, F.

    1999-01-01

    Intermediate depth disposal operations were conducted by the US Department of Energy (DOE) at the DOE's Nevada Test Site (NTS) from 1984 through 1989. These operations emplaced high-specific activity low-level wastes (LLW) and limited quantities of classified transuranic (TRU) wastes in 37 m (120 ft) deep, Greater Confinement Disposal (GCD) boreholes. The GCD boreholes are 3 m (10 ft) in diameter and founded in a thick sequence of arid alluvium. The bottom 15 m (50 ft) of each borehole was used for waste emplacement and the upper 21 m (70 ft) was backfilled with native alluvium. The bottom of each GCD borehole is almost 200 m (650 ft) above the water table. The GCD boreholes are located in one of the most arid portions of the US, with an average precipitation of 13 cm (5 inches) per year. The limited precipitation, coupled with generally warm temperatures and low humidities, results in a hydrologic system dominated by evapotranspiration. The US Environmental Protection Agency's (EPA's) 40 CFR 191 defines the requirements for protection of human health from disposed TRU wastes. This EPA standard sets a number of requirements, including probabilistic limits on the cumulative releases of radionuclides to the accessible environment for 10,000 years. The DOE Nevada Operations Office (DOE/NV) has contracted with Sandia National Laboratories (Sandia) to conduct a performance assessment (PA) to determine if the TRU wastes emplaced in the GCD boreholes comply with the EPA's 40 CFR 191 requirements. This paper describes DOE's actions undertaken to evaluate whether the TRU wastes in the GCD boreholes will, or will not, endanger human health. Based on preliminary modeling, the TRU wastes in the GCD boreholes meet the EPA's requirements, and are, therefore, protective of human health

  14. Locating and classifying defects using an hybrid data base

    Science.gov (United States)

    Luna-Avilés, A.; Hernández-Gómez, L. H.; Durodola, J. F.; Urriolagoitia-Calderón, G.; Urriolagoitia-Sosa, G.; Beltrán Fernández, J. A.; Díaz Pineda, A.

    2011-07-01

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200×20×5 mm. One half of them were plain and the other half had a notch (3 mm × 4 mm) close to the defect area (19 mm × 16 mm). This analysis was done with an Artificial Neural Network (ANN) and its optimization was done with an Adaptive Neuro Fuzzy Procedure (ANFIS). A hybrid data base was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in such bars was taken into account and the first five natural frequencies were calculated. 460 cases were evaluated. Half of them were plain and the other half had a notch. All the input data was classified in two groups. Each one has 230 cases and corresponds to one of the two sorts of voids mentioned above. In addition, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases were obtained with one void. The other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced in the synthetic data base. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid data base was used.

  15. Multimodal fusion of polynomial classifiers for automatic person recognition

    Science.gov (United States)

    Broun, Charles C.; Zhang, Xiaozheng

    2001-03-01

    With the prevalence of the information age, privacy and personalization are forefront in today's society. As such, biometrics are viewed as essential components of current evolving technological systems. Consumers demand unobtrusive and non-invasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current generation speaker verification systems. The first is the difficulty in acquiring clean audio signals in all environments without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process. A late

  16. Machine learning algorithms to classify spinal muscular atrophy subtypes.

    Science.gov (United States)

    Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B

    2012-07-24

    The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.
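
    The ROC-AUC values quoted above can be read as the probability that a randomly chosen muscle from one class receives a higher classifier score than a randomly chosen muscle from the other. A minimal sketch of that pairwise (Mann-Whitney) formulation follows. The scores are made up for illustration and are not data from the study, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical classifier outputs for two classes of muscles.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
```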

  17. Unsupervised online classifier in sleep scoring for sleep deprivation studies.

    Science.gov (United States)

    Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien

    2015-05-01

    This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). Controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD. Ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. The performance of the algorithm is evaluated by comparison with the score obtained by a human expert reader. Online detection of PS is then validated with a PSD protocol with duration of 72 hours. Our algorithm gave a high concordance with human scoring with an average κ coefficient > 70%. Notably, the specificity to detect PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only brief PS bouts necessary for the detection of PS in EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% in baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Our fully unsupervised data-driven algorithm overcomes some limitations of the other automated methods such as the selection of representative descriptors or threshold settings. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods like the tedious gentle handling or the platform method. © 2015 Associated Professional Sleep Societies, LLC.
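
    Agreement between the algorithm and the human scorer is reported above as a κ coefficient. The abstract does not state which κ statistic was used, so the sketch below assumes Cohen's kappa computed over per-epoch labels. The state labels and the short example sequences are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:          # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical 5-s epoch labels: W = wake, S = slow-wave sleep, PS = paradoxical sleep.
    human = ["W", "W", "S", "S", "PS", "S"]
    algorithm = ["W", "S", "S", "S", "PS", "S"]
    print(cohen_kappa(human, algorithm))
```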

  18. Locating and classifying defects using an hybrid data base

    International Nuclear Information System (INIS)

    Luna-Aviles, A; Diaz Pineda, A; Hernandez-Gomez, L H; Urriolagoitia-Calderon, G; Urriolagoitia-Sosa, G; Durodola, J F; Beltran Fernandez, J A

    2011-01-01

    A computational inverse technique was used in the localization and classification of defects. Postulated voids of two different sizes (2 mm and 4 mm diameter) were introduced in PMMA bars with and without a notch. The bar dimensions are 200x20x5 mm. One half of them were plain and the other half had a notch (3 mm x 4 mm) close to the defect area (19 mm x 16 mm). This analysis was done with an Artificial Neural Network (ANN) and its optimization was done with an Adaptive Neuro Fuzzy Procedure (ANFIS). A hybrid data base was developed with numerical and experimental results. Synthetic data was generated with the finite element method using the SOLID95 element of the ANSYS code. A parametric analysis was carried out. Only one defect in such bars was taken into account and the first five natural frequencies were calculated. 460 cases were evaluated. Half of them were plain and the other half had a notch. All the input data was classified in two groups. Each one has 230 cases and corresponds to one of the two sorts of voids mentioned above. In addition, experimental analysis was carried out with PMMA specimens of the same size. The first two natural frequencies of 40 cases were obtained with one void. The other three frequencies were obtained numerically. 20 of these bars were plain and the others had a notch. These experimental results were introduced in the synthetic data base. 400 cases were taken randomly and, with this information, the ANN was trained with the backpropagation algorithm. The accuracy of the results was tested with the 100 cases that were left. In the next stage of this work, the ANN output was optimized with ANFIS. Previous papers showed that localization and classification of defects was reduced as notches were introduced in such bars. In the case of this paper, improved results were obtained when a hybrid data base was used.

  19. Classifying supersymmetric solutions in 3D maximal supergravity

    Science.gov (United States)

    de Boer, Jan; Mayerson, Daniel R.; Shigemori, Masaki

    2014-12-01

    String theory contains various extended objects. Among those, objects of codimension two (such as the D7-brane) are particularly interesting. Codimension-two objects carry non-Abelian charges which are elements of a discrete U-duality group and they may not admit a simple spacetime description, in which case they are known as exotic branes. A complete classification of consistent codimension-two objects in string theory is missing, even if we demand that they preserve some supersymmetry. As a step toward such a classification, we study the supersymmetric solutions of 3D maximal supergravity, which can be regarded as an approximate description of the geometry near codimension-two objects. We present a complete classification of the types of supersymmetric solutions that exist in this theory. We found that this problem reduces to that of classifying nilpotent orbits associated with the U-duality group, for which various mathematical results are known. We show that the only allowed supersymmetric configurations are 1/2, 1/4, 1/8, and 1/16 BPS, and determine the nilpotent orbits that they correspond to. One example of 1/16 BPS configurations is a generalization of the MSW system, where momentum runs along the intersection of seven M5-branes. On the other hand, it turns out to be exceedingly difficult to translate this classification into a simple criterion for supersymmetry in terms of the non-Abelian (monodromy) charges of the objects. For example, it can happen that a supersymmetric solution exists locally but cannot be extended all the way to the location of the object. To illustrate the various issues that arise in constructing supersymmetric solutions, we present a number of explicit examples.

  20. CLASSIFYING BENIGN AND MALIGNANT MASSES USING STATISTICAL MEASURES

    Directory of Open Access Journals (Sweden)

    B. Surendiran

    2011-11-01

    Full Text Available Breast cancer is the most common disease found in women and causes the second-highest rate of death after lung cancer. The digital mammogram is the X-ray of the breast captured for analysis, interpretation and diagnosis. According to the Breast Imaging Reporting and Data System (BIRADS), benign and malignant masses can be differentiated using their shape, size and density, which is how radiologists visualize the mammograms. According to BIRADS mass shape characteristics, benign masses tend to be round, oval or lobular in shape, whereas malignant masses are lobular or irregular in shape. Measuring regular and irregular shapes mathematically is found to be a difficult task, since there is no single measure to differentiate various shapes. In this paper, the malignant and benign masses present in mammograms are classified using Hue, Saturation and Value (HSV) weight function based statistical measures. The weight function is robust against noise and captures the degree of gray content of the pixel. The statistical measures use the gray weight value instead of the gray pixel value to effectively discriminate masses. The 233 mammograms from the Digital Database for Screening Mammography (DDSM) benchmark dataset have been used. The PASW data mining modeler has been used for constructing a Neural Network for identifying the importance of statistical measures. Based on the obtained important statistical measures, the C5.0 tree has been constructed with a 60-40 data split. The experimental results are found to be encouraging. Also, the results agree with the standard specified by the American College of Radiology-BIRADS Systems.

  1. Binary naive Bayesian classifiers for correlated Gaussian features: a theoretical analysis

    CSIR Research Space (South Africa)

    Van Dyk, E

    2008-11-01

    Full Text Available classifier with Gaussian features while using any quadratic decision boundary. Therefore, the analysis is not restricted to Naive Bayesian classifiers alone and can, for instance, be used to calculate the Bayes error performance. We compare the analytical...

  2. 32 CFR 2004.21 - Protection of Classified Information [201(e)].

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Protection of Classified Information [201(e... PROGRAM DIRECTIVE NO. 1 Operations § 2004.21 Protection of Classified Information [201(e)]. Procedures for... coordination process. ...

  3. Evaluating the Performance of Multiple Classifier Systems: A Matrix Algebra Representation of Boolean Fusion Rules

    National Research Council Canada - National Science Library

    Hill, Justin

    2003-01-01

    ...., a logical OR, AND, or a majority vote of the classifiers in the system). An established method for evaluating a classifier is measuring some aspect of its Receiver Operating Characteristic (ROC...
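
    The Boolean fusion rules named in this excerpt (logical OR, AND, and majority vote) can be illustrated with a few lines of code. This is a generic sketch of those three rules, not the report's matrix algebra representation. The function name and the handling of ties in the majority vote (a tie counts as a negative decision) are assumptions.

```python
from typing import Sequence


def fuse(decisions: Sequence[bool], rule: str) -> bool:
    """Combine the binary decisions of several classifiers into one decision."""
    if not decisions:
        raise ValueError("at least one classifier decision is required")
    if rule == "or":
        return any(decisions)        # positive if any classifier says positive
    if rule == "and":
        return all(decisions)        # positive only if every classifier agrees
    if rule == "majority":
        # Strict majority; an even split is treated as negative (assumption).
        return sum(bool(d) for d in decisions) > len(decisions) / 2
    raise ValueError(f"unknown fusion rule: {rule!r}")


if __name__ == "__main__":
    votes = [True, False, True]      # hypothetical outputs of three classifiers
    for rule in ("or", "and", "majority"):
        print(rule, fuse(votes, rule))
```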

  4. Predicting Alzheimer's disease by classifying 3D-Brain MRI images using SVM and other well-defined classifiers

    International Nuclear Information System (INIS)

    Matoug, S; Abdel-Dayem, A; Passi, K; Gross, W; Alqarni, M

    2012-01-01

    Alzheimer's disease (AD) is the most common form of dementia affecting seniors age 65 and over. When AD is suspected, the diagnosis is usually confirmed with behavioural assessments and cognitive tests, often followed by a brain scan. Advanced medical imaging and pattern recognition techniques are good tools to create a learning database in the first step and to predict the class label of incoming data in order to assess the development of the disease, i.e., the conversion from prodromal stages (mild cognitive impairment) to Alzheimer's disease, which is the most critical brain disease for the senior population. Advanced medical imaging such as the volumetric MRI can detect changes in the size of brain regions due to the loss of the brain tissues. Measuring regions that atrophy during the progress of Alzheimer's disease can help neurologists in detecting and staging the disease. In the present investigation, we present a pseudo-automatic scheme that reads volumetric MRI, extracts the middle slices of the brain region, performs segmentation in order to detect the region of brain's ventricle, generates a feature vector that characterizes this region, creates an SQL database that contains the generated data, and finally classifies the images based on the extracted features. For our results, we have used the MRI data sets from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

  5. LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

    OpenAIRE

    M. Favorskaya; A. Nosov; A. Popov

    2015-01-01

    Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin dete...

  6. Learning Bayesian network classifiers for credit scoring using Markov Chain Monte Carlo search

    NARCIS (Netherlands)

    Baesens, B.; Egmont-Petersen, M.; Castelo, R.; Vanthienen, J.

    2001-01-01

    In this paper, we will evaluate the power and usefulness of Bayesian network classifiers for credit scoring. Various types of Bayesian network classifiers will be evaluated and contrasted including unrestricted Bayesian network classifiers learnt using Markov Chain Monte Carlo (MCMC) search.

  7. Intuitive Action Set Formation in Learning Classifier Systems with Memory Registers

    NARCIS (Netherlands)

    Simões, L.F.; Schut, M.C.; Haasdijk, E.W.

    2008-01-01

    An important design goal in Learning Classifier Systems (LCS) is to equally reinforce those classifiers which cause the level of reward supplied by the environment. In this paper, we propose a new method for action set formation in LCS. When applied to a Zeroth Level Classifier System with Memory

  8. 3 CFR - Implementation of the Executive Order, “Classified National Security Information”

    Science.gov (United States)

    2010-01-01

    ... 29, 2009 Implementation of the Executive Order, “Classified National Security Information” Memorandum..., “Classified National Security Information” (the “order”), which substantially advances my goals for reforming... or handles classified information shall provide the Director of the Information Security Oversight...

  9. 36 CFR 1256.70 - What controls access to national security-classified information?

    Science.gov (United States)

    2010-07-01

    ... national security-classified information? 1256.70 Section 1256.70 Parks, Forests, and Public Property... HISTORICAL MATERIALS Access to Materials Containing National Security-Classified Information § 1256.70 What controls access to national security-classified information? (a) The declassification of and public access...

  10. Evaluation of classifiers that score linear type traits and body condition score using common sires

    NARCIS (Netherlands)

    Veerkamp, R.F.; Gerritsen, C.L.M.; Koenen, E.P.C.; Hamoen, A.; Jong, de G.

    2002-01-01

    Subjective visual assessment of animals by classifiers is undertaken for several different traits in farm livestock, e.g., linear type traits, body condition score, or carcass conformation. One of the difficulties in assessment is the effect of an individual classifier. To ensure that classifiers

  11. Drug target ontology to classify and integrate drug discovery data

    DEFF Research Database (Denmark)

    Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande

    2017-01-01

    using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem...... of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target...... characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships. CONCLUSIONS: DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein...

  12. Statistical Redundancy Testing for Improved Gene Selection in Cancer Classification Using Microarray Data

    Directory of Open Access Journals (Sweden)

    J. Sunil Rao

    2007-01-01

    Full Text Available In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene’s contribution to the joint discriminability when this gene is included in a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and the gene selection methods. Real data examples show that the proposed gene selection methods can select a compact gene subset that not only can be used to build high-quality cancer classifiers but also shows biological relevance.

  13. A degenerate primer MOB typing (DPMT) method to classify gamma-proteobacterial plasmids in clinical and environmental settings.

    Directory of Open Access Journals (Sweden)

    Andrés Alvarado

    Full Text Available Transmissible plasmids are responsible for the spread of genetic determinants, such as antibiotic resistance or virulence traits, causing a large ecological and epidemiological impact. Transmissible plasmids, either conjugative or mobilizable, have in common the presence of a relaxase gene. Relaxases were previously classified in six protein families according to their phylogeny. Degenerate primers hybridizing to coding sequences of conserved amino acid motifs were designed to amplify related relaxase genes from γ-Proteobacterial plasmids. Specificity and sensitivity of a selected set of 19 primer pairs were first tested using a collection of 33 reference relaxases, representing the diversity of γ-Proteobacterial plasmids. The validated set was then applied to the analysis of two plasmid collections obtained from clinical isolates. The relaxase screening method, which we call "Degenerate Primer MOB Typing" or DPMT, detected not only most known Inc/Rep groups, but also a plethora of plasmids not previously assigned to any Inc group or Rep-type.

  14. Maternal hemodynamics: a method to classify hypertensive disorders of pregnancy.

    Science.gov (United States)

    Ferrazzi, Enrico; Stampalija, Tamara; Monasta, Lorenzo; Di Martino, Daniela; Vonck, Sharona; Gyselaers, Wilfried

    2018-01-01

    The classification of hypertensive disorders of pregnancy is based on the time at the onset of hypertension, proteinuria, and other associated complications. Maternal hemodynamic interrogation in hypertensive disorders of pregnancy considers not only the peripheral blood pressure but also the entire cardiovascular system, and it might help to classify the different clinical phenotypes of this syndrome. This study aimed to examine cardiovascular parameters in a cohort of patients affected by hypertensive disorders of pregnancy according to the clinical phenotypes that prioritize fetoplacental characteristics and not the time at onset of hypertensive disorders of pregnancy. At the fetal-maternal medicine unit of Ziekenhuis Oost-Limburg (Genk, Belgium), maternal cardiovascular parameters were obtained through impedance cardiography using a noninvasive continuous cardiac output monitor with the patients placed in a standing position. The patients were classified as pregnant women with hypertensive disorders of pregnancy who delivered appropriate- and small-for-gestational-age fetuses. Normotensive pregnant women with an appropriate-for-gestational-age fetus at delivery were enrolled as the control group. The possible impact of obesity (body mass index ≥30 kg/m²) on maternal hemodynamics was reassessed in the same groups. Maternal age, parity, body mass index, and blood pressure were not significantly different between the hypertensive disorders of pregnancy/appropriate-for-gestational-age and hypertensive disorders of pregnancy/small-for-gestational-age groups. The mean uterine artery pulsatility index was significantly higher in the hypertensive disorders of pregnancy/small-for-gestational-age group. The cardiac output and cardiac index were significantly lower in the hypertensive disorders of pregnancy/small-for-gestational-age group (cardiac output 6.5 L/min, cardiac index 3.6) than in the hypertensive disorders of pregnancy/appropriate-for-gestational-age group

  15. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi.

    Directory of Open Access Journals (Sweden)

    Diane G O Saunders

    Full Text Available Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i) contain a secretion signal, (ii) are encoded by in planta induced genes, (iii) have similarity to haustorial proteins, (iv) are small and cysteine rich, (v) contain a known effector motif or a nuclear localization signal, (vi) are encoded by genes with long intergenic regions, (vii) contain internal repeats, and (viii) do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components.

  16. Multivariate models to classify Tuscan virgin olive oils by zone.

    Directory of Open Access Journals (Sweden)

    Alessandri, Stefano

    1999-10-01

    Full Text Available In order to study and classify Tuscan virgin olive oils, 179 samples were collected. They were obtained from drupes harvested during the first half of November from three different zones of the Region. The sampling was repeated for 5 years. Fatty acids, phytol, aliphatic and triterpenic alcohols, triterpenic dialcohols, sterols, squalene and tocopherols were analyzed. A subset of variables was considered; these had been selected in a preceding work as the most effective and reliable from the univariate point of view. The analytical data were transformed (except for cycloartenol) to compensate for annual variations: within each year, the mean for the East zone was subtracted from each value. Univariate three-class models were calculated and further variables discarded. Then multivariate three-zone models were evaluated, including phytol (which was always selected) and all the combinations of palmitic, palmitoleic and oleic acid, tetracosanol, cycloartenol and squalene. Models including from two to seven variables were studied. The best model shows by-zone classification errors of less than 40%, by-zone within-year classification errors of less than 45%, and a global classification error equal to 30%. This model includes phytol, palmitic acid, tetracosanol and cycloartenol.
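    The year-wise transformation described in this abstract (subtracting the East-zone mean from every value within the same year) can be sketched as below. The data layout, the year labels, the zone name "West", and the numeric values are hypothetical; only the centering rule itself comes from the abstract.

```python
from collections import defaultdict

def center_on_reference_zone(samples, reference_zone="East"):
    """Subtract, within each year, the reference-zone mean from every value.

    samples: list of (year, zone, value) tuples for a single analytical variable.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, zone, value in samples:
        if zone == reference_zone:
            sums[year] += value
            counts[year] += 1

    centered = []
    for year, zone, value in samples:
        if counts[year] == 0:
            raise ValueError(f"no {reference_zone} samples for year {year}")
        reference_mean = sums[year] / counts[year]
        centered.append((year, zone, value - reference_mean))
    return centered

# Hypothetical measurements of one variable over two sampling years.
samples = [
    (1, "East", 10.0),
    (1, "East", 14.0),
    (1, "West", 15.0),
    (2, "East", 20.0),
    (2, "West", 26.0),
]
centered = center_on_reference_zone(samples)
for row in centered:
    print(row)
```

    After centering, a year with uniformly higher raw values no longer looks different from the others; only each sample's offset from that year's East-zone mean remains.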

  17. Comparative genome analysis of PHB gene family reveals deep evolutionary origins and diverse gene function.

    Science.gov (United States)

    Di, Chao; Xu, Wenying; Su, Zhen; Yuan, Joshua S

    2010-10-07

    The PHB (Prohibitin) gene family is involved in a variety of functions important for different biological processes. PHB genes are ubiquitously present in divergent species from prokaryotes to eukaryotes. Human PHB genes have been found to be associated with various diseases. Recent studies by our group and others have shown diverse functions of PHB genes in plants for development, senescence, defence, and other processes. Despite the importance of the PHB gene family, no comprehensive gene family analysis has been carried out to evaluate the relatedness of PHB genes across different species. To better guide gene function analysis and understand the evolution of the PHB gene family, we therefore carried out a comparative genome analysis of the PHB genes across different kingdoms. The relatedness, motif distribution, and intron/exon distribution all indicated that the PHB genes form a relatively conserved gene family. The PHB genes can be classified into 5 classes, and each class has a very deep evolutionary origin. The PHB genes within each class maintained the same motif patterns during evolution. With Arabidopsis as the model species, we found that PHB gene intron/exon structure and domains are also conserved during evolution. Despite being a conserved gene family, various gene duplication events led to the expansion of the PHB genes. Both segmental and tandem gene duplication were involved in Arabidopsis PHB gene family expansion; however, segmental duplication is predominant in Arabidopsis. Moreover, most of the duplicated genes experienced neofunctionalization. The results highlighted that PHB genes might be involved in important functions, so that the duplicated genes are under evolutionary pressure to derive new functions. The PHB gene family is conserved and accounts for diverse but important biological functions based on similar molecular mechanisms. The highly diverse biological functions indicate that more research needs to be carried out

  18. Pixel Classification of SAR ice images using ANFIS-PSO Classifier

    Directory of Open Access Journals (Sweden)

    G. Vasumathi

    2016-12-01

    Full Text Available Synthetic Aperture Radar (SAR) plays a vital role in capturing extremely high-resolution radar images. It is widely used to monitor ice-covered ocean regions. Sea monitoring is important for various purposes, including global climate systems and ship navigation. Classification of the ice-infested area yields important features that are useful for various monitoring processes around the ice regions. The main objective of this paper is to classify SAR ice images, which helps in identifying the regions around ice-infested areas. Three stages are considered in the classification of SAR ice images. The first is preprocessing, in which the speckled SAR ice images are denoised using various speckle removal filters; the filters are compared to find the best one for speckle removal. The second stage is segmentation, in which different regions are segmented using the K-means and watershed segmentation algorithms; the two algorithms are compared to find the better one for segmenting SAR ice images. The last stage is pixel-based classification, which identifies and classifies the segmented regions using various supervised learning classifiers. The algorithms include back propagation neural networks (BPN), a fuzzy classifier, an Adaptive Neuro Fuzzy Inference Classifier (ANFIS), and the proposed ANFIS with Particle Swarm Optimization (PSO) classifier; all these classifiers are compared to determine which is best suited for classifying SAR ice images. Evaluation metrics are computed separately at each of the three stages.

  19. Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

    Energy Technology Data Exchange (ETDEWEB)

    Baraldi, Piero, E-mail: piero.baraldi@polimi.i [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Razavi-Far, Roozbeh [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Zio, Enrico [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Ecole Centrale Paris-Supelec, Paris (France)

    2011-04-15

    An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble where transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.
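    The ensemble scheme summarized above (bagging, majority voting, and adding classifiers only when a new operating condition is poorly classified) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid learner stands in for the supervised Fuzzy C Means base classifier, and the function names, the 0.9 accuracy threshold, the labels, and the data are all hypothetical.

```python
import random
from collections import Counter

def train_centroids(points, labels):
    """Nearest-centroid base learner (a simple stand-in, NOT supervised FCM)."""
    sums, counts = {}, Counter()
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict_one(model, p):
    """Return the label whose centroid is closest to point p."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(p, model[y])))

def bagging_fit(points, labels, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(points)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_centroids([points[i] for i in idx],
                                      [labels[i] for i in idx]))
    return models

def ensemble_predict(models, p):
    """Combine the base learners' outputs by majority voting."""
    votes = Counter(predict_one(m, p) for m in models)
    return votes.most_common(1)[0][0]

def extend_if_needed(models, points, labels, min_accuracy=0.9, n_models=5, seed=1):
    """Add new base learners only if the new-condition data is poorly classified."""
    correct = sum(ensemble_predict(models, p) == y for p, y in zip(points, labels))
    if correct / len(points) < min_accuracy:
        models = models + bagging_fit(points, labels, n_models, seed)
    return models

# Hand-built ensemble for a first operating condition (hypothetical values).
old = [{"normal": [0.0], "fault": [1.0]}] * 3
# Data from a new operating condition where the old centroids no longer fit.
new_x = [[5.0], [5.2], [9.0], [9.1]]
new_y = ["normal", "normal", "fault", "fault"]
grown = extend_if_needed(old, new_x, new_y)
print(len(old), "->", len(grown), "base classifiers")
```

    The existing classifiers are kept rather than retrained, so knowledge about earlier operating conditions is not discarded when the ensemble grows.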

  20. Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

    Energy Technology Data Exchange (ETDEWEB)

    Wurtz, R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Kaplan, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-10-28

    Pulse shape discrimination (PSD) is a variety of statistical classifier. Fully-realized statistical classifiers rely on a comprehensive set of tools for designing, building, and implementing. PSD advances rely on improvements to the implemented algorithm. PSD advances can be improved by using conventional statistical classifier or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully-designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier’s receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.
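    The reporting recommendation above can be illustrated with a small sketch that traces a ROC curve from PSD scores and reads off the operating point at a required gamma rejection rate. The assumptions are mine, not the report's: a higher score is taken to mean "more neutron-like", GRR is taken as the fraction of gamma pulses falling below the threshold, and the scores are invented.

```python
def roc_points(gamma_scores, neutron_scores):
    """Return (threshold, gamma rejection rate, neutron acceptance) per threshold.

    A pulse is called a neutron when its score is >= threshold.
    """
    thresholds = sorted(set(gamma_scores) | set(neutron_scores))
    points = []
    for t in thresholds:
        grr = sum(s < t for s in gamma_scores) / len(gamma_scores)
        acceptance = sum(s >= t for s in neutron_scores) / len(neutron_scores)
        points.append((t, grr, acceptance))
    return points

def operating_point(points, required_grr):
    """Return the lowest threshold whose gamma rejection rate meets the target."""
    for t, grr, acceptance in points:
        if grr >= required_grr:
            return t, grr, acceptance
    return None  # the target cannot be met with these scores

# Hypothetical PSD scores from labelled calibration pulses.
gamma_scores = [0.1, 0.2, 0.3, 0.4]
neutron_scores = [0.35, 0.6, 0.7, 0.8]

points = roc_points(gamma_scores, neutron_scores)
print(operating_point(points, 0.75))
print(operating_point(points, 1.0))
```

    Choosing the lowest qualifying threshold keeps as many neutrons as possible while still meeting the rejection target.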

  1. An Active Learning Classifier for Further Reducing Diabetic Retinopathy Screening System Cost

    Directory of Open Access Journals (Sweden)

    Yinan Zhang

    2016-01-01

    Full Text Available Diabetic retinopathy (DR) screening systems raise a financial problem. To further reduce DR screening cost, this paper proposes an active learning classifier. Our approach identifies retinal images based on features extracted by anatomical part recognition and lesion detection algorithms. The kernel extreme learning machine (KELM) is a rapid classifier for solving classification problems in high-dimensional space. Both active learning and the ensemble technique elevate the performance of KELM when a small training dataset is used. The committee proposes only the necessary manual work to the doctor, which saves cost. On the publicly available Messidor database, our classifier is trained with 20%–35% of the labeled retinal images, and the comparative classifiers are trained with 80% of the labeled retinal images. Results show that our classifier can achieve better classification accuracy than Classification and Regression Tree, radial basis function SVM, Multilayer Perceptron SVM, Linear SVM, and K Nearest Neighbor. Empirical experiments suggest that our active learning classifier is efficient for further reducing DR screening cost.

  2. Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

    International Nuclear Information System (INIS)

    Baraldi, Piero; Razavi-Far, Roozbeh; Zio, Enrico

    2011-01-01

    An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble where transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.

  3. A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard

    Directory of Open Access Journals (Sweden)

    Keith Jonathan M

    2012-07-01

    Full Text Available Background: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic scale data sets. We present a new implementation, which unlike previous implementations is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers. Results: We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541094 SNPs were analysed. In all cases, run times were feasible, and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/. Conclusions: The examples demonstrate the methods are suitable for both small and large data sets, applicable to the wide range of bioinformatics classification problems, and robust to dependence between classifiers. In all three test cases, the globally optimal logical combination of the classifiers was found to be

  4. Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier.

    Science.gov (United States)

    Mao, Keming; Deng, Zhuofu

    2016-01-01

    This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, a Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, a single-center classifier is trained based on LDP. Because the feature distribution differs between classes, the training images are further clustered into multiple cores and a multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.

  5. Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier

    Directory of Open Access Journals (Sweden)

    Keming Mao

    2016-01-01

    Full Text Available This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, a Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, a single-center classifier is trained based on LDP. Because the feature distribution differs between classes, the training images are further clustered into multiple cores and a multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.

  6. Reducing variability in the output of pattern classifiers using histogram shaping

    International Nuclear Information System (INIS)

    Gupta, Shalini; Kan, Chih-Wen; Markey, Mia K.

    2010-01-01

    Purpose: The authors present a novel technique based on histogram shaping to reduce the variability in the output and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs. Methods: The authors identify different sources of variability in the output of linear pattern classifiers with identical ROC curves, which also result in classifiers with differently distributed outputs. They theoretically develop a novel technique based on the matching of the histograms of these differently distributed pattern classifier outputs to reduce the variability in their (sensitivity, specificity) pairs at fixed decision thresholds, and to reduce the variability in their actual output values. They empirically demonstrate the efficacy of the proposed technique by means of analyses on the simulated data and real world mammography data. Results: For the simulated data, with three different known sources of variability, and for the real world mammography data with unknown sources of variability, the proposed classifier output calibration technique significantly reduced the variability in the classifiers' (sensitivity, specificity) pairs at fixed decision thresholds. Furthermore, for classifiers with monotonically or approximately monotonically related output variables, the histogram shaping technique also significantly reduced the variability in their actual output values. Conclusions: Classifier output calibration based on histogram shaping can be successfully employed to reduce the variability in the output values and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs.
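    The idea of matching the histogram of one classifier's outputs to that of another can be illustrated with generic rank-based histogram matching. This is a sketch under assumptions, not the authors' procedure, whose details the abstract does not give: the function name and the toy outputs are hypothetical, and a nearest-rank rule is used for the quantile lookup.

```python
def match_histogram(source, reference):
    """Map each source value to the reference value at the same empirical rank.

    For distinct source values the mapping never reorders them, so their
    ranking (and hence the ROC curve) is unchanged while the output
    distribution takes the shape of the reference distribution.
    """
    ref_sorted = sorted(reference)
    n, m = len(source), len(ref_sorted)
    order = sorted(range(n), key=lambda i: source[i])  # indices by ascending value
    matched = [0.0] * n
    for rank, i in enumerate(order):
        j = min(m - 1, rank * m // n)  # nearest-rank position in the reference
        matched[i] = ref_sorted[j]
    return matched

# Hypothetical outputs of two classifiers on different scales.
source = [0.9, 0.1, 0.5, 0.7]
reference = [2.0, 8.0, 4.0, 6.0]

matched = match_histogram(source, reference)
print(matched)
```

    Once the outputs share a distribution, a fixed decision threshold selects the same fraction of cases from each classifier, which is why the (sensitivity, specificity) pairs at that threshold become comparable.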

  7. Detection and Classification of Transformer Winding Mechanical Faults Using UWB Sensors and Bayesian Classifier

    Science.gov (United States)

    Alehosseini, Ali; A. Hejazi, Maryam; Mokhtari, Ghassem; B. Gharehpetian, Gevork; Mohammadi, Mohammad

    2015-06-01

    In this paper, the Bayesian classifier is used to detect and classify the radial deformation and axial displacement of transformer windings. The proposed method is tested on a model of transformer for different volumes of radial deformation and axial displacement. In this method, ultra-wideband (UWB) signal is sent to the simplified model of the transformer winding. The received signal from the winding model is recorded and used for training and testing of Bayesian classifier in different axial displacement and radial deformation states of the winding. It is shown that the proposed method has a good accuracy to detect and classify the axial displacement and radial deformation of the winding.

  8. Application of SVM classifier in thermographic image classification for early detection of breast cancer

    Science.gov (United States)

    Oleszkiewicz, Witold; Cichosz, Paweł; Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał

    2016-09-01

    This article presents the application of machine learning algorithms for early detection of breast cancer on the basis of thermographic images. A supervised learning model, the support vector machine (SVM), was implemented, together with the Sequential Minimal Optimization (SMO) algorithm for training the SVM classifier. The SVM classifier was included in a client-server application that makes it possible to create a training set of examinations and to apply classifiers (including SVM) for the diagnosis and early detection of breast cancer. The sensitivity and specificity of the SVM classifier were calculated based on the thermographic images from studies. Furthermore, a heuristic method for tuning the SVM's parameters was proposed.

  9. Minimal gene selection for classification and diagnosis prediction based on gene expression profile

    Directory of Open Access Journals (Sweden)

    Alireza Mehridehnavi

    2013-01-01

    Conclusion: We have shown that using the two most significant genes, ranked by their S/N ratios, together with a selection of suitable training samples, can classify DLBCL patients with rather good results. With the aid of these methods, we could compensate for the small number of patients, improve classification accuracy, and reduce computational complexity and hence running time.
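    Ranking genes by S/N ratio and keeping the top two can be sketched as below. The abstract does not state the formula, so the commonly used signal-to-noise score (difference of class means divided by the sum of class standard deviations) is assumed here; the gene names, class labels, and expression values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """Assumed S/N score: (mean_a - mean_b) / (sd_a + sd_b).

    Requires at least two samples per class and a nonzero total spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def top_genes(expression, labels, k=2):
    """Rank genes by |S/N| between the two classes and return the top k.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of class labels, one per sample (exactly two classes).
    """
    first, second = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == first]
        b = [v for v, y in zip(values, labels) if y == second]
        scores[gene] = signal_to_noise(a, b)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:k], scores

# Hypothetical expression values for six samples in two outcome classes.
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "gene1": [10.0, 11.0, 12.0, 1.0, 2.0, 3.0],
    "gene2": [5.0, 6.0, 7.0, 5.0, 7.0, 6.0],
    "gene3": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
}

selected, scores = top_genes(expression, labels, k=2)
print(selected, scores)
```

    Ranking by the absolute value keeps genes that are strongly lower in one class as well as those that are strongly higher.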

  10. 48 CFR 8.608 - Protection of classified and sensitive information.

    Science.gov (United States)

    2010-10-01

    ... Prison Industries, Inc. 8.608 Protection of classified and sensitive information. Agencies shall not enter into any contract with FPI that allows an inmate worker access to any— (a) Classified data; (b) Geographic data regarding the location of— (1) Surface and subsurface infrastructure providing communications...

  11. 40 CFR 260.32 - Variances to be classified as a boiler.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 25 2010-07-01 2010-07-01 false Variances to be classified as a boiler... be classified as a boiler. In accordance with the standards and criteria in § 260.10 (definition of “boiler”), and the procedures in § 260.33, the Administrator may determine on a case-by-case basis that...

  12. 46 CFR 108.187 - Ventilation for brush type electric motors in classified spaces.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 4 2010-10-01 2010-10-01 false Ventilation for brush type electric motors in classified... Ventilation for brush type electric motors in classified spaces. Ventilation for brush type electric motors in... Electrical Equipment in Hazardous Locations”, except audible and visual alarms may be used if shutting down...

  13. Oblique decision trees using embedded support vector machines in classifier ensembles

    NARCIS (Netherlands)

    Menkovski, V.; Christou, I.; Efremidis, S.

    2008-01-01

    Classifier ensembles have emerged in recent years as a promising research area for boosting pattern recognition systems' performance. We present a new base classifier that utilizes oblique decision tree technology based on support vector machines for the construction of oblique (non-axis parallel)

  14. Adaptation in P300 brain-computer interfaces: A two-classifier cotraining approach

    DEFF Research Database (Denmark)

    Panicker, Rajesh C.; Sun, Ying; Puthusserypady, Sadasivan

    2010-01-01

    A cotraining-based approach is introduced for constructing high-performance classifiers for P300-based brain-computer interfaces (BCIs), which were trained from very little data. It uses two classifiers: Fisher's linear discriminant analysis and Bayesian linear discriminant analysis progressively...

  15. Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis

    International Nuclear Information System (INIS)

    Li Qiang; Doi Kunio

    2006-01-01

    Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In CAD schemes, classifiers play a key role in achieving a high lesion detection rate and a low false-positive rate. Although many popular classifiers such as linear discriminant analysis and artificial neural networks have been employed in CAD schemes for reduction of false positives, a rule-based classifier has probably been the simplest and most frequently used one since the early days of development of various CAD schemes. However, with existing rule-based classifiers, there are major disadvantages that significantly reduce their practicality and credibility. The disadvantages include manual design, poor reproducibility, poor evaluation methods such as resubstitution, and a large overtraining effect. An automated rule-based classifier with a minimized overtraining effect can overcome or significantly reduce the extent of the above-mentioned disadvantages. In this study, we developed an 'optimal' method for the selection of cutoff thresholds and a fully automated rule-based classifier. Experiments performed with Monte Carlo simulation and a real lung nodule CT data set demonstrated that the automated threshold selection method can completely eliminate the overtraining effect in the procedure of cutoff threshold selection, and thus can minimize the overall overtraining effect in the constructed rule-based classifier. We believe that this threshold selection method is very useful in the construction of automated rule-based classifiers with minimized overtraining effect.

  16. Automating the construction of scene classifiers for content-based video retrieval

    NARCIS (Netherlands)

    Khan, L.; Israël, Menno; Petrushin, V.A.; van den Broek, Egon; van der Putten, Peter

    2004-01-01

    This paper introduces a real time automatic scene classifier within content-based video retrieval. In our envisioned approach end users like documentalists, not image processing experts, build classifiers interactively, by simply indicating positive examples of a scene. Classification consists of a

  17. Identification of flooded area from satellite images using Hybrid Kohonen Fuzzy C-Means sigma classifier

    Directory of Open Access Journals (Sweden)

    Krishna Kant Singh

    2017-06-01

    Full Text Available A novel neuro-fuzzy classifier, Hybrid Kohonen Fuzzy C-Means-σ (HKFCM-σ), is proposed in this paper. The proposed classifier is a hybridization of the Kohonen Clustering Network (KCN) with the FCM-σ clustering algorithm. The network architecture of HKFCM-σ is similar to a simple KCN network having only two layers, i.e., an input and an output layer. However, the selection of the winner neuron is based on the FCM-σ algorithm, thus embedding the features of both a neural network and a fuzzy clustering algorithm in the classifier. This hybridization results in a more efficient, less complex and faster classifier for classifying satellite images. HKFCM-σ is used to identify the flooding that occurred in the Kashmir area in September 2014. The HKFCM-σ classifier is applied to pre- and post-flooding Landsat 8 OLI images of Kashmir to detect the areas that were flooded due to the heavy rainfall of September 2014. The classifier is trained using the mean values of various spectral indices such as NDVI, NDWI and NDBI, and the first component of Principal Component Analysis. The error matrix was computed to test the performance of the method. The method yields high producer’s accuracy, consumer’s accuracy and kappa coefficient values, indicating that the proposed classifier is highly effective and efficient.
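
    The accuracy measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `accuracy_metrics`, the row/column convention (rows are reference classes, columns are mapped classes), and the example counts are all assumptions made for illustration. Consumer's accuracy is often also called user's accuracy.

```python
def accuracy_metrics(matrix):
    """Producer's accuracy, consumer's accuracy and Cohen's kappa from an error matrix.

    Assumed convention: matrix[i][j] counts samples whose reference class is i
    and whose mapped class is j. Every class must occur at least once in both
    the reference data and the map, otherwise a division by zero occurs.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]

    # Producer's accuracy: correct samples / reference samples of that class.
    producers = [diagonal[i] / row_totals[i] for i in range(n)]
    # Consumer's (user's) accuracy: correct samples / samples mapped to that class.
    consumers = [diagonal[j] / col_totals[j] for j in range(n)]

    observed = sum(diagonal) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return producers, consumers, kappa


# Hypothetical two-class example (flooded vs. non-flooded pixels).
producers, consumers, kappa = accuracy_metrics([[45, 5], [10, 40]])
```

    With these made-up counts the observed agreement is 0.85 and the chance agreement is 0.5, so kappa works out to 0.7.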

  18. Variants of the Borda count method for combining ranked classifier hypotheses

    NARCIS (Netherlands)

    van Erp, Merijn; Schomaker, Lambert; Vuurpijl, Louis

    2000-01-01

    The Borda count is a simple yet effective method of combining rankings. In pattern recognition, classifiers are often able to return a ranked set of results. Several experiments have been conducted to test the ability of the Borda count and two variant methods to combine these ranked classifier
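
    As a minimal sketch of the basic Borda count described here (not of the two variant methods, which the truncated abstract does not specify), each classifier's ranking awards points by position and the points are summed per class. The function name `borda_count` and the alphabetical tie-break are assumptions for illustration.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine ranked label lists (best first) into one consensus ranking.

    A label at position p in a ranking of length n receives n - 1 - p points.
    Ties in the total score are broken alphabetically (an assumption).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return sorted(scores, key=lambda label: (-scores[label], label))


# Three hypothetical classifiers ranking the classes "a", "b" and "c".
combined = borda_count([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]])
```

    Here "b" collects 5 points, "a" 3 and "c" 1, so the combined ranking puts "b" first even though one classifier preferred "a".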

  19. Should OCD be classified as an anxiety disorder in DSM-V?

    NARCIS (Netherlands)

    Stein, Dan J.; Fineberg, Naomi A.; Bienvenu, O. Joseph; Denys, Damiaan; Lochner, Christine; Nestadt, Gerald; Leckman, James F.; Rauch, Scott L.; Phillips, Katharine A.

    2010-01-01

    In DSM-III, DSM-III-R, and DSM-IV, obsessive-compulsive disorder (OCD) was classified as an anxiety disorder. In ICD-10, OCD is classified separately from the anxiety disorders, although within the same larger category as anxiety disorders (as one of the "neurotic, stress-related, and somatoform

  20. An expert computer program for classifying stars on the MK spectral classification system

    International Nuclear Information System (INIS)

    Gray, R. O.; Corbally, C. J.

    2014-01-01

    This paper describes an expert computer program (MKCLASS) designed to classify stellar spectra on the MK Spectral Classification system in a way similar to humans—by direct comparison with the MK classification standards. Like an expert human classifier, the program first comes up with a rough spectral type, and then refines that spectral type by direct comparison with MK standards drawn from a standards library. A number of spectral peculiarities, including barium stars, Ap and Am stars, λ Bootis stars, carbon-rich giants, etc., can be detected and classified by the program. The program also evaluates the quality of the delivered spectral type. The program currently is capable of classifying spectra in the violet-green region in either the rectified or flux-calibrated format, although the accuracy of the flux calibration is not important. We report on tests of MKCLASS on spectra classified by human classifiers; those tests suggest that over the entire HR diagram, MKCLASS will classify in the temperature dimension with a precision of 0.6 spectral subclass, and in the luminosity dimension with a precision of about one half of a luminosity class. These results compare well with human classifiers.

  1. 75 FR 733 - Implementation of the Executive Order, ``Classified National Security Information''

    Science.gov (United States)

    2010-01-05

    ... of the Executive Order, ``Classified National Security Information'' Memorandum for the Heads of... Security Information'' (the ``order''), which substantially advances my goals for reforming the security... classified information shall provide the Director of the Information Security Oversight Office (ISOO) a copy...

  2. A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

    Science.gov (United States)

    Foo, Brian; van der Schaar, Mihaela

    2010-11-01

    In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.

  3. 14 CFR 1213.106 - Preventing release of classified information to the media.

    Science.gov (United States)

    2010-01-01

    ... ADMINISTRATION RELEASE OF INFORMATION TO NEWS AND INFORMATION MEDIA § 1213.106 Preventing release of classified... interviews, audio/visual) to the news media is prohibited. The disclosure of classified information to unauthorized individuals may be cause for prosecution and/or disciplinary action against the NASA employee...

  4. Feature selection for Bayesian network classifiers using the MDL-FS score

    NARCIS (Netherlands)

    Drugan, Madalina M.; Wiering, Marco A.

    When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features

  5. A unified classifier for robust face recognition based on combining multiple subspace algorithms

    Science.gov (United States)

    Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad

    2012-10-01

    Face recognition, the fastest growing biometric technology, has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed. However, none of the proposed or developed algorithms is a complete solution, because an algorithm may work very well on one set of images with, say, illumination changes but may not work properly on another set of image variations such as expression variations. This study is motivated by the fact that no single classifier can claim to show generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively, including the question of the suitability of any given classifier for this task. The study is based on the outcome of a comprehensive comparative analysis conducted on combinations of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers, each of which performs better on one task or another. These classifiers are then combined into an ensemble classifier by two different strategies, weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that can successfully handle varying facial image conditions of illumination, aging and facial expressions.

  6. Constrained parameter estimation for semi-supervised learning : The case of the nearest mean classifier

    NARCIS (Netherlands)

    Loog, M.

    2011-01-01

    A rather simple semi-supervised version of the equally simple nearest mean classifier is presented. However simple, the proposed approach is of practical interest as the nearest mean classifier remains a relevant tool in biomedical applications or other areas dealing with relatively high-dimensional
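
    For orientation, the supervised nearest mean classifier that this work builds on can be sketched as follows. The semi-supervised constrained estimation itself is not shown, since the truncated abstract does not describe it. The function names and the toy data are assumptions for illustration.

```python
import math


def fit_class_means(samples, labels):
    """Return a dict mapping each class label to the mean of its samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(means, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda y: math.dist(means[y], x))


# Hypothetical two-dimensional data with two classes.
means = fit_class_means([[0, 0], [0, 2], [4, 4], [6, 4]], [0, 0, 1, 1])
label = predict(means, [1, 1])
```

    The only parameters are the class means, which is why the classifier remains usable when there are few samples relative to the number of dimensions.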

  7. An expert computer program for classifying stars on the MK spectral classification system

    Energy Technology Data Exchange (ETDEWEB)

    Gray, R. O. [Department of Physics and Astronomy, Appalachian State University, Boone, NC 26808 (United States)]; Corbally, C. J. [Vatican Observatory Research Group, Tucson, AZ 85721-0065 (United States)]

    2014-04-01

    This paper describes an expert computer program (MKCLASS) designed to classify stellar spectra on the MK Spectral Classification system in a way similar to humans—by direct comparison with the MK classification standards. Like an expert human classifier, the program first comes up with a rough spectral type, and then refines that spectral type by direct comparison with MK standards drawn from a standards library. A number of spectral peculiarities, including barium stars, Ap and Am stars, λ Bootis stars, carbon-rich giants, etc., can be detected and classified by the program. The program also evaluates the quality of the delivered spectral type. The program currently is capable of classifying spectra in the violet-green region in either the rectified or flux-calibrated format, although the accuracy of the flux calibration is not important. We report on tests of MKCLASS on spectra classified by human classifiers; those tests suggest that over the entire HR diagram, MKCLASS will classify in the temperature dimension with a precision of 0.6 spectral subclass, and in the luminosity dimension with a precision of about one half of a luminosity class. These results compare well with human classifiers.

  8. A multiscale curvature algorithm for classifying discrete return LiDAR in forested environments

    Science.gov (United States)

    Jeffrey S. Evans; Andrew T. Hudak

    2007-01-01

    One prerequisite to the use of light detection and ranging (LiDAR) across disciplines is differentiating ground from nonground returns. The objective was to automatically and objectively classify points within unclassified LiDAR point clouds, with few model parameters and minimal postprocessing. Presented is an automated method for classifying LiDAR returns as ground...

  9. An algorithm to discover gene signatures with predictive potential

    Directory of Open Access Journals (Sweden)

    Hallett Robin M

    2010-09-01

    Full Text Available Background: The advent of global gene expression profiling has generated unprecedented insight into our molecular understanding of cancer, including breast cancer. For example, human breast cancer patients display significant diversity in terms of their survival, recurrence, metastasis as well as response to treatment. These patient outcomes can be predicted by the transcriptional programs of their individual breast tumors. Predictive gene signatures allow us to correctly classify human breast tumors into various risk groups as well as to more accurately target therapy to ensure more durable cancer treatment. Results: Here we present a novel algorithm to generate gene signatures with predictive potential. The method first classifies the expression intensity for each gene, as determined by global gene expression profiling, as low, average or high. The matrix containing the classified data for each gene is then used to score the expression of each gene based on its individual ability to predict the patient characteristic of interest. Finally, all examined genes are ranked based on their predictive ability and the most highly ranked genes are included in the master gene signature, which is then ready for use as a predictor. This method was used to accurately predict the survival outcomes in a cohort of human breast cancer patients. Conclusions: We confirmed the capacity of our algorithm to generate gene signatures with bona fide predictive ability. The simplicity of our algorithm will enable biological researchers to quickly generate valuable gene signatures without specialized software or extensive bioinformatics training.

  10. Exploring the relationship between fractal features and bacterial essential genes

    International Nuclear Information System (INIS)

    Yu Yong-Ming; Yang Li-Cai; Zhao Lu-Lu; Liu Zhi-Ping; Zhou Qian

    2016-01-01

    Essential genes are indispensable for the survival of an organism in optimal conditions. Rapid and accurate identification of new essential genes is of great theoretical and practical significance. Exploring features with predictive power is fundamental for this. Here, we calculate six fractal features from primary gene and protein sequences and then explore their relationship with gene essentiality by statistical analysis and machine learning-based methods. The models are applied to all the currently available identified genes in 27 bacteria from the database of essential genes (DEG). It is found that the fractal features of essential genes generally differ from those of non-essential genes. The fractal features are used to ascertain the parameters of two machine learning classifiers: Naïve Bayes and Random Forest. The area under the curve (AUC) values of both classifiers show that each fractal feature is satisfactorily discriminative between essential genes and non-essential genes individually. And, although significant correlations exist among fractal features, gene essentiality can also be reliably predicted by various combinations of them. Thus, the fractal features analyzed in our study can be used not only to construct a good essentiality classifier alone, but also as significant contributors to computational tools identifying essential genes.

  11. Performance analysis of a Principal Component Analysis ensemble classifier for Emotiv headset P300 spellers.

    Science.gov (United States)

    Elsawy, Amr S; Eldawlatly, Seif; Taher, Mohamed; Aly, Gamal M

    2014-01-01

    The current trend to use Brain-Computer Interfaces (BCIs) with mobile devices mandates the development of efficient EEG data processing methods. In this paper, we demonstrate the performance of a Principal Component Analysis (PCA) ensemble classifier for P300-based spellers. We recorded EEG data from multiple subjects using the Emotiv neuroheadset in the context of a classical oddball P300 speller paradigm. We compare the performance of the proposed ensemble classifier to the performance of traditional feature extraction and classifier methods. Our results demonstrate the capability of the PCA ensemble classifier to classify P300 data recorded using the Emotiv neuroheadset with an average accuracy of 86.29% on cross-validation data. In addition, offline testing of the recorded data reveals an average classification accuracy of 73.3% that is significantly higher than that achieved using traditional methods. Finally, we demonstrate the effect of the parameters of the P300 speller paradigm on the performance of the method.

  12. Accuracy Evaluation of C4.5 and Naive Bayes Classifiers Using Attribute Ranking Method

    Directory of Open Access Journals (Sweden)

    S. Sivakumari

    2009-03-01

    Full Text Available This paper intends to classify the Ljubljana Breast Cancer dataset using C4.5 Decision Tree and Naïve Bayes classifiers. In this work, classification is carried out using two methods. In the first method, the dataset is analysed using all of its attributes. In the second method, attributes are ranked using the information gain ranking technique and only the high-ranked attributes are used to build the classification model. We evaluate the results of the C4.5 Decision Tree and Naïve Bayes classifiers in terms of classifier accuracy for various folds of cross-validation. Our results show that both classifiers achieve good accuracy on the dataset.
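
    The information gain ranking step can be sketched as below for categorical attributes. This is an illustrative sketch, not the tooling used in the paper. The function names, the dict-per-row data layout and the toy rows are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return attribute names sorted from highest to lowest information gain."""
    gains = {a: information_gain([row[a] for row in rows], labels) for a in rows[0]}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical rows: attribute "a" determines the class, attribute "b" does not.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
ranking = rank_attributes(rows, [0, 0, 1, 1])
```

    Only the top-ranked attributes would then be passed to the classifier. How many to keep is a separate choice that the abstract does not state.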

  13. Iceberg Semantics For Count Nouns And Mass Nouns: Classifiers, measures and portions

    Directory of Open Access Journals (Sweden)

    Fred Landman

    2016-12-01

    It is the analysis of complex NPs and their mass-count properties that is the focus of the second part of this paper. There I develop an analysis of English and Dutch pseudo-partitives, in particular, measure phrases like three liters of wine and classifier phrases like three glasses of wine. We will study measure interpretations and classifier interpretations of measures and classifiers, and different types of classifier interpretations: container interpretations, contents interpretations, and - indeed - portion interpretations. Rothstein 2011 argues that classifier interpretations (including portion interpretations) of pseudo-partitives pattern with count nouns, but that measure interpretations pattern with mass nouns. I will show that this distinction follows from the very basic architecture of Iceberg semantics.

  14. Novelty Detection Classifiers in Weed Mapping: Silybum marianum Detection on UAV Multispectral Images.

    Science.gov (United States)

    Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios

    2017-09-01

    In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.

  15. Case base classification on digital mammograms: improving the performance of case base classifier

    Science.gov (United States)

    Raman, Valliappan; Then, H. H.; Sumari, Putra; Venkatesa Mohan, N.

    2011-10-01

    Breast cancer continues to be a significant public health problem in the world. Early detection is the key for improving breast cancer prognosis. The aim of the research presented here is twofold. The first stage involves machine learning techniques that segment masses in digital mammograms and extract features from them. The second stage is a problem-solving approach that classifies the masses with a performance-based case-base classifier. In this paper we build a case-based classifier in order to diagnose mammographic images. We explain the different methods and behaviors that have been added to the classifier to improve its performance. An initial performance-based classifier with bagging is proposed and implemented, and it shows an improvement in specificity and sensitivity.

  16. Enhanced gene ranking approaches using modified trace ratio algorithm for gene expression data

    Directory of Open Access Journals (Sweden)

    Shruti Mishra

    Full Text Available Microarray technology enables the understanding and investigation of gene expression levels by analyzing high dimensional datasets that contain few samples. Over time, microarray expression data have been collected for studying the underlying biological mechanisms of disease. One such application for understanding the mechanism is by constructing a gene regulatory network (GRN). One of the foremost key criteria for GRN discovery is gene selection. Choosing a generous set of genes for the structure of the network is highly desirable. For this role, two suitable methods were proposed for selection of appropriate genes. The first approach comprises a gene selection method called Information gain, where the dataset is reformed and fused with another distinct algorithm called Trace Ratio (TR). Our second method is the implementation of our projected modified TR algorithm, where the scoring base for finding weight matrices has been re-designed. Both the methods' efficiency was shown with different classifiers that include variants of the Artificial Neural Network classifier, such as Resilient Propagation, Quick Propagation, Back Propagation, Manhattan Propagation and Radial Basis Function Neural Network, and also the Support Vector Machine (SVM) classifier. In the study, it was confirmed that both of the proposed methods worked well and offered high accuracy with a lesser number of iterations as compared to the original Trace Ratio algorithm. Keywords: Gene regulatory network, Gene selection, Information gain, Trace ratio, Canonical correlation analysis, Classification

  17. Short communication: Cytokine profiles from blood mononuclear cells of dairy cows classified with divergent immune response phenotypes.

    Science.gov (United States)

    Martin, C E; Paibomesai, M A; Emam, S M; Gallienne, J; Hine, B C; Thompson-Crispi, K A; Mallard, B A

    2016-03-01

    Genetic selection for enhanced immune response has been shown to decrease disease occurrence in dairy cattle. Cows can be classified as high (H), average, or low responders based on antibody-mediated immune response (AMIR), predominated by type-2 cytokine production, and cell-mediated immune response (CMIR) through estimated breeding values for these traits. The purpose of this study was to identify in vitro tests that correlate with in vivo immune response phenotyping in dairy cattle. Blood mononuclear cells (BMC) isolated from cows classified as H-AMIR and H-CMIR through estimated breeding values for immune response traits were stimulated with concanavalin A (ConA; Sigma Aldrich, St. Louis, MO) and gene expression, cytokine production, and cell proliferation were determined at multiple time points. A repeated measures model, which included the effects of immune response group, parity, and stage of lactation, was used to compare differences between immune response phenotype groups. The H-AMIR cows produced more IL-4 protein than H-CMIR cows at 48 h; however, no difference in gene expression of type-2 transcription factor GATA3 or IL4 was noted. The BMC from H-CMIR cows had increased production of IFN-γ protein at 48, 72, and 96 h compared with H-AMIR animals. Further, H-CMIR cows had increased expression of the IFNG gene at 16, 24, and 48 h post-treatment with ConA, although expression of the type-1 transcription factor gene TBX21 did not differ between immune response groups. Although proliferation of BMC increased from 24 to 72 h after ConA stimulation, no differences were found between the immune response groups. Overall, stimulation of H-AMIR and H-CMIR bovine BMC with ConA resulted in distinct cytokine production profiles according to genetically defined groups. These distinct cytokine profiles could be used to define disease resistance phenotypes in dairy cows according to stimulation in vitro; however, other immune response phenotypes should be assessed

  18. A genetic ensemble approach for gene-gene interaction identification

    Directory of Open Access Journals (Sweden)

    Ho Joshua WK

    2010-10-01

    Full Text Available Background: It has now become clear that gene-gene interactions and gene-environment interactions are ubiquitous and fundamental mechanisms for the development of complex diseases. Though a considerable effort has been put into developing statistical models and algorithmic strategies for identifying such interactions, the accurate identification of those genetic interactions has proven to be very challenging. Methods: In this paper, we propose a new approach for identifying such gene-gene and gene-environment interactions underlying complex diseases. This is a hybrid algorithm that combines a genetic algorithm (GA) and an ensemble of classifiers (called genetic ensemble). Using this approach, the original problem of SNP interaction identification is converted into a data mining problem of combinatorial feature selection. By collecting various single nucleotide polymorphism (SNP) subsets as well as environmental factors generated in multiple GA runs, patterns of gene-gene and gene-environment interactions can be extracted using a simple combinatorial ranking method. Also considered in this study is the idea of combining identification results obtained from multiple algorithms. A novel formula based on pairwise double fault is designed to quantify the degree of complementarity. Conclusions: Our simulation study demonstrates that the proposed genetic ensemble algorithm has comparable identification power to Multifactor Dimensionality Reduction (MDR) and is slightly better than Polymorphism Interaction Analysis (PIA), which are the two most popular methods for gene-gene interaction identification. More importantly, the identification results generated by using our genetic ensemble algorithm are highly complementary to those obtained by PIA and MDR. Experimental results from our simulation studies and real world data application also confirm the effectiveness of the proposed genetic ensemble algorithm, as well as the potential benefits of
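
    The abstract does not give the authors' complementarity formula, so the sketch below shows only the standard pairwise double-fault measure it is said to be based on: the fraction of cases that two methods both get wrong. The function name and the toy vectors are assumptions for illustration.

```python
def double_fault(truth, pred_a, pred_b):
    """Fraction of cases on which both methods are wrong.

    A low value means the two methods rarely fail on the same case, which is
    the usual reading of "complementary" in ensemble diversity measures.
    """
    both_wrong = sum(
        1 for t, a, b in zip(truth, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(truth)


# Hypothetical outcomes of two methods on four cases.
rate = double_fault([1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1])
```

    In this toy example each method makes two errors, but only one error is shared, giving a double-fault rate of 0.25.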

  19. Classifying Classifications

    DEFF Research Database (Denmark)

    Debus, Michael S.

    2017-01-01

    This paper critically analyzes seventeen game classifications. The classifications were chosen on the basis of diversity, ranging from pre-digital classification (e.g. Murray 1952), over game studies classifications (e.g. Elverdam & Aarseth 2007) to classifications of drinking games (e.g. LaBrie et al. 2013). The analysis aims at three goals: The classifications’ internal consistency, the abstraction of classification criteria and the identification of differences in classification across fields and/or time. Especially the abstraction of classification criteria can be used in future endeavors into the topic of game classifications.

  20. Voice of the Classified Employee: A Descriptive Study to Determine Degree of Job Satisfaction of Classified Employees and to Design Systems of Support by School District Leaders

    Science.gov (United States)

    Barakos-Cartwright, Rebekah B.

    2012-01-01

    Classified employees comprise thirty-two percent of the educational workforce in school districts in the state of California. Acknowledging these employees as a viable and untapped resource within the educational system will enrich job satisfaction for these employees and benefit the operations in school sites. As acknowledged and valued…

  1. Can single classifiers be as useful as model ensembles to produce benthic seabed substratum maps?

    Science.gov (United States)

    Turner, Joseph A.; Babcock, Russell C.; Hovey, Renae; Kendrick, Gary A.

    2018-05-01

    Numerous machine-learning classifiers are available for benthic habitat map production, which can lead to different results. This study highlights the performance of the Random Forest (RF) classifier, which was significantly better than Classification Trees (CT), Naïve Bayes (NB), and a multi-model ensemble in terms of overall accuracy, Balanced Error Rate (BER), Kappa, and area under the curve (AUC) values. RF accuracy was often higher than 90% for each substratum class, even at the most detailed level of the substratum classification and AUC values also indicated excellent performance (0.8-1). Total agreement between classifiers was high at the broadest level of classification (75-80%) when differentiating between hard and soft substratum. However, this sharply declined as the number of substratum categories increased (19-45%) including a mix of rock, gravel, pebbles, and sand. The model ensemble, produced from the results of all three classifiers by majority voting, did not show any increase in predictive performance when compared to the single RF classifier. This study shows how a single classifier may be sufficient to produce benthic seabed maps and model ensembles of multiple classifiers.
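
    The majority-voting ensemble described in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: the labels are made-up toy data, the function names are hypothetical, and the tie-breaking rule (prefer the first classifier listed) is an assumption, since the abstract does not say how ties were resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one ensemble label list.

    predictions: list of equal-length label sequences, one per classifier.
    Ties go to the classifier listed first (an assumption for this sketch).
    """
    ensemble = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        winner = next(label for label in labels if counts[label] == top)
        ensemble.append(winner)
    return ensemble


def accuracy(predicted, truth):
    """Fraction of positions where the predicted label matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical toy labels, not data from the study.
truth = ["rock", "sand", "sand", "gravel", "rock"]
rf = ["rock", "sand", "sand", "gravel", "sand"]
ct = ["rock", "sand", "rock", "sand", "sand"]
nb = ["sand", "sand", "rock", "gravel", "rock"]

ensemble = majority_vote([rf, ct, nb])
print(accuracy(rf, truth), accuracy(ensemble, truth))
```

    In this toy case the single strongest classifier scores higher than the vote, because the two weaker classifiers outvote it where they share an error. That is one way an ensemble can fail to improve on its best member.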

  2. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    Science.gov (United States)

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is too limited for more complicated methods. In practice, it is recommended to start dimension reduction with simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from the capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of this feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has the dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether ensembles of heterogeneous classifiers provide less biased rankings and whether they improve final classification performance. Furthermore, we calculate the empirical loss in prediction performance, relative to the optimal choices, incurred by using the same classifier in SVC feature ranking and final classification.
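
    A minimal sketch of SVC ranking follows: each feature is scored by the accuracy of a classifier that sees that feature alone. The choice of classifier (nearest class mean) and of evaluation scheme (leave-one-out) are assumptions made for this illustration, and the data are hypothetical; the paper compares several classifiers and does not prescribe this one.

```python
def loo_accuracy_single_feature(values, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier on one feature."""
    correct = 0
    for i, (x, y) in enumerate(zip(values, labels)):
        # Class means are computed without the held-out sample i.
        sums, counts = {}, {}
        for j, (v, c) in enumerate(zip(values, labels)):
            if j == i:
                continue
            sums[c] = sums.get(c, 0.0) + v
            counts[c] = counts.get(c, 0) + 1
        predicted = min(sums, key=lambda c: abs(x - sums[c] / counts[c]))
        correct += predicted == y
    return correct / len(values)


def svc_ranking(rows, labels):
    """Return feature indices from most to least predictive, plus the scores."""
    n_features = len(rows[0])
    scores = [
        loo_accuracy_single_feature([row[f] for row in rows], labels)
        for f in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.1, 5.0), (0.2, 1.0), (0.3, 4.0), (0.9, 4.5), (1.0, 1.5), (1.1, 5.5)]
labels = [0, 0, 0, 1, 1, 1]
order, scores = svc_ranking(rows, labels)
print(order, scores)
```

    Swapping in a different single-feature classifier changes only `loo_accuracy_single_feature`, which makes it easy to test whether the ranking depends on the classifier, the question the abstract raises.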

  3. Classifier utility modeling and analysis of hypersonic inlet start/unstart considering training data costs

    Science.gov (United States)

    Chang, Juntao; Hu, Qinghua; Yu, Daren; Bao, Wen

    2011-11-01

    Start/unstart detection is one of the most important issues for hypersonic inlets and is the foundation of scramjet protection control. Inlet start/unstart detection can be treated as a standard pattern classification problem, and the cost of training samples has to be considered in classifier modeling because both CFD numerical simulations and wind tunnel experiments of hypersonic inlets cost time and money. To address this problem, CFD simulation of the inlet is studied first, and the simulation results provide the training data for pattern classification of hypersonic inlet start/unstart. Classifier modeling technology and maximum classifier utility theory are then introduced to analyze the effect of training data cost on classifier utility. In conclusion, support vector machine algorithms are useful for obtaining the classifier model of hypersonic inlet start/unstart, and the minimum total cost of the hypersonic inlet start/unstart classifier can be obtained by maximum classifier utility theory.

  4. Grounding grammatical categories: attention bias in hand space influences grammatical congruency judgment of Chinese nominal classifiers.

    Science.gov (United States)

    Lobben, Marit; D'Ascenzo, Stefania

    2015-01-01

    Embodied cognitive theories predict that linguistic conceptual representations are grounded and continually represented in real world, sensorimotor experiences. However, there is an on-going debate on whether this also holds for abstract concepts. Grammar is the archetype of abstract knowledge, and therefore constitutes a test case against embodied theories of language representation. Former studies have largely focussed on lexical-level embodied representations. In the present study we take the grounding-by-modality idea a step further by using reaction time (RT) data from the linguistic processing of nominal classifiers in Chinese. We take advantage of an independent body of research, which shows that attention in hand space is biased. Specifically, objects near the hand consistently yield shorter RTs as a function of readiness for action on graspable objects within reaching space, and the same biased attention inhibits attentional disengagement. We predicted that this attention bias would equally apply to the graspable object classifier but not to the big object classifier. Chinese speakers (N = 22) judged grammatical congruency of classifier-noun combinations in two conditions: graspable object classifier and big object classifier. We found that RTs for the graspable object classifier were significantly faster in congruent combinations, and significantly slower in incongruent combinations, than the big object classifier. There was no main effect on grammatical violations, but rather an interaction effect of classifier type. Thus, we demonstrate here grammatical category-specific effects pertaining to the semantic content and by extension the visual and tactile modality of acquisition underlying the acquisition of these categories. We conclude that abstract grammatical categories are subjected to the same mechanisms as general cognitive and neurophysiological processes and may therefore be grounded.

  5. Using multivariate machine learning methods and structural MRI to classify childhood onset schizophrenia and healthy controls

    Directory of Open Access Journals (Sweden)

    Deanna Greenstein

    2012-06-01

    Full Text Available Introduction: Multivariate machine learning methods can be used to classify groups of schizophrenia patients and controls using structural magnetic resonance imaging (MRI). However, machine learning methods to date have not been extended beyond classification and contemporaneously applied in a meaningful way to clinical measures. We hypothesized that brain measures would classify groups, and that increased likelihood of being classified as a patient using regional brain measures would be positively related to illness severity, developmental delays and genetic risk. Methods: Using 74 anatomic brain MRI sub regions and Random Forest, we classified 98 COS patients and 99 age, sex, and ethnicity-matched healthy controls. We also used Random Forest to determine the likelihood of being classified as a schizophrenia patient based on MRI measures. We then explored relationships between brain-based probability of illness and symptoms, premorbid development, and presence of copy number variation associated with schizophrenia. Results: Brain regions jointly classified COS and control groups with 73.7% accuracy. Greater brain-based probability of illness was associated with worse functioning (p = 0.0004) and fewer developmental delays (p = 0.02). Presence of copy number variation (CNV) was associated with lower probability of being classified as schizophrenia (p = 0.001). The regions that were most important in classifying groups included left temporal lobes, bilateral dorsolateral prefrontal regions, and left medial parietal lobes. Conclusions: Schizophrenia and control groups can be well classified using Random Forest and anatomic brain measures, and brain-based probability of illness has a positive relationship with illness severity and a negative relationship with developmental delays/problems and CNV-based risk.

  6. A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom [Department of Radiology, University of Ulsan College of Medicine, 388-1 Pungnap2-dong, Songpa-gu, Seoul 138-736 (Korea, Republic of); Lynch, David A. [Department of Radiology, National Jewish Medical and Research Center, Denver, Colorado 80206 (United States)

    2013-05-15

    Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs: normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For

  7. A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

    International Nuclear Information System (INIS)

    Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom; Lynch, David A.

    2013-01-01

    Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs—normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI

  8. Gene expression

    International Nuclear Information System (INIS)

    Hildebrand, C.E.; Crawford, B.D.; Walters, R.A.; Enger, M.D.

    1983-01-01

    We prepared probes for isolating functional pieces of the metallothionein locus. The probes enabled a variety of experiments, eventually revealing two mechanisms for metallothionein gene expression, the order of the DNA coding units at the locus, and the location of the gene site in its chromosome. Once the switch regulating metallothionein synthesis was located, it could be joined by recombinant DNA methods to other, unrelated genes, then reintroduced into cells by gene-transfer techniques. The expression of these recombinant genes could then be induced by exposing the cells to Zn²⁺ or Cd²⁺. We would thus take advantage of the clearly defined switching properties of the metallothionein gene to manipulate the expression of other, perhaps normally constitutive, genes. Already, despite an incomplete understanding of how the regulatory switch of the metallothionein locus operates, such experiments have been performed successfully.

  9. A New Adaptive Structural Signature for Symbol Recognition by Using a Galois Lattice as a Classifier.

    Science.gov (United States)

    Coustaty, M; Bertet, K; Visani, M; Ogier, J

    2011-08-01

    In this paper, we propose a new approach for symbol recognition using structural signatures and a Galois lattice as a classifier. The structural signatures are based on topological graphs computed from segments which are extracted from the symbol images by using an adapted Hough transform. These structural signatures, which can be seen as dynamic paths that carry high-level information, are robust toward various transformations. They are classified by using a Galois lattice as a classifier. The performance of the proposed approach is evaluated based on the GREC'03 symbol database, and the experimental results we obtain are encouraging.

  10. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    Science.gov (United States)

    Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867

  11. Multiple classifier systems in texton-based approach for the classification of CT images of Lung

    DEFF Research Database (Denmark)

    Gangeh, Mehrdad J.; Sørensen, Lauge; Shaker, Saher B.

    2010-01-01

    In this paper, we propose using texton signatures based on raw pixel representation along with a parallel multiple classifier system for the classification of emphysema in computed tomography images of the lung. The multiple classifier system is composed of support vector machines on the texton.......e., texton size and k value in k-means. Our results show that while aggregation of single decisions by SVMs over various k values using multiple classifier systems helps to improve the results compared to single SVMs, combining over different texton sizes is not beneficial. The performance of the proposed...

  12. Automatic construction of a recurrent neural network based classifier for vehicle passage detection

    Science.gov (United States)

    Burnaev, Evgeny; Koptelov, Ivan; Novikov, German; Khanipov, Timur

    2017-03-01

    Recurrent Neural Networks (RNNs) are extensively used for time-series modeling and prediction. We propose an approach for automatic construction of a binary classifier based on Long Short-Term Memory RNNs (LSTM-RNNs) for detection of a vehicle passage through a checkpoint. As input to the classifier we use multidimensional signals from various sensors installed at the checkpoint. The results demonstrate that the previous approach of handcrafting a classifier consisting of a set of deterministic rules can be successfully replaced by automatic RNN training on appropriately labelled data.

  13. DL-ADR: a novel deep learning model for classifying genomic variants into adverse drug reactions.

    Science.gov (United States)

    Liang, Zhaohui; Huang, Jimmy Xiangji; Zeng, Xing; Zhang, Gang

    2016-08-10

    Genomic variations are associated with the metabolism of many therapeutic agents and the occurrence of adverse reactions to them. Polymorphisms at over 2000 locations of cytochrome P450 enzymes (CYP), arising from factors such as ethnicity, mutations, and inheritance, contribute to the diversity of responses and side effects of various drugs. The associations among single nucleotide polymorphisms (SNPs), internal pharmacokinetic patterns, and vulnerability to specific adverse reactions have become one of the research interests of pharmacogenomics. Conventional genome-wide association studies (GWAS) mainly focus on the relation of single or multiple SNPs to a specific risk factor, which is a one-to-many relation. However, there are no robust methods to establish a many-to-many network that can combine the direct and indirect associations between multiple SNPs and a series of events (e.g. adverse reactions, metabolic patterns, prognostic factors etc.). In this paper, we present a novel deep learning model based on generative stochastic networks and a hidden Markov chain to classify the observed samples, with SNPs on five loci of two genes (CYP2D6 and CYP1A2), into the vulnerable populations of 14 types of adverse reactions. A supervised deep learning model is proposed in this study. A revised generative stochastic network (GSN) model whose transitions are driven by the hidden Markov chain is used. The data of the training set are collected from clinical observation. The training set is composed of 83 observations of blood samples with genotypes on CYP2D6*2, *10, *14 and CYP1A2*1C, *1F, respectively. The samples are genotyped by the polymerase chain reaction (PCR) method. A hidden Markov chain is used as the transition operator to simulate the probabilistic distribution. The model can perform learning at lower cost compared to the conventional maximal likelihood method because the transition distribution is conditional on the previous state of the hidden Markov

  14. Computational identification of putative cytochrome P450 genes in ...

    African Journals Online (AJOL)

    In this work, a computational study of expressed sequence tags (ESTs) of soybean was performed by data mining methods and bio-informatics tools and as a result 78 putative P450 genes were identified, including 57 new ones. These genes were classified into five clans and 20 families by sequence similarities and among ...

  15. 32 CFR 154.6 - Standards for access to classified information or assignment to sensitive duties.

    Science.gov (United States)

    2010-07-01

    ... OF THE SECRETARY OF DEFENSE SECURITY DEPARTMENT OF DEFENSE PERSONNEL SECURITY PROGRAM REGULATION... person's loyalty, reliability, and trustworthiness are such that entrusting the person with classified... reasonable basis for doubting the person's loyalty to the Government of the United States. ...

  16. A system for classifying wood-using industries and recording statistics for automatic data processing.

    Science.gov (United States)

    E.W. Fobes; R.W. Rowe

    1968-01-01

    A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.

  17. A Constrained Multi-Objective Learning Algorithm for Feed-Forward Neural Network Classifiers

    Directory of Open Access Journals (Sweden)

    M. Njah

    2017-06-01

    Full Text Available This paper proposes a new approach to address the optimal design of a Feed-forward Neural Network (FNN) based classifier. The originality of the proposed methodology, called CMOA, lies in the use of a new constraint handling technique based on a self-adaptive penalty procedure in order to direct the entire search effort towards finding only Pareto optimal solutions that are acceptable. Neurons and connections of the FNN Classifier are dynamically built during the learning process. The approach includes differential evolution to create new individuals and then keeps only the non-dominated ones as the basis for the next generation. The designed FNN Classifier is applied to six binary classification benchmark problems obtained from the UCI repository, and the results indicate the advantages of the proposed approach over other multi-objective evolutionary neural network classifiers reported recently in the literature.
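
    The step of keeping only the non-dominated individuals can be illustrated with a short sketch. It assumes two objectives that are both minimised (classification error and number of connections); these objectives, the candidate values, and the function names are hypothetical, and the paper's self-adaptive penalty and differential evolution steps are not shown.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one.

    Both objective vectors are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(population):
    """Keep the individuals that no other individual dominates."""
    return [
        p
        for p in population
        if not any(dominates(q, p) for q in population if q is not p)
    ]


# Hypothetical candidates: (classification error, number of connections).
candidates = [(0.10, 40), (0.12, 25), (0.10, 55), (0.20, 25), (0.08, 70)]
front = non_dominated(candidates)
print(front)
```

    Here (0.10, 55) is dropped because (0.10, 40) matches its error with fewer connections, and (0.20, 25) is dropped because (0.12, 25) matches its size with lower error.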

  18. Cellular hemangioma and angioblastoma of the spine, originally classified as hemangioendothelioma. A confusing diagnosis

    NARCIS (Netherlands)

    Been, H. D.; Fidler, M. W.; Bras, J.

    1994-01-01

    The authors report two cases of vascular tumors of the spine, classified originally as benign and malignant hemangioendothelioma, and after revision, as cellular hemangioma and angioblastomatosis, respectively. Problems in interpretation of the confusing term hemangioendothelioma and treatment

  19. Performances of the likelihood-ratio classifier based on different data modelings

    NARCIS (Netherlands)

    Chen, C.; Veldhuis, Raymond N.J.

    2008-01-01

    The classical likelihood ratio classifier easily collapses in many biometric applications especially with independent training-test subjects. The reason lies in the inaccurate estimation of the underlying user-specific feature density. Firstly, the feature density estimation suffers from

  20. Vandalism Detection in Wikipedia: a Bag-of-Words Classifier Approach

    OpenAIRE

    Belani, Amit

    2010-01-01

    A bag-of-words based probabilistic classifier is trained using regularized logistic regression to detect vandalism in the English Wikipedia. Isotonic regression is used to calibrate the class membership probabilities. Learning curve, reliability, ROC, and cost analysis are performed.
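
    Isotonic regression, which the abstract uses to calibrate class membership probabilities, can be sketched with the pool-adjacent-violators algorithm. This is a generic textbook version written for illustration, not the author's code; the scores and labels are made up, and the function name is hypothetical.

```python
def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators fit of outcomes ordered by score.

    Returns (sorted_scores, calibrated_values), one calibrated value per
    sample, non-decreasing in the score.
    """
    pairs = sorted(zip(scores, outcomes))
    # Each block holds [sum of outcomes, number of samples].
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = []
    for total, count in blocks:
        calibrated.extend([total / count] * count)
    return [s for s, _ in pairs], calibrated


# Hypothetical raw classifier scores and true labels (1 = vandalism).
raw = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 1, 0, 0, 1, 1]
xs, probs = isotonic_fit(raw, labels)
print(list(zip(xs, probs)))
```

    The three middle samples, where a positive label precedes two negatives, are pooled into one block with calibrated probability 1/3, so the fitted values never decrease as the raw score increases.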

  1. Effects of cultural characteristics on building an emotion classifier through facial expression analysis

    Science.gov (United States)

    da Silva, Flávio Altinier Maximiano; Pedrini, Helio

    2015-03-01

    Facial expressions are an important demonstration of humanity's humors and emotions. Algorithms capable of recognizing facial expressions and associating them with emotions were developed and employed to compare the expressions that different cultural groups use to show their emotions. Static pictures of predominantly occidental and oriental subjects from public datasets were used to train machine learning algorithms, whereas local binary patterns, histogram of oriented gradients (HOGs), and Gabor filters were employed to describe the facial expressions for six different basic emotions. The most consistent combination, formed by the association of HOG filter and support vector machines, was then used to classify the other cultural group: there was a strong drop in accuracy, meaning that the subtle differences of facial expressions of each culture affected the classifier performance. Finally, a classifier was trained with images from both occidental and oriental subjects and its accuracy was higher on multicultural data, evidencing the need of a multicultural training set to build an efficient classifier.

  2. Sentiment analysis system for movie review in Bahasa Indonesia using naive bayes classifier method

    Science.gov (United States)

    Nurdiansyah, Yanuar; Bukhori, Saiful; Hidayat, Rahmad

    2018-04-01

    Sentiments appear in many kinds of documents, among them product and service reviews, so it is important to be able to process and extract textual data from such documents. We therefore propose a system that classifies sentiments from review documents into two classes: positive sentiment and negative sentiment. The document classification system we built uses the Naive Bayes Classifier method. We chose Movienthusiast, a website of movie reviews in Bahasa Indonesia, as the source of our review documents. From there, we collected 1201 movie reviews (783 positive and 418 negative) that we use as the dataset for this machine learning classifier. The classification accuracy averaged 88.37% over five accuracy measurements using this dataset.
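
    A two-class Naive Bayes text classifier of the kind described can be sketched as below. The multinomial model with add-one (Laplace) smoothing is an assumption, since the abstract does not state which variant was used, and the three toy reviews are invented for illustration. If class priors were estimated from the reported counts, they would be about 783/1201 ≈ 0.652 for positive and 418/1201 ≈ 0.348 for negative.

```python
import math
from collections import Counter


def train_naive_bayes(documents):
    """documents: list of (tokens, label). Returns counts needed for scoring."""
    class_counts = Counter(label for _, label in documents)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in documents:
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs non-zero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, not taken from the study's dataset.
training = [
    (["film", "bagus", "sekali"], "positive"),
    (["cerita", "bagus", "menarik"], "positive"),
    (["film", "buruk", "membosankan"], "negative"),
]
model = train_naive_bayes(training)
print(classify(["bagus"], model), classify(["buruk", "membosankan"], model))
```

    Working in log space avoids numerical underflow when a review contains many tokens.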

  3. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    Science.gov (United States)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is valuable in medical data mining because the diagnosis of a disease could then be made with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.

  4. Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

    Energy Technology Data Exchange (ETDEWEB)

    Poling, Jeanne; Arnold, Pat [National Security Technologies, LLC (NSTec), P.O. Box 98521, Las Vegas, NV 89193-8521 (United States); Saad, Max [Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185 (United States); DiSanza, Frank [E. Frank DiSanza Consulting, 2250 Alanhurst Drive, Henderson, NV 89052 (United States); Cabble, Kevin [U.S. Department of Energy, National Nuclear Security Administration Nevada Site Office, P.O. Box 98518, Las Vegas, NV 89193-8518 (United States)

    2013-07-01

    The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)

  5. Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

    International Nuclear Information System (INIS)

    Poling, Jeanne; Arnold, Pat; Saad, Max; DiSanza, Frank; Cabble, Kevin

    2013-01-01

    The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)

  6. Heuristics legislation in the field of classified information as a function of training subjects of defense

    OpenAIRE

    БЕРЕШ ПАУН Й.

    2014-01-01

    Education on the protection of classified information should be the top priority when it comes to ensuring the protection of the vital interests of the state. Some information should not be made available to the public because it is mainly related to national security, and no one should question the need to protect this kind of data. This paper is intended for educators dealing with the protection of classified information, and especially to those who work with or come into contact with confi...

  7. Proposing an adaptive mutation to improve XCSF performance to classify ADHD and BMD patients

    Science.gov (United States)

    Sadatnezhad, Khadijeh; Boostani, Reza; Ghanizadeh, Ahmad

    2010-12-01

    There is extensive overlap of clinical symptoms observed among children with bipolar mood disorder (BMD) and those with attention deficit hyperactivity disorder (ADHD). Thus, diagnosis according to clinical symptoms cannot be very accurate. It is therefore desirable to develop quantitative criteria for automatic discrimination between these disorders. This study is aimed at designing an efficient decision maker to accurately classify ADHD and BMD patients by analyzing their electroencephalogram (EEG) signals. In this study, 22 channels of EEGs have been recorded from 21 subjects with ADHD and 22 individuals with BMD. Several informative features, such as fractal dimension, band power and autoregressive coefficients, were extracted from the recorded signals. Considering the multimodal overlapping distribution of the obtained features, linear discriminant analysis (LDA) was used to reduce the input dimension in a more separable space to make it more appropriate for the proposed classifier. A piecewise linear classifier based on the extended classifier system for function approximation (XCSF) was modified by developing an adaptive mutation rate, which was proportional to the genotypic content of best individuals and their fitness in each generation. The proposed operator controlled the trade-off between exploration and exploitation while maintaining the diversity in the classifier's population to avoid premature convergence. To assess the effectiveness of the proposed scheme, the extracted features were applied to support vector machine, LDA, nearest neighbor and XCSF classifiers. To evaluate the method, a noisy environment was simulated with different noise amplitudes. It is shown that the results of the proposed technique are more robust as compared to conventional classifiers. Statistical tests demonstrate that the proposed classifier is a promising method for discriminating between ADHD and BMD patients.

  8. Learning classifier systems with memory condition to solve non-Markov problems

    OpenAIRE

    Zang, Zhaoxiang; Li, Dehua; Wang, Junying

    2012-01-01

    In the family of Learning Classifier Systems, the classifier system XCS has been successfully used for many applications. However, the standard XCS has no memory mechanism and can only learn optimal policy in Markov environments, where the optimal action is determined solely by the state of current sensory input. In practice, most environments are partially observable environments on agent's sensation, which are also known as non-Markov environments. Within these environments, XCS either fail...

  9. Aspectual Morphemes as Verb Classifiers in Slavic and Non-Slavic Languages

    OpenAIRE

    Menzenski, Matthew

    2014-01-01

    This paper was presented at the Slavic Linguistics Society Annual Meeting in Seattle, Washington on September 19 2014.   Abstract: Janda et al. (2013) propose an analysis of Russian aspectual prefixes as verb classifiers, arguing that the prefix which forms the 'natural perfective' from a given verb serves to classify that verb according to its semantic characteristics. This analysis contrasts with the traditional analysis of Russian aspect, described by Tixonov (1998) and others, in...

  10. An Integrated Neuroscience and Engineering Approach to Classifying Human Brain-States

    Science.gov (United States)

    2015-12-22

    AFRL-AFOSR-VA-TR-2016-0037. Adrian Lee, University of Washington ...to 14-09-2015. ...specific cognitive states remains elusive, owing perhaps to limited crosstalk between the fields of neuroscience and engineering. Here, we report a

  11. An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data

    Directory of Open Access Journals (Sweden)

    Datta Susmita

    2010-08-01

    Full Text Available Background: Generally speaking, different classifiers tend to work well for certain types of data and, conversely, it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best performing classification algorithm amongst a number of competing algorithms is a difficult task for various reasons. For example, the order of performance may depend on the performance measure employed for such a comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data that is being classified. The attractive feature of the proposed classifier is its multi-objective nature where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance as judged on test samples than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers. Results: We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier comprising the ensemble or better. Conclusions: For complex high-dimensional datasets resulting from present day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.
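
    The idea of ranking competing classifiers under several performance measures at once can be illustrated with a small sketch. It uses a plain Borda count as a stand-in for rank aggregation; this is an assumption made for illustration and is not claimed to be the aggregation scheme used in the paper. The function name `borda_aggregate` and the example scores are hypothetical.

```python
def borda_aggregate(scores):
    """Order classifiers best-first by a simple Borda count.

    scores: {classifier_name: {measure_name: value}}, higher is better.
    This is an illustrative stand-in, not the paper's aggregation method.
    """
    names = sorted(scores)
    measures = sorted({m for per_clf in scores.values() for m in per_clf})
    points = {name: 0 for name in names}
    for measure in measures:
        # Worst classifier on this measure gets 0 points, best gets len-1.
        # Exact ties are broken by name order because sorted() is stable.
        ordered = sorted(names, key=lambda n: scores[n][measure])
        for pts, name in enumerate(ordered):
            points[name] += pts
    # Highest total first; names break any remaining ties.
    return sorted(names, key=lambda n: (-points[n], n))


# Hypothetical test-set results for three classifiers.
example_scores = {
    "svm": {"accuracy": 0.90, "sensitivity": 0.80, "specificity": 0.95},
    "rf": {"accuracy": 0.88, "sensitivity": 0.90, "specificity": 0.85},
    "lda": {"accuracy": 0.80, "sensitivity": 0.70, "specificity": 0.80},
}
ranking = borda_aggregate(example_scores)
```

    No single classifier is best on every measure in this made-up example, which is the situation the abstract describes; aggregating the per-measure ranks gives one ordering that reflects all three measures.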

  12. SVM Classifiers: The Objects Identification on the Base of Their Hyperspectral Features

    Directory of Open Access Journals (Sweden)

    Demidova Liliya

    2017-01-01

    Full Text Available The problem of identifying objects on the basis of their hyperspectral features is considered. The authors propose SVM classifiers built with a modified PSO algorithm that is adapted to the specifics of this identification problem. Results of object identification on the basis of hyperspectral features using the SVM classifiers are presented.

  13. An improved early detection method of type-2 diabetes mellitus using multiple classifier system

    KAUST Repository

    Zhu, Jia

    2015-01-01

    The specific causes of complex diseases such as Type-2 Diabetes Mellitus (T2DM) have not yet been identified. Nevertheless, many medical science researchers believe that complex diseases are caused by a combination of genetic, environmental, and lifestyle factors. Detection of such diseases becomes an issue because it is not free from false presumptions and is accompanied by unpredictable effects. Given the greatly increased amount of data gathered in medical databases, data mining has been used widely in recent years to detect and improve the diagnosis of complex diseases. However, past research showed that no single classifier can be considered optimal for all problems. Therefore, in this paper, we focus on employing multiple classifier systems to improve the accuracy of detection for complex diseases, such as T2DM. We proposed a dynamic weighted voting scheme called multiple factors weighted combination for classifiers' decision combination. This method considers not only the local and global accuracy but also the diversity among classifiers and localized generalization error of each classifier. We evaluated our method on two real T2DM data sets and other medical data sets. The favorable results indicated that our proposed method significantly outperforms individual classifiers and other fusion methods.
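
    A weighted voting scheme of the general kind mentioned above can be sketched as follows. The weights here are fixed, hypothetical numbers; in the paper they are said to depend on local and global accuracy, diversity, and localized generalization error, none of which is modeled in this sketch. The function name `weighted_vote` is an assumption.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine class labels from several classifiers by weighted voting.

    predictions: {classifier_name: predicted_label}
    weights:     {classifier_name: non-negative weight} (hypothetical values;
                 how the paper derives its weights is not reproduced here)
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    # Iterate labels in sorted order so ties resolve deterministically
    # in favor of the smallest label.
    return max(sorted(totals), key=lambda label: totals[label])


# One strong classifier outvotes two weaker ones that disagree with it.
decision = weighted_vote(
    {"clf_a": 1, "clf_b": 0, "clf_c": 0},
    {"clf_a": 0.9, "clf_b": 0.3, "clf_c": 0.4},
)
```

    With equal weights the same function reduces to ordinary majority voting, which makes it a convenient baseline for comparing fusion rules.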

  14. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    Science.gov (United States)

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464

  15. Comparison of Shallow and Deep Learning Methods on Classifying the Regional Pattern of Diffuse Lung Disease.

    Science.gov (United States)

    Kim, Guk Bae; Jung, Kyu-Hwan; Lee, Yeha; Kim, Hyun-Jun; Kim, Namkug; Jun, Sanghoon; Seo, Joon Beom; Lynch, David A

    2017-10-17

    This study aimed to compare shallow and deep learning for classifying the patterns of interstitial lung diseases (ILDs). Using high-resolution computed tomography images, two experienced radiologists marked 1200 regions of interest (ROIs), in which 600 ROIs were each acquired using a GE or Siemens scanner and each group of 600 ROIs consisted of 100 ROIs for subregions that included normal and five regional pulmonary disease patterns (ground-glass opacity, consolidation, reticular opacity, emphysema, and honeycombing). We employed a convolution neural network (CNN) with six learnable layers that consisted of four convolution layers and two fully connected layers. The classification results were compared with those of a shallow learning method, a support vector machine (SVM). The CNN classifier showed significantly better accuracy than the SVM classifier, by 6-9%. As the number of convolution layers increased, the classification accuracy of the CNN improved from 81.27 to 95.12%. Especially in cases showing pathological ambiguity, such as between normal and emphysema cases or between honeycombing and reticular opacity cases, adding convolution layers greatly reduced the misclassification rate between each case. In conclusion, the CNN classifier showed significantly greater accuracy than the SVM classifier, and the results implied structural characteristics that are inherent to the specific ILD patterns.

  16. SAR Target Recognition Based on Multi-feature Multiple Representation Classifier Fusion

    Directory of Open Access Journals (Sweden)

    Zhang Xinzheng

    2017-10-01

    Full Text Available In this paper, we present a Synthetic Aperture Radar (SAR) image target recognition algorithm based on multi-feature multiple representation learning classifier fusion. First, it extracts three features from the SAR images, namely principal component analysis, wavelet transform, and Two-Dimensional Slice Zernike Moments (2DSZM) features. Second, we harness the sparse representation classifier and the cooperative representation classifier with the above-mentioned features to get six predictive labels. Finally, we adopt classifier fusion to obtain the final recognition decision. We researched three different classifier fusion algorithms in our experiments, and the results demonstrate that using Bayesian decision fusion gives the best recognition performance. The method based on multi-feature multiple representation learning classifier fusion integrates the discrimination of multi-features and combines the sparse and cooperative representation classification performance to gain complementary advantages and to improve recognition accuracy. The experiments are based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) database, and they demonstrate the effectiveness of the proposed approach.

  17. Ensembles of novelty detection classifiers for structural health monitoring using guided waves

    Science.gov (United States)

    Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita

    2018-01-01

    Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.

  18. Thai Finger-Spelling Recognition Using a Cascaded Classifier Based on Histogram of Orientation Gradient Features

    Directory of Open Access Journals (Sweden)

    Kittasil Silanon

    2017-01-01

    Full Text Available Hand posture recognition is an essential module in applications such as human-computer interaction (HCI), games, and sign language systems, in which performance and robustness are the primary requirements. In this paper, we propose automatic classification to recognize 21 hand postures that represent letters in Thai finger-spelling, based on the Histogram of Orientation Gradient (HOG) feature (which focuses on the information within a certain region of the image rather than on each single pixel) and the Adaptive Boost (AdaBoost) learning technique, which selects the best weak classifier and constructs a strong classifier consisting of several weak classifiers cascaded in a detection architecture. We collected 21 static hand posture images from 10 subjects for testing and training in Thai letters finger-spelling. The parameters for the training process were adjusted in three experiments, covering false positive rates (FPR), true positive rates (TPR), and number of training stages (N), to achieve the most suitable training model for each hand posture. All cascaded classifiers are loaded into the system simultaneously to classify different hand postures. A correlation coefficient is computed to distinguish the hand postures that are similar. The system achieves approximately 78% accuracy on average on all classifier experiments.

  19. Classifying cognitive profiles using machine learning with privileged information in Mild Cognitive Impairment

    Directory of Open Access Journals (Sweden)

    Hanin Hamdan Alahmadi

    2016-11-01

    Full Text Available Early diagnosis of dementia is critical for assessing disease progression and potential treatment. State-of-the-art machine learning techniques have been increasingly employed to take on this diagnostic task. In this study, we employed Generalised Matrix Learning Vector Quantization (GMLVQ) classifiers to discriminate patients with Mild Cognitive Impairment (MCI) from healthy controls based on their cognitive skills. Further, we adopted a "Learning with privileged information" approach to combine cognitive and fMRI data for the classification task. The resulting classifier operates solely on the cognitive data while it incorporates the fMRI data as privileged information (PI) during training. This novel classifier is of practical use as the collection of brain imaging data is not always possible with patients and older participants. MCI patients and healthy age-matched controls were trained to extract structure from temporal sequences. We ask whether machine learning classifiers can be used to discriminate patients from controls based on the learning performance and whether differences between these groups relate to individual cognitive profiles. To this end, we tested participants in four cognitive tasks: working memory, cognitive inhibition, divided attention, and selective attention. We also collected fMRI data before and after training on the learning task and extracted fMRI responses and connectivity as features for machine learning classifiers. Our results show that the PI guided GMLVQ classifiers outperform the baseline classifier that only used the cognitive data. In addition, we found that for the baseline classifier, divided attention is the only relevant cognitive feature. When PI was incorporated, divided attention remained the most relevant feature while cognitive inhibition became also relevant for the task. Interestingly, this analysis for the fMRI GMLVQ classifier suggests that (1) when overall fMRI signal for structured stimuli is

  20. Empirical study of classification process for two-stage turbo air classifier in series

    Science.gov (United States)

    Yu, Yuan; Liu, Jiaxiang; Li, Gang

    2013-05-01

    Suitable process parameters for a two-stage turbo air classifier are important for obtaining ultrafine powder with a narrow particle-size distribution; however, little has been published internationally on the classification process for the two-stage turbo air classifier in series. The influence of the process parameters of a two-stage turbo air classifier in series on classification performance is empirically studied by using aluminum oxide powders as the experimental material. The experimental results show the following: 1) When the rotor cage rotary speed of the first-stage classifier is increased from 2 300 r/min to 2 500 r/min with a constant rotor cage rotary speed of the second-stage classifier, classification precision is increased from 0.64 to 0.67. However, in this case, the final ultrafine powder yield is decreased from 79% to 74%, which means the classification precision and the final ultrafine powder yield can be regulated through adjusting the rotor cage rotary speed of the first-stage classifier. 2) When the rotor cage rotary speed of the second-stage classifier is increased from 2 500 r/min to 3 100 r/min with a constant rotor cage rotary speed of the first-stage classifier, the cut size is decreased from 13.16 μm to 8.76 μm, which means the cut size of the ultrafine powder can be regulated through adjusting the rotor cage rotary speed of the second-stage classifier. 3) When the feeding speed is increased from 35 kg/h to 50 kg/h, the "fish-hook" effect is strengthened, which makes the ultrafine powder yield decrease. 4) To weaken the "fish-hook" effect, the equalization of the two-stage wind speeds or the combination of a high first-stage wind speed with a low second-stage wind speed should be selected. This empirical study provides a criterion of process parameter configurations for a two-stage or multi-stage classifier in series, which offers a theoretical basis for practical production.

  1. SNRFCB: sub-network based random forest classifier for predicting chemotherapy benefit on survival for cancer treatment.

    Science.gov (United States)

    Shi, Mingguang; He, Jianmin

    2016-04-01

    Adjuvant chemotherapy (CTX) should be individualized to provide potential survival benefit and avoid potential harm to cancer patients. Our goal was to establish a computational approach for making personalized estimates of the survival benefit from adjuvant CTX. We developed the Sub-Network based Random Forest classifier for predicting Chemotherapy Benefit (SNRFCB) based on gene expression datasets of lung cancer. The SNRFCB approach was then validated in independent test cohorts for identifying chemotherapy responder cohorts and chemotherapy non-responder cohorts. SNRFCB involved the pre-selection of gene sub-network signatures based on the mutations and on protein-protein interaction data as well as the application of the random forest algorithm to gene expression datasets. Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer patients in the chemotherapy responder group (P = 0.008), but it was not beneficial to patients in the chemotherapy non-responder group (P = 0.657). Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer squamous cell carcinoma (SQCC) subtype patients in the chemotherapy responder cohorts (P = 0.024), but it was not beneficial to patients in the chemotherapy non-responder cohorts (P = 0.383). SNRFCB improved prediction performance as compared to the machine learning method, support vector machine (SVM). To test the general applicability of the predictive model, we further applied the SNRFCB approach to human breast cancer datasets and also observed superior performance. SNRFCB could provide recurrent probability for individual patients and identify which patients may benefit from adjuvant CTX in clinical trials.

  2. Novel Two-Step Classifier for Torsades de Pointes Risk Stratification from Direct Features

    Directory of Open Access Journals (Sweden)

    Jaimit Parikh

    2017-11-01

    Full Text Available While pre-clinical Torsades de Pointes (TdP) risk classifiers had initially been based on drug-induced block of hERG potassium channels, it is now well established that improved risk prediction can be achieved by considering block of non-hERG ion channels. The current multi-channel TdP classifiers can be categorized into two classes. First, the classifiers that take as input the values of drug-induced block of ion channels (direct features). Second, the classifiers that are built on features extracted from output of the drug-induced multi-channel blockage simulations in the in-silico models (derived features). The classifiers built on derived features have thus far not consistently provided increased prediction accuracies, which casts doubt on the value of such approaches given the cost of including biophysical detail. Here, we propose a new two-step method for TdP risk classification, referred to as Multi-Channel Blockage at Early After Depolarization (MCB@EAD). In the first step, we classified the compounds that produced insufficient hERG block as non-torsadogenic. In the second step, the role of non-hERG channels in modulating TdP risk is considered by constructing classifiers based on direct or derived features at critical hERG block concentrations that generate EADs in the computational cardiac cell models. MCB@EAD provides comparable or superior TdP risk classification of the drugs from the direct features in tests against published methods. TdP risk for the drugs was highly correlated with the propensity to generate EADs in the model. However, the derived features of the biophysical models did not improve the predictive capability for TdP risk assessment.

  3. Feature extraction using convolutional neural network for classifying breast density in mammographic images

    Science.gov (United States)

    Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.

    2017-03-01

    Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, the main constraint of machine-learning algorithms relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, it is a complex task to select an initial set during design of a classifier. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density by a usual machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. Then, these features are fed forward to a single hidden layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, presenting only 1.6% of misclassification. However, the small set of samples and memory constraints required the reuse of data in both CNN and MLP-NN, therefore overfitting might have influenced the results even though we cross-validated the network. Thus, although we presented a promising method for extracting features and classifying breast density, a greater database is

  4. Trichoderma genes

    Science.gov (United States)

    Foreman, Pamela [Los Altos, CA; Goedegebuur, Frits [Vlaardingen, NL; Van Solingen, Pieter [Naaldwijk, NL; Ward, Michael [San Francisco, CA

    2012-06-19

    Described herein are novel gene sequences isolated from Trichoderma reesei. Two genes encoding proteins comprising a cellulose binding domain, one encoding an arabinofuranosidase and one encoding an acetylxylan esterase are described. The sequences, CIP1 and CIP2, contain a cellulose binding domain. These proteins are especially useful in the textile and detergent industry and in the pulp and paper industry.

  5. Genes and proteins of Escherichia coli K-12.

    Science.gov (United States)

    Riley, M

    1998-01-01

    GenProtEC is a database of Escherichia coli genes and their gene products, classified by type of function and physiological role and with citations to the literature for each. Also present are data on sequence similarities among E. coli proteins, representing groups of paralogous genes, with PAM values, percent identity of amino acids, length of alignment and percent aligned. GenProtEC can be accessed at the URL http://www.mbl.edu/html/ecoli.html

  6. A Comprehensive Classification and Evolutionary Analysis of Plant Homeobox Genes

    OpenAIRE

    Mukherjee, Krishanu; Brocchieri, Luciano; Bürglin, Thomas R.

    2009-01-01

    The full complement of homeobox transcription factor sequences, including genes and pseudogenes, was determined from the analysis of 10 complete genomes from flowering plants, moss, Selaginella, unicellular green algae, and red algae. Our exhaustive genome-wide searches resulted in the discovery in each class of a greater number of homeobox genes than previously reported. All homeobox genes can be unambiguously classified by sequence evolutionary analysis into 14 distinct classes also charact...

  7. Local curvature analysis for classifying breast tumors: Preliminary analysis in dedicated breast CT

    International Nuclear Information System (INIS)

    Lee, Juhun; Nishikawa, Robert M.; Reiser, Ingrid; Boone, John M.; Lindfors, Karen K.

    2015-01-01

    Purpose: The purpose of this study is to measure the effectiveness of local curvature measures as novel image features for classifying breast tumors. Methods: A total of 119 breast lesions from 104 noncontrast dedicated breast computed tomography images of women were used in this study. Volumetric segmentation was done using a seed-based segmentation algorithm and then a triangulated surface was extracted from the resulting segmentation. Total, mean, and Gaussian curvatures were then computed. Normalized curvatures were used as classification features. In addition, traditional image features were also extracted and a forward feature selection scheme was used to select the optimal feature set. Logistic regression was used as a classifier and leave-one-out cross-validation was utilized to evaluate the classification performances of the features. The area under the receiver operating characteristic curve (AUC, area under curve) was used as a figure of merit. Results: Among curvature measures, the normalized total curvature (C_T) showed the best classification performance (AUC of 0.74), while the others showed no classification power individually. Five traditional image features (two shape, two margin, and one texture descriptors) were selected via the feature selection scheme and its resulting classifier achieved an AUC of 0.83. Among those five features, the radial gradient index (RGI), which is a margin descriptor, showed the best classification performance (AUC of 0.73). A classifier combining RGI and C_T yielded an AUC of 0.81, which showed similar performance (i.e., no statistically significant difference) to the classifier with the above five traditional image features. Additional comparisons in AUC values between classifiers using different combinations of traditional image features and C_T were conducted. The results showed that C_T was able to replace the other four image features for the classification task. Conclusions: The normalized curvature measure

  8. Classifier for gravitational-wave inspiral signals in nonideal single-detector data

    Science.gov (United States)

    Kapadia, S. J.; Dent, T.; Dal Canton, T.

    2017-11-01

    We describe a multivariate classifier for candidate events in a templated search for gravitational-wave (GW) inspiral signals from neutron-star-black-hole (NS-BH) binaries, in data from ground-based detectors where sensitivity is limited by non-Gaussian noise transients. The standard signal-to-noise ratio (SNR) and chi-squared test for inspiral searches use only properties of a single matched filter at the time of an event; instead, we propose a classifier using features derived from a bank of inspiral templates around the time of each event, and also from a search using approximate sine-Gaussian templates. The classifier thus extracts additional information from strain data to discriminate inspiral signals from noise transients. We evaluate a random forest classifier on a set of single-detector events obtained from realistic simulated advanced LIGO data, using simulated NS-BH signals added to the data. The new classifier detects a factor of 1.5-2 more signals at low false positive rates as compared to the standard "reweighted SNR" statistic, and does not require the chi-squared test to be computed. Conversely, if only the SNR and chi-squared values of single-detector events are available, random forest classification performs nearly identically to the reweighted SNR.

  9. Exploring Land Use and Land Cover of Geotagged Social-Sensing Images Using Naive Bayes Classifier

    Directory of Open Access Journals (Sweden)

    Asamaporn Sitthi

    2016-09-01

    Full Text Available Online social media crowdsourced photos contain a vast amount of visual information about the physical properties and characteristics of the earth’s surface. Flickr is an important online social media platform for users seeking this information. Each day, users generate crowdsourced geotagged digital imagery containing an immense amount of information. In this paper, geotagged Flickr images are used for automatic extraction of low-level land use/land cover (LULC) features. The proposed method uses a naive Bayes classifier with color, shape, and color index descriptors. The classified images are mapped using a majority filtering approach. The classifier performance in overall accuracy, kappa coefficient, precision, recall, and f-measure was 87.94%, 82.89%, 88.20%, 87.90%, and 88%, respectively. Labeled-crowdsourced images were filtered into a spatial tile of a 30 m × 30 m resolution using the majority voting method to reduce geolocation uncertainty from the crowdsourced data. These tile datasets were used as training and validation samples to classify Landsat TM5 images. The supervised maximum likelihood method was used for the LULC classification. The results show that the geotagged Flickr images can classify LULC types with reasonable accuracy and that the proposed approach improves LULC classification efficiency if a sufficient spatial distribution of crowdsourced data exists.
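
    The performance measures listed in this abstract (overall accuracy, kappa coefficient, precision, recall, and f-measure) all follow from a confusion matrix. The sketch below shows the standard definitions on a made-up two-class matrix; the numbers are hypothetical and unrelated to the study's data, and the function names are assumptions.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def precision_recall_f1(confusion, cls):
    """Precision, recall and F-measure for one class index."""
    tp = confusion[cls][cls]
    predicted = sum(row[cls] for row in confusion)  # column total
    actual = sum(confusion[cls])                    # row total
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical matrix: rows are true classes, columns are predictions.
example = [[40, 10],
           [5, 45]]
acc, kappa = accuracy_and_kappa(example)
prec, rec, f1 = precision_recall_f1(example, 0)
```

    Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why land-cover studies commonly report both.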

  10. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. This paradigm integrates the successful exploration mechanism, called the self-regulated learning capability, of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden layer neurons and further improving the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared to the results of other classifiers.

  11. Parameterization of a fuzzy classifier for the diagnosis of an industrial process

    International Nuclear Information System (INIS)

    Toscano, R.; Lyonnet, P.

    2002-01-01

    The aim of this paper is to present a classifier based on a fuzzy inference system. For this classifier, we propose a parameterization method that is not necessarily based on iterative training. This approach can be seen as a pre-parameterization, which allows the determination of the rule base and the parameters of the membership functions. We also present a continuous and differentiable version of this classifier and suggest an iterative learning algorithm based on a gradient method. An example using the IRIS learning set, which is a benchmark for classification problems, is presented to show the performance of this classifier. Finally, this classifier is applied to the diagnosis of a DC motor, showing the utility of this method. However, in many cases the total knowledge necessary for the synthesis of the fuzzy diagnosis system (FDS) is not directly available. It must be extracted from an often considerable mass of information. For this reason, a general methodology for the design of an FDS is presented and illustrated on a non-linear plant.

  12. A Novel Design of 4-Class BCI Using Two Binary Classifiers and Parallel Mental Tasks

    Directory of Open Access Journals (Sweden)

    Tao Geng

    2008-01-01

    Full Text Available A novel 4-class single-trial brain computer interface (BCI) based on two (rather than four or more) binary linear discriminant analysis (LDA) classifiers is proposed, which is called a “parallel BCI.” Unlike other BCIs where mental tasks are executed and classified in a serial way one after another, the parallel BCI uses properly designed parallel mental tasks that are executed on both sides of the subject body simultaneously, which is the main novelty of the BCI paradigm used in our experiments. Each of the two binary classifiers only classifies the mental tasks executed on one side of the subject body, and the results of the two binary classifiers are combined to give the result of the 4-class BCI. Data was recorded in experiments with both real movement and motor imagery in 3 able-bodied subjects. Artifacts were not detected or removed. Offline analysis has shown that, in some subjects, the parallel BCI can generate a higher accuracy than a conventional 4-class BCI, although both of them have used the same feature selection and classification algorithms.
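
    The idea of obtaining four classes from two binary classifiers can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear decision rule, and the encoding of the four classes as 0 to 3 are all assumptions.

```python
def linear_decision(weights, bias, features):
    """Binary linear classifier: returns 1 if w.x + b > 0, else 0.

    An LDA classifier reduces to a rule of this form once its weights
    and bias have been estimated (the estimation step is omitted here).
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def combine_parallel_outputs(left_label, right_label):
    """Map two binary decisions (one per body side) to one of four classes.

    The encoding 2 * left + right is a hypothetical choice; any one-to-one
    mapping from the pair (left, right) to four labels would work.
    """
    if left_label not in (0, 1) or right_label not in (0, 1):
        raise ValueError("both labels must be 0 or 1")
    return 2 * left_label + right_label
```

    With this encoding, the pairs (0, 0), (0, 1), (1, 0), and (1, 1) map to classes 0, 1, 2, and 3, so each binary classifier only ever has to separate the two tasks on its own side.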

  13. Learning to Detect Traffic Incidents from Data Based on Tree Augmented Naive Bayesian Classifiers

    Directory of Open Access Journals (Sweden)

    Dawei Li

    2017-01-01

    Full Text Available This study develops an incident detection algorithm based on a tree augmented naive Bayesian (TAN) classifier. Compared with the Bayesian network based detection algorithms developed in previous studies, this algorithm depends less on experts’ knowledge. The structure of the TAN classifier for incident detection is learned from data. Continuous attributes are discretized automatically using an entropy-based method. A simulation dataset for a section of the Ayer Rajah Expressway (AYE) in Singapore is used to demonstrate the development of the proposed algorithm, including wavelet denoising, normalization, entropy-based discretization, and structure learning. The performance of the TAN based algorithm is compared with the previously developed Bayesian network (BN) based and multilayer feed forward (MLF) neural network based algorithms on the same AYE data. The experimental results show that the TAN based algorithm performs better than the BN classifiers and performs similarly to the MLF based algorithm. However, the TAN based algorithm may have wider applications because the theory of TAN classifiers is much less complicated than that of MLF. The experiments also indicate that the TAN classifier based algorithm is significantly faster than MLF in model training and calibration.
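
    Entropy-based discretization of a continuous attribute can be illustrated with a single binary cut. The sketch below picks the threshold that minimizes the weighted class entropy of the two resulting intervals. It is an assumption-laden illustration: the abstract does not say which entropy-based method was used, and a full method would typically apply such cuts recursively with a stopping criterion. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut_point(values, labels):
    """Find the cut that minimizes the weighted entropy of the two sides.

    Returns (cut, weighted_entropy). Candidate cuts are midpoints between
    consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_score
```

    For the toy input `best_cut_point([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])`, the chosen cut is 6.5, which separates the two classes perfectly and leaves zero entropy on both sides.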

  14. Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

    Science.gov (United States)

    Agarwal, Shashank; Yu, Hong

    2009-12-01

    Biomedical texts can be typically represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches for automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naïve Bayes classifier trained on manually annotated data that achieved 91.95% accuracy and an average F-score of 91.55%, which is significantly higher than baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
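
    A multinomial naive Bayes sentence classifier of the kind mentioned in this abstract can be sketched in a few lines. The version below is a toy under stated assumptions: whitespace tokenization, add-one (Laplace) smoothing, and the class name `TinyMultinomialNB` are illustrative choices, and the training sentences in the usage note are invented, not drawn from the paper's annotated corpus.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial naive Bayes over whitespace tokens."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = sentence.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        tokens = [t for t in sentence.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log prior + sum of log likelihoods with add-one smoothing
            logp = math.log(count / n_docs)
            for token in tokens:
                logp += math.log(
                    (self.word_counts[label][token] + 1)
                    / (total_words + len(self.vocab))
                )
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

    Trained on two invented "Methods" sentences and two invented "Results" sentences, the model assigns "samples were incubated" to "Methods", since all three words occur only in the Methods training text.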

  15. An SVM classifier to separate false signals from microcalcifications in digital mammograms

    Energy Technology Data Exchange (ETDEWEB)

    Bazzani, Armando; Bollini, Dante; Brancaccio, Rosa; Campanini, Renato; Riccardi, Alessandro; Romani, Davide [Department of Physics, University of Bologna (Italy); INFN, Bologna (Italy); Lanconelli, Nico [Department of Physics, University of Bologna, and INFN, Bologna (Italy). E-mail: nico.lanconelli@bo.infn.it; Bevilacqua, Alessandro [Department of Electronics, Computer Science and Systems, University of Bologna, and INFN, Bologna (Italy)

    2001-06-01

    In this paper we investigate the feasibility of using an SVM (support vector machine) classifier in our automatic system for the detection of clustered microcalcifications in digital mammograms. SVM is a technique for pattern recognition which relies on the statistical learning theory. It minimizes a function of two terms: the number of misclassified vectors of the training set and a term regarding the generalization classifier capability. We compare the SVM classifier with an MLP (multi-layer perceptron) in the false-positive reduction phase of our detection scheme: a detected signal is considered either microcalcification or false signal, according to the value of a set of its features. The SVM classifier gets slightly better results than the MLP one (Az value of 0.963 against 0.958) in the presence of a high number of training data; the improvement becomes much more evident (Az value of 0.952 against 0.918) in training sets of reduced size. Finally, the setting of the SVM classifier is much easier than the MLP one. (author)

  16. Performance evaluation of various classifiers for color prediction of rice paddy plant leaf

    Science.gov (United States)

    Singh, Amandeep; Singh, Maninder Lal

    2016-11-01

    The food industry is one of the industries that use machine vision for nondestructive quality evaluation of produce. These quality-measuring systems and software are based on various image-processing algorithms, which generally use a particular type of classifier. These classifiers play a vital role in making the algorithms intelligent enough to perform the quality evaluations well by translating human perception into machine vision and hence machine learning. The crop of interest is rice, and the color of this crop indicates the health status of the plant. An enormous number of classifiers are available for color prediction, but choosing the best among them is the focus of this paper. The performance of a total of 60 classifiers has been analyzed from the application point of view, and the results are discussed. The motivation comes from the idea of providing a set of classifiers with excellent performance and implementing them in a single algorithm for the improvement of machine vision learning and, hence, associated applications.

  17. Obscenity detection using haar-like features and Gentle Adaboost classifier.

    Science.gov (United States)

    Mustafa, Rashed; Min, Yang; Zhu, Dingju

    2014-01-01

    Large exposure of skin area in an image is considered obscene. Relying on this fact alone may produce many false detections for images that contain skin-like objects, and may miss images in which the exposed skin area is small but erotogenic human body parts are exposed. This paper presents a novel method for detecting nipples in pornographic image content. The nipple is considered an erotogenic organ for identifying pornographic content in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiment showed that the haar-cascade classifier performs well in terms of accuracy, but the train-cascade classifier is more suitable when detection time matters. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade classifier and 0.127 seconds for the train-cascade classifier.

  18. Obscenity Detection Using Haar-Like Features and Gentle Adaboost Classifier

    Directory of Open Access Journals (Sweden)

    Rashed Mustafa

    2014-01-01

    Full Text Available Large exposure of skin area in an image is considered obscene. Relying on this fact alone may produce many false detections for images that contain skin-like objects, and may miss images in which the exposed skin area is small but erotogenic human body parts are exposed. This paper presents a novel method for detecting nipples in pornographic image content. The nipple is considered an erotogenic organ for identifying pornographic content in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features were used to ensure detection accuracy. A skin filter applied prior to detection made the system more robust. The experiment showed that the haar-cascade classifier performs well in terms of accuracy, but the train-cascade classifier is more suitable when detection time matters. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection time is 0.162 seconds for the haar-cascade classifier and 0.127 seconds for the train-cascade classifier.

  19. Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel

    Directory of Open Access Journals (Sweden)

    Jianfeng Hu

    2017-01-01

    Full Text Available Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue has major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four types of entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were applied to the original EEG signal and compared using ten state-of-the-art classifiers. Results indicate that the optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and the Random Forest (RF) classifier. The highest accuracy reaches 96.6%, which meets the needs of real applications. The best combination of channel + features + classifier is subject-specific. In this work, the accuracy obtained with FE as the feature is far greater than that of the other features. The accuracy of the RF classifier is the best, while that of the SVM classifier with a linear kernel is the worst. Channel selection also has a large impact on accuracy: the performance of the various channels differs considerably.
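
    Of the four entropy features named in this abstract, sample entropy is the simplest to write out. The sketch below follows the usual definition, the negative logarithm of the ratio between the number of matching template pairs of length m + 1 and of length m, with a Chebyshev distance tolerance r. The default values of `m` and `r`, and the use of an absolute rather than a standard-deviation-scaled tolerance, are assumptions; the abstract does not state the parameters used.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Sample entropy of a 1-D sequence (O(n^2) reference version).

    m: template length; r: absolute tolerance on the Chebyshev distance.
    Returns math.inf when no matches are found at either length.
    """
    n = len(signal)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(
                    abs(a - b) for a, b in zip(templates[i], templates[j])
                )
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)
```

    A perfectly regular sequence, such as a constant signal, yields a sample entropy of zero, while irregularities reduce the number of longer matches and push the value above zero.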

  20. Localization and Recognition of Dynamic Hand Gestures Based on Hierarchy of Manifold Classifiers

    Science.gov (United States)

    Favorskaya, M.; Nosov, A.; Popov, A.

    2015-05-01

    Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers comprising trajectory classifiers at any time instant and posture classifiers of sub-gestures at selected time instants. The trajectory classifiers contain a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those gesture samples that do not fit the current trajectory. The posture classifiers involve the normalized skeleton representation of the palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique is applied for posture recognition of sub-gestures. For the experiments, the dataset "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results", including 393 dynamic hand gestures, was chosen. The proposed method yielded 84-91% recognition accuracy, on average, for a restricted set of dynamic gestures.
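
    Normalizing motion vectors through a fixed set of predetermined directions can be read as quantizing each vector's angle into one of 8 or 16 equal sectors. The sketch below shows one such quantization. The function name, the choice of centering sector 0 on the positive x axis, and the counterclockwise numbering are assumptions, since the abstract does not specify the convention.

```python
import math


def quantize_direction(dx, dy, n_directions=8):
    """Return the index (0 .. n_directions - 1) of the nearest direction.

    Direction k points at angle k * 2*pi / n_directions, measured
    counterclockwise from the positive x axis (assumed convention).
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = 2 * math.pi / n_directions
    # Shift by half a sector so each direction sits in the middle of its bin.
    return int((angle + sector / 2) // sector) % n_directions
```

    With 8 directions, a vector pointing straight up, (0, 1), falls in bin 2; with 16 directions the same vector falls in bin 4.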

  1. LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

    Directory of Open Access Journals (Sweden)

    M. Favorskaya

    2015-05-01

    Full Text Available Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers comprising trajectory classifiers at any time instant and posture classifiers of sub-gestures at selected time instants. The trajectory classifiers contain a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized through predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those gesture samples that do not fit the current trajectory. The posture classifiers involve the normalized skeleton representation of the palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique is applied for posture recognition of sub-gestures. For the experiments, the dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results”, including 393 dynamic hand gestures, was chosen. The proposed method yielded 84–91% recognition accuracy, on average, for a restricted set of dynamic gestures.

  2. 36 CFR 1260.20 - Who is responsible for the declassification of classified national security Executive Branch...

    Science.gov (United States)

    2010-07-01

    ... declassification of classified national security Executive Branch information that has been accessioned by NARA... ADMINISTRATION DECLASSIFICATION DECLASSIFICATION OF NATIONAL SECURITY INFORMATION Responsibilities § 1260.20 Who is responsible for the declassification of classified national security Executive Branch information...

  3. A Novel Approach for Multi Class Fault Diagnosis in Induction Machine Based on Statistical Time Features and Random Forest Classifier

    Science.gov (United States)

    Sonje, M. Deepak; Kundu, P.; Chowdhury, A.

    2017-08-01

    Fault diagnosis and detection is an important area in the health monitoring of electrical machines. This paper proposes a recently developed machine learning classifier for multi-class fault diagnosis in induction machines. The classification is based on the random forest (RF) algorithm. Initially, stator currents are acquired from the induction machine under various conditions. After preprocessing the currents, fourteen statistical time features are estimated for each phase of the current. These parameters are used as inputs to the classifier. The main aim of the paper is to evaluate the effectiveness of the RF classifier for individual and mixed fault diagnosis in induction machines. Stator, rotor, and mixed (stator and rotor) faults are classified using the proposed classifier. The obtained performance measures are compared with those of a multilayer perceptron neural network (MLPNN) classifier. The results show that the RF classifier achieves much better performance measures and is more accurate than the MLPNN classifier. To demonstrate the proposed fault diagnosis algorithm, experimentally obtained results are used, which makes the classifier more practical.
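
    Statistical time features of a current waveform are straightforward to compute. The abstract does not list its fourteen features, so the selection below (mean, standard deviation, RMS, peak, crest factor, skewness, kurtosis) is only a plausible subset chosen for illustration, and the function name is hypothetical.

```python
import math


def time_features(samples):
    """Compute a few common statistical time-domain features of a signal."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    std = math.sqrt(variance)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)

    features = {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        # Crest factor: how far the peak stands above the RMS level.
        "crest_factor": peak / rms if rms else 0.0,
    }
    if std > 0:
        # Standardized third and fourth central moments.
        features["skewness"] = sum((x - mean) ** 3 for x in samples) / n / std ** 3
        features["kurtosis"] = sum((x - mean) ** 4 for x in samples) / n / std ** 4
    else:
        features["skewness"] = 0.0
        features["kurtosis"] = 0.0
    return features
```

    Computing such a dictionary once per phase and concatenating the values would give one feature vector per recording, which could then be passed to any classifier.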

  4. Generic Black-Box End-to-End Attack Against State of the Art API Call Based Malware Classifiers

    OpenAIRE

    Rosenberg, Ishai; Shabtai, Asaf; Rokach, Lior; Elovici, Yuval

    2017-01-01

    In this paper, we present a black-box attack against API call based machine learning malware classifiers, focusing on generating adversarial sequences combining API calls and static features (e.g., printable strings) that will be misclassified by the classifier without affecting the malware functionality. We show that this attack is effective against many classifiers due to the transferability principle between RNN variants, feed forward DNNs, and traditional machine learning classifiers such...

  5. Using Fuzzy Gaussian Inference and Genetic Programming to Classify 3D Human Motions

    Science.gov (United States)

    Khoury, Mehdi; Liu, Honghai

    This research introduces and builds on the concept of Fuzzy Gaussian Inference (FGI) (Khoury and Liu in Proceedings of UKCI, 2008 and IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS 2009), 2009) as a novel way to build Fuzzy Membership Functions that map to hidden Probability Distributions underlying human motions. This method is now combined with a Genetic Programming Fuzzy rule-based system in order to classify boxing moves from natural human Motion Capture data. In this experiment, FGI alone is able to recognise seven different boxing stances simultaneously with an accuracy superior to a GMM-based classifier. Results seem to indicate that adding an evolutionary Fuzzy Inference Engine on top of FGI improves the accuracy of the classifier in a consistent way.

  6. AN IMPLEMENTATION OF EIS-SVM CLASSIFIER USING RESEARCH ARTICLES FOR TEXT CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    B Ramesh

    2016-04-01

    Full Text Available Automatic text classification is a prominent research topic in text mining. Text pre-processing plays a major role in a text classifier, and efficient pre-processing techniques increase the classifier's performance. In this paper, we implement the ECAS stemmer, Efficient Instance Selection, and a Pre-computed Kernel Support Vector Machine for text classification using recent research articles. We use the ECAS stemmer to find root words, Efficient Instance Selection for dimensionality reduction of the text data, and the Pre-computed Kernel Support Vector Machine for classification of the selected instances. Experiments were performed on 750 research articles in three classes: engineering articles, medical articles, and educational articles. The EIS-SVM classifier provides better performance in real-time classification of research articles.

  7. Fault Diagnosis for Distribution Networks Using Enhanced Support Vector Machine Classifier with Classical Multidimensional Scaling

    Directory of Open Access Journals (Sweden)

    Ming-Yuan Cho

    2017-09-01

    Full Text Available In this paper, a new fault diagnosis technique based on the time domain reflectometry (TDR) method with a pseudo-random binary sequence (PRBS) stimulus and a support vector machine (SVM) classifier is investigated to recognize the different types of fault in radial distribution feeders. This novel technique considers the amplitude of reflected signals and the peaks of the cross-correlation (CCR) between the reflected and incident waves to generate the fault current dataset for the SVM. Furthermore, this multi-layer enhanced SVM classifier is combined with the classical multidimensional scaling (CMDS) feature extraction algorithm and kernel parameter optimization to increase training speed and improve overall classification accuracy. The proposed technique has been tested on a radial distribution feeder to identify ten different types of fault using 12 input features generated with Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 95%, which demonstrates the effectiveness and high accuracy of the proposed method.
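
    The cross-correlation peak between an incident and a reflected wave, one of the quantities this abstract uses as a feature, can be located with a direct search over lags. The sketch below is a minimal illustration in which the function name, the restriction to non-negative lags, and the short ±1 test sequence in the usage note are assumptions rather than details from the paper.

```python
def cross_correlation_peak(incident, reflected):
    """Return (lag, value) of the largest cross-correlation.

    For each non-negative lag, the incident sequence is slid along the
    reflected sequence and the overlapping products are summed.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(len(reflected)):
        value = sum(
            incident[i] * reflected[i + lag]
            for i in range(len(incident))
            if i + lag < len(reflected)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value
```

    If the reflected trace is a copy of the incident ±1 sequence delayed by three samples and attenuated to half amplitude, the peak appears at lag 3. In a reflectometry setting the lag relates to the round-trip delay and the peak height to the strength of the reflection.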

  8. A Modified FCM Classifier Constrained by Conditional Random Field Model for Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    WANG Shaoyu

    2016-12-01

    Full Text Available Remote sensing imagery contains abundant spatial correlation information, but traditional pixel-based clustering algorithms do not take spatial information into account, so their results are often poor. To address this issue, a modified FCM classifier constrained by a conditional random field model is proposed. The prior classification information of adjacent pixels constrains the classification of the center pixel, thereby extracting spatial correlation information. Spectral information and spatial correlation information are considered simultaneously when clustering, based on a second-order conditional random field. Moreover, the globally optimal inference of each pixel's classification posterior probability can be obtained using loopy belief propagation. The experiment shows that the proposed algorithm effectively maintains the shape features of objects and that its classification accuracy is higher than that of traditional algorithms.
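
    For context, the standard fuzzy c-means (FCM) membership update, which uses spectral distances only, can be written in a few lines. The sketch below shows that baseline step and deliberately omits the conditional random field constraint that the paper adds. The function name and the default fuzzifier `m = 2` are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster.

    u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), where d_i is the
    Euclidean distance from the point to center i and m > 1.
    """
    distances = [math.dist(point, center) for center in centers]
    zeros = sum(1 for d in distances if d == 0)
    if zeros:
        # The point coincides with one or more centers: split membership there.
        return [1.0 / zeros if d == 0 else 0.0 for d in distances]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]
```

    A pixel at distance 1 from one center and distance 3 from another receives memberships 0.9 and 0.1 with `m = 2`. A spatially constrained variant would adjust these values using the labels of neighboring pixels.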

  9. A Naive-Bayes classifier for damage detection in engineering materials

    Energy Technology Data Exchange (ETDEWEB)

    Addin, O. [Laboratory of Intelligent Systems, Institute of Advanced Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Sapuan, S.M. [Department of Mechanical and Manufacturing Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)]. E-mail: sapuan@eng.upm.edu.my; Mahdi, E. [Department of Aerospace Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Othman, M. [Department of Communication Technology and Networks, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)

    2007-07-01

    This paper introduces the Bayesian network in general, and the Naive-Bayes classifier in particular, as one of the most successful classification systems for simulating damage detection in engineering materials. A method for feature subset selection is also introduced. The method is based on the mean and maximum values of the wave amplitudes after dividing them into folds and then grouping them with a clustering algorithm (e.g. the k-means algorithm). The Naive-Bayes classifier and the feature subset selection method were analyzed and tested on two sets of data. The data sets were based on artificial damage created in quasi-isotropic laminated composites of the AS4/3501-6 graphite/epoxy system and in ball bearings of type 6204 with a steel cage. The Naive-Bayes classifier and the proposed feature subset selection algorithm are shown to be efficient techniques for damage detection in engineering materials.

  10. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    Science.gov (United States)

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  11. A Cross-Classified CFA-MTMM Model for Structurally Different and Nonindependent Interchangeable Methods.

    Science.gov (United States)

    Koch, Tobias; Schultze, Martin; Jeon, Minjeong; Nussbeck, Fridtjof W; Praetorius, Anna-Katharina; Eid, Michael

    2016-01-01

    Multirater (multimethod, multisource) studies are increasingly applied in psychology. Eid and colleagues (2008) proposed a multilevel confirmatory factor model for multitrait-multimethod (MTMM) data combining structurally different and multiple independent interchangeable methods (raters). In many studies, however, different interchangeable raters (e.g., peers, subordinates) are asked to rate different targets (students, supervisors), leading to violations of the independence assumption and to cross-classified data structures. In the present work, we extend the ML-CFA-MTMM model by Eid and colleagues (2008) to cross-classified multirater designs. The new C4 model (Cross-Classified CTC[M-1] Combination of Methods) accounts for nonindependent interchangeable raters and enables researchers to explicitly model the interaction between targets and raters as a latent variable. Using a real data application, it is shown how credibility intervals of model parameters and different variance components can be obtained using Bayesian estimation techniques.

  12. Use of GMM and SCMS for Accurate Road Centerline Extraction from the Classified Image

    Directory of Open Access Journals (Sweden)

    Zelang Miao

    2015-01-01

    Full Text Available The extraction of road centerlines from a classified image is a fundamental image analysis technology. Common problems encountered in road centerline extraction include a low ability to cope with the general case, the production of undesired objects, and inefficiency. To tackle these limitations, this paper presents a novel accurate centerline extraction method using a Gaussian mixture model (GMM) and subspace constraint mean shift (SCMS). The proposed method consists of three main steps. GMM is first used to partition the classified image into several clusters. The major axis of the ellipsoid of each cluster is then extracted and taken as the initial centerline. Finally, the initial result is adjusted using SCMS to produce a precise road centerline. Both simulated and real datasets are used to validate the proposed method. Preliminary results demonstrate that the proposed method provides a comparatively robust solution for accurate centerline extraction from a classified image.
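
    The second step, taking the major axis of each cluster's ellipsoid as an initial centerline, corresponds to finding the principal direction of the cluster's covariance. The sketch below does this for 2-D points in closed form. As a simplification it uses the sample covariance of points already assigned to one cluster rather than the covariance of a fitted GMM component, and the function name is hypothetical.

```python
import math


def major_axis(points):
    """Return (centroid, unit_direction) of the major axis of 2-D points.

    The direction is the principal eigenvector of the 2x2 sample
    covariance, obtained from theta = 0.5 * atan2(2*cxy, cxx - cyy).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cyy = sum((y - mean_y) ** 2 for _, y in points) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))
```

    For road pixels lying along the diagonal y = x, the returned direction is approximately (0.707, 0.707) and the centroid is the midpoint of the segment, which together define a straight initial centerline that a later refinement step could bend to follow the road.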

  13. Feature weighting using particle swarm optimization for learning vector quantization classifier

    Science.gov (United States)

    Dongoran, A.; Rahmadani, S.; Zarlis, M.; Zakarias

    2018-03-01

    This paper proposes a feature weighting method for classification with the learning vector quantization (LVQ) competitive-learning artificial neural network. The feature weighting method searches for attribute weights using PSO so that they influence the resulting output. The method is applied to the LVQ classifier and tested on three datasets obtained from the UCI Machine Learning repository. Accuracy is then analyzed for two approaches: the first uses LVQ1 and is referred to as LVQ-Classifier, and the second, referred to as PSOFW-LVQ, is the proposed model. The results show that the PSO algorithm is capable of finding attribute weights that increase the accuracy of the LVQ classifier.
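
    One way feature weights can enter an LVQ classifier is through the distance used to select the winning prototype. The sketch below shows a weighted squared Euclidean distance and a single LVQ1 update step. It is an illustration under assumptions: the abstract does not state where the weights are applied, the PSO search for the weights is omitted, and the function names and learning rate are hypothetical.

```python
def weighted_sq_distance(x, prototype, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))


def lvq1_step(x, label, prototypes, proto_labels, weights, lr=0.1):
    """Apply one LVQ1 update in place and return the winner's index.

    The nearest prototype (under the weighted distance) moves toward x
    if its label matches, and away from x otherwise.
    """
    winner = min(
        range(len(prototypes)),
        key=lambda k: weighted_sq_distance(x, prototypes[k], weights),
    )
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] = [
        p + sign * lr * (a - p) for p, a in zip(prototypes[winner], x)
    ]
    return winner
```

    Setting a feature's weight to zero removes it from the winner selection entirely, which is why a good weight vector can change which prototype wins and hence the classification accuracy.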

  14. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    Directory of Open Access Journals (Sweden)

    Yi Zhang

    2015-01-01

    Full Text Available Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  15. Feature and score fusion based multiple classifier selection for iris recognition.

    Science.gov (United States)

    Islam, Md Rabiul

    2014-01-01

    The aim of this work is to propose a new feature and score fusion based iris recognition approach in which a voting method is applied to the Multiple Classifier Selection technique. The outputs of four Discrete Hidden Markov Model classifiers, that is, a left iris based unimodal system, a right iris based unimodal system, a left-right iris feature fusion based multimodal system, and a left-right iris likelihood ratio score fusion based multimodal system, are combined using the voting method to achieve the final recognition result. The CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, the recognition accuracy of the proposed system has been compared with the existing N hamming distance score fusion approach proposed by Ma et al., the log-likelihood ratio score fusion approach proposed by Schmid et al., and the single level feature fusion approach proposed by Hollingsworth et al.
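
    Combining the outputs of several classifiers by voting can be sketched as a simple majority rule. The abstract does not describe how ties are resolved, so the tie-breaking rule below (the earliest classifier among the tied identities wins) is an assumption, as are the function name and the identity labels in the usage note.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction.

    predictions: one predicted identity per classifier, in a fixed order.
    Ties are broken in favor of the prediction that appears first
    (an assumed rule for illustration).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for prediction in predictions:
        if counts[prediction] == top:
            return prediction
```

    For four classifiers that output `["id_7", "id_7", "id_3", "id_7"]`, the vote returns `"id_7"`. With a two-against-two split such as `["id_3", "id_7", "id_7", "id_3"]`, this sketch returns `"id_3"` because it was predicted by the first classifier.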

  16. Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers

    Directory of Open Access Journals (Sweden)

    Ting Shu

    2017-01-01

    Full Text Available At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is commonly detected using blood tests, electrocardiogram, cardiac computerized tomography scan, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is founded in Traditional Chinese Medicine. A new dataset consisting of 581 heart disease and 581 healthy samples was used to test the proposed method. In order to optimize the Probabilistic Collaborative Representation Based Classifier, an analysis of its parameters was performed. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is proven to be effective at heart disease detection.

  17. Combining Biometric Fractal Pattern and Particle Swarm Optimization-Based Classifier for Fingerprint Recognition

    Directory of Open Access Journals (Sweden)

    Chia-Hung Lin

    2010-01-01

    Full Text Available This paper proposes combining the biometric fractal pattern and a particle swarm optimization (PSO)-based classifier for fingerprint recognition. Fingerprints have arch, loop, whorl, and accidental morphologies, and embed singular points, resulting in the establishment of fingerprint individuality. An automatic fingerprint identification system consists of two stages: digital image processing (DIP) and pattern recognition. DIP is used to convert to binary images, filter out noise, and locate the reference point. For binary images, Katz's algorithm is employed to estimate the fractal dimension (FD) from a two-dimensional (2D) image. Biometric features are extracted as fractal patterns using different FDs. A probabilistic neural network (PNN) serves as the classifier to compare the fractal patterns within the small-scale database. A PSO algorithm is used to tune the optimal parameters and heighten the accuracy. For 30 subjects in the laboratory, the proposed classifier demonstrates greater efficiency and higher accuracy in fingerprint recognition.

  18. An overview of application of Bayesian classifier approach in radioactive tracer technology. Case study

    International Nuclear Information System (INIS)

    El-Aseer, A.; Dawood, E.; Ben Ayad, S.; Alwerfalli, M.

    2015-01-01

    The usefulness of implementing radioactive tracer techniques is subject to varied risk factors. Thus, the setup procedure for applying experimental radioactive tracer techniques must be evaluated prior to the decision steps. One way of doing this is to use Bayes' theorem, as it is possible to classify the implemented parameters into certain categories depending on how certainly they affect radioactive tracer technology. In this paper, the radioactive tracer experimental parameters are classified according to Bayesian theory. Using this theory, one can study the proposed technical systems to determine the probabilities of the effectiveness of any selected parameter among the others. The classification of the applied experimental parameters into suitable or unsuitable is proposed theoretically. Ten parameters used in the experimental data were classified accordingly. The posterior is calculated by Bayes' rule from the prior and the likelihood previously determined. (author)
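
    The last step of the abstract above, computing a posterior from a prior and a likelihood, can be illustrated with a minimal sketch. The function name, class labels and all numbers below are hypothetical and are not taken from the paper.

```python
def posterior(prior, likelihoods):
    """Return posterior probabilities via Bayes' rule.

    prior:       dict mapping class -> P(class)
    likelihoods: dict mapping class -> P(evidence | class)
    """
    # Unnormalised posterior: P(evidence | class) * P(class)
    joint = {c: prior[c] * likelihoods[c] for c in prior}
    # P(evidence), the normalising constant
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every class")
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers for a single experimental parameter.
prior = {"suitable": 0.5, "unsuitable": 0.5}
likelihoods = {"suitable": 0.8, "unsuitable": 0.2}
post = posterior(prior, likelihoods)
```

    With equal priors, the posterior simply follows the ratio of the likelihoods; an unequal prior would shift it accordingly.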

  19. Solid waste bin detection and classification using Dynamic Time Warping and MLP classifier

    Energy Technology Data Exchange (ETDEWEB)

    Islam, Md. Shafiqul, E-mail: shafique@eng.ukm.my [Dept. of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Selangore (Malaysia); Hannan, M.A., E-mail: hannan@eng.ukm.my [Dept. of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Selangore (Malaysia); Basri, Hassan [Dept. of Civil and Structural Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Selangore (Malaysia); Hussain, Aini; Arebey, Maher [Dept. of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Selangore (Malaysia)

    2014-02-15

    Highlights: • Solid waste bin level detection using Dynamic Time Warping (DTW). • Gabor wavelet filter is used to extract the solid waste image features. • Multi-Layer Perceptron classifier network is used for bin image classification. • The classification performance evaluated by ROC curve analysis. - Abstract: The increasing requirement for Solid Waste Management (SWM) has become a significant challenge for municipal authorities. A number of integrated systems and methods have been introduced to overcome this challenge. Many researchers have aimed to develop an ideal SWM system, including approaches involving software-based routing, Geographic Information Systems (GIS), Radio-frequency Identification (RFID), or sensor intelligent bins. Image processing solutions for Solid Waste (SW) collection have also been developed; however, when capturing the bin image, it is challenging to position the camera so that the bin area is centred in the image. As yet, there is no ideal system which can correctly estimate the amount of SW. This paper briefly discusses an efficient image processing solution to overcome these problems. Dynamic Time Warping (DTW) was used for detecting and cropping the bin area and Gabor wavelet (GW) was introduced for feature extraction of the waste bin image. Image features were used to train the classifier. A Multi-Layer Perceptron (MLP) classifier was used to classify the waste bin level and estimate the amount of waste inside the bin. The area under the Receiver Operating Characteristic (ROC) curves was used to statistically evaluate classifier performance. The results of the developed system are comparable to those of previous image processing based systems. The system demonstration using DTW with GW for feature extraction and an MLP classifier led to promising results with respect to the accuracy of waste level estimation (98.50%). The application can be used to optimize the routing of waste collection based on the estimated bin level.
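
    For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook DTW distance between two one-dimensional sequences. It is not the authors' bin detection pipeline, which applies DTW to image data; the function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same shape at a different speed aligns with zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

    Because the warping path may repeat elements, sequences of different lengths can be compared directly, which is the property that makes DTW useful for matching profiles that are stretched or compressed.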

  20. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that leveraging a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  1. Gender and classifiers in concurrent systems: Refining the typology of nominal classification

    Directory of Open Access Journals (Sweden)

    Sebastian Fedden

    2017-04-01

    Full Text Available Some languages have both gender and classifiers, contrary to what was once believed possible. We use these interesting languages as a unique window onto nominal classification. They provide the impetus for a new typology, based on the degree of orthogonality of the semantic systems and the degree of difference of the forms realizing them. This nine-way typology integrates traditional gender, traditional classifiers and – importantly – the many recently attested phenomena lying between. Besides progress specifically in understanding nominal classification, our approach provides clarity on the wider theoretical issue of single versus concurrent featural systems.

  2. Machine Learning Methods for Classifying Human Physical Activity from On-Body Accelerometers

    Science.gov (United States)

    Mannini, Andrea; Sabatini, Angelo Maria

    2010-01-01

    The use of on-body wearable sensors is widespread in several academic and industrial domains. Of great interest are their applications in ambulatory monitoring and pervasive computing systems; here, some quantitative analysis of human motion and its automatic classification are the main computational tasks to be pursued. In this paper, we discuss how human physical activity can be classified using on-body accelerometers, with a major emphasis devoted to the computational algorithms employed for this purpose. In particular, we motivate our current interest for classifiers based on Hidden Markov Models (HMMs). An example is illustrated and discussed by analysing a dataset of accelerometer time series. PMID:22205862

  3. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Directory of Open Access Journals (Sweden)

    Reilly John J

    2005-06-01

    Full Text Available Abstract. Background: Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods: A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. The technique provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results: Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion: Hierarchical

  4. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    Science.gov (United States)

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

    Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction

  5. The use of hyperspectral data for tree species discrimination: Combining binary classifiers

    CSIR Research Space (South Africa)

    Dastile, X

    2010-11-01

    Full Text Available For 5-nearest neighbour classification: assign new sample to class 1. RU SASA 2010. Given learning task {(x1,t1),(x2,t2),…,(xp,tp)} (xi ∈ Rn feature vectors, ti ∈ {?1,…,?c...). A review on the combination of binary classifiers in multiclass problems. Springer Science and Business Media B.V. [7] Dietterich T.G. and Bakiri G. (1995). Solving Multiclass Learning Problems via Error-Correcting Output Codes. AI Access Foundation...
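
    The 5-nearest neighbour rule mentioned in the record above can be sketched as follows. The training points, labels and function name are hypothetical and chosen only so that the query lands in class 1, as in the record's example; Euclidean distance is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, query, k=5):
    """Assign `query` the most common label among its k nearest neighbours.

    train: list of (feature_vector, label) pairs
    """
    # Sort training samples by Euclidean distance to the query, keep k.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class data: class 1 near the origin, class 2 near (5, 5).
train = [
    ((0, 0), 1), ((1, 0), 1), ((0, 1), 1), ((2, 2), 1),
    ((5, 5), 2), ((6, 5), 2), ((5, 6), 2),
]
predicted = knn_predict(train, (1.5, 1.5), k=5)
```

    Here four of the five nearest neighbours belong to class 1, so the new sample is assigned to class 1.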

  6. A proposed defect tracking model for classifying the inserted defect reports to enhance software quality control.

    Science.gov (United States)

    Sultan, Torky; Khedr, Ayman E; Sayed, Mostafa

    2013-01-01

    Defect tracking systems play an important role in software development organizations, as they can store historical information about defects. There is much research on defect tracking models and systems that aims to enhance their tracking capabilities and adapt them to new technology. Furthermore, there are different studies on classifying bugs step by step in order to provide a clear and applicable method for detecting such bugs. This paper presents a newly proposed defect tracking model for classifying the inserted defect reports step by step, in order to enhance software quality.

  7. Ageing genes

    DEFF Research Database (Denmark)

    Rattan, Suresh

    2018-01-01

    The idea of gerontogenes is in line with the evolutionary explanation of ageing as being an emergent phenomenon as a result of the imperfect maintenance and repair systems. Although evolutionary processes did not select for any specific ageing genes that restrict and determine the lifespan...... of an individual, the term ‘gerontogenes’ primarily refers to any genes that may seem to influence ageing and longevity, without being specifically selected for that role. Such genes can also be called ‘virtual gerontogenes’ by virtue of their indirect influence on the rate and process of ageing. More than 1000...... virtual gerontogenes have been associated with ageing and longevity in model organisms and humans. The ‘real’ genes, which do influence the essential lifespan of a species, and have been selected for in accordance with the evolutionary life history of the species, are known as the longevity assurance...

  8. Ensemble Classifiers for Predicting HIV-1 Resistance from Three Rule-Based Genotypic Resistance Interpretation Systems.

    Science.gov (United States)

    Raposo, Letícia M; Nobre, Flavio F

    2017-08-30

    Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to develop a classifier with a single resistance profile: stacked generalization, a simple plurality vote scheme and the selection of the interpretation system with the best performance. The strategies were compared with Friedman's test and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.
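
    A simple plurality vote over the three interpretation systems might look like the sketch below. The label strings, the function name and especially the tie-breaking rule are assumptions made for illustration; the abstract does not say how the authors resolve three-way disagreement.

```python
from collections import Counter


def plurality_vote(calls, tie_break="resistant"):
    """Return the label chosen by the most systems.

    calls:     dict mapping system name -> predicted label
    tie_break: label returned when no single label has the most votes
               (an arbitrary assumption, not taken from the paper)
    """
    ranked = Counter(calls.values()).most_common()
    top_count = ranked[0][1]
    winners = [label for label, count in ranked if count == top_count]
    return winners[0] if len(winners) == 1 else tie_break


# Hypothetical calls for one drug and one genotype.
agreed = plurality_vote(
    {"ANRS": "resistant", "Rega": "susceptible", "HIVdb": "resistant"}
)
split = plurality_vote(
    {"ANRS": "resistant", "Rega": "susceptible", "HIVdb": "intermediate"}
)
```

    Making the tie-break an explicit parameter keeps the one arbitrary decision in this scheme visible to whoever uses it.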

  9. Derivation of LDA log likelihood ratio one-to-one classifier

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan

    2014-01-01

    The common expression for the Likelihood Ratio classifier using LDA assumes that the reference class mean is available. In biometrics, this is often not the case and only a single sample of the reference class is available. In this paper expressions are derived for biometric comparison between

  10. Classifying Internet Pathological Users: Their Usage, Internet Sensation Seeking, and Perceptions.

    Science.gov (United States)

    Lin, Sunny S. J.

    A study was conducted to identify pathological Internet users and to reveal their psychological features and problematic usage patterns. One thousand and fifty Taiwanese undergraduates were selected. An Internet Addiction Scale was adopted to classify 648 students into 4 clusters. The 146 users in the 4th cluster, who reported significantly higher…

  11. Users in the Driver's Seat: A New Approach to Classifying Teaching Methods in a University Repository

    NARCIS (Netherlands)

    Neumann, Susanne; Oberhuemer, Petra; Koper, Rob

    2009-01-01

    Neumann, S., Oberhuemer, P., & Koper, R. (2009). Users in the Driver's Seat: A New Approach to Classifying Teaching Methods in a University Repository. In U. Cress, V. Dimitrova & M. Specht (Eds.), Learning in the Synergy of Multiple Disciplines. Proceedings of the Fourth European Conference on

  12. A bench-top hyperspectral imaging system to classify beef from Nellore cattle based on tenderness

    Science.gov (United States)

    Nubiato, Keni Eduardo Zanoni; Mazon, Madeline Rezende; Antonelo, Daniel Silva; Calkins, Chris R.; Naganathan, Govindarajan Konda; Subbiah, Jeyamkondan; da Luz e Silva, Saulo

    2018-03-01

    The aim of this study was to evaluate the accuracy of classification of Nellore beef aged for 0, 7, 14, or 21 days and classification based on tenderness and aging period using a bench-top hyperspectral imaging system. A hyperspectral imaging system (λ = 928-2524 nm) was used to collect hyperspectral images of the Longissimus thoracis et lumborum (aging n = 376 and tenderness n = 345) of Nellore cattle. The image processing steps included selection of region of interest, extraction of spectra, and identification and evaluation of selected wavelengths for classification. Six linear discriminant models were developed to classify samples based on tenderness and aging period. The model using the first derivative of partial absorbance spectra (give wavelength range spectra) was able to classify steaks based on tenderness with an overall accuracy of 89.8%. The model using the first derivative of full absorbance spectra was able to classify steaks based on aging period with an overall accuracy of 84.8%. The results demonstrate that the HIS may be a viable technology for classifying beef based on tenderness and aging period.

  13. Comparing the Effects of Four Instructional Treatments on EFL Students' Achievement in Writing Classified Ads

    Science.gov (United States)

    Khodabandeh, Farzaneh

    2016-01-01

    The current study set out to compare the effect of traditional and non-traditional instructional treatments; i.e. explicit, implicit, task-based and no-instruction approaches on students' abilities to learn how to write classified ads. 72 junior students who have all taken a course in Reading Journalistic Texts at the Payame-Noor University…

  14. Multiobjective optimization of classifiers by means of 3D convex-hull-based evolutionary algorithms

    NARCIS (Netherlands)

    Zhao, J.; Basto, Fernandes V.; Jiao, L.; Yevseyeva, I.; Asep, Maulana A.; Li, R.; Bäck, T.H.W.; Tang, T.; Michael, Emmerich T. M.

    2016-01-01

    The receiver operating characteristic (ROC) and detection error tradeoff (DET) curves are frequently used in the machine learning community to analyze the performance of binary classifiers. Recently, the convex-hull-based multiobjective genetic programming algorithm was proposed and successfully

  15. A Comparison of Physiological Signal Analysis Techniques and Classifiers for Automatic Emotional Evaluation of Audiovisual Contents.

    Science.gov (United States)

    Colomer Granero, Adrián; Fuentes-Hurtado, Félix; Naranjo Ornedo, Valery; Guixeres Provinciale, Jaime; Ausín, Jose M; Alcañiz Raya, Mariano

    2016-01-01

    This work focuses on finding the most discriminatory or representative features that allow commercials to be classified according to negative, neutral and positive effectiveness based on the Ace Score index. For this purpose, an experiment involving forty-seven participants was carried out. In this experiment, electroencephalography (EEG), electrocardiography (ECG), Galvanic Skin Response (GSR) and respiration data were acquired while subjects were watching a 30-min piece of audiovisual content. This content was composed of a submarine documentary and nine commercials (one of them the ad under evaluation). After the signal pre-processing, four sets of features were extracted from the physiological signals using different state-of-the-art metrics. These features, computed in the time and frequency domains, are the inputs to several basic and advanced classifiers. An average of 89.76% of the instances was correctly classified according to the Ace Score index. The best results were obtained by a classifier consisting of a combination of AdaBoost and Random Forest with automatic selection of features. The selected features were those extracted from GSR and HRV signals. These results are promising in the audiovisual content evaluation field by means of physiological signal processing.

  16. ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

    Directory of Open Access Journals (Sweden)

    Sangeeta Lal

    2017-01-01

    Full Text Available Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine learning based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. For such software projects, we can use cross-project logging prediction. Aim: The aim of the study presented here is to investigate cross-project logging prediction methods and techniques. Method: The proposed method is ECLogger, which is a novel, ensemble-based, cross-project, catch-block logging prediction model. Nine base classifiers were used and combined using ensemble techniques. The performance of ECLogger was evaluated on three open-source Java projects: Tomcat, CloudStack and Hadoop. Results: ECLogger Bagging, ECLogger AverageVote, and ECLogger MajorityVote show a considerable improvement in the average Logged F-measure (LF) on 3, 5, and 4 source -> target project pairs, respectively, compared to the baseline classifiers. ECLogger AverageVote performs best and shows improvements of 3.12% (average LF) and 6.08% (average ACC, i.e., accuracy). Conclusion: The classifiers based on ensemble techniques, such as bagging, average vote, and majority vote, outperform the baseline classifier. Overall, the ECLogger AverageVote model performs best. The results show that the CloudStack project is more generalizable than the other projects.
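
    The difference between the average-vote and majority-vote combination rules named above can be shown with a small sketch. This is a generic illustration of the two rules, not the ECLogger implementation; the function names, the 0.5 threshold and the example probabilities are assumptions.

```python
def majority_vote(labels):
    """Return 1 if more than half of the base classifiers predict 1."""
    return int(sum(labels) * 2 > len(labels))


def average_vote(probs, threshold=0.5):
    """Return 1 if the mean predicted probability exceeds the threshold."""
    return int(sum(probs) / len(probs) > threshold)


# Hypothetical probabilities of "logged" from three base classifiers.
probs = [0.9, 0.45, 0.4]
# Each classifier's hard decision at the same 0.5 threshold.
labels = [int(p > 0.5) for p in probs]

by_majority = majority_vote(labels)  # two of three say "not logged"
by_average = average_vote(probs)     # one confident vote lifts the mean
```

    The two rules disagree here because averaging lets a single confident classifier outweigh two weakly negative ones, whereas majority voting discards confidence entirely.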

  17. Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis

    Directory of Open Access Journals (Sweden)

    Bach Phi Duong

    2018-04-01

    Full Text Available The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.
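
    One common way to make an output layer non-mutually exclusive is to score each class with an independent sigmoid and threshold each score separately, instead of picking a single winner. The sketch below illustrates that general idea only; it is not the authors' NMEC-DNN, and the fault names, logits and threshold are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_faults(logits, names, threshold=0.5):
    """Return every fault whose independent score reaches the threshold.

    Unlike an argmax over classes, several faults (or none) may be
    reported for the same input.
    """
    return [name for name, z in zip(names, logits) if sigmoid(z) >= threshold]


# Hypothetical fault types and raw network outputs for one signal.
names = ["inner_race", "outer_race", "roller"]
combined = detect_faults([2.0, 1.5, -3.0], names)
healthy = detect_faults([-2.0, -1.0, -4.0], names)
```

    Because each output is judged on its own, a model trained on single-fault examples can still report two faults at once when both scores are high.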

  18. The Capacity Profile: a method to classify additional care needs in children with neurodevelopmental disabilities

    NARCIS (Netherlands)

    Meester-Delver, Anke; Beelen, Anita; Hennekam, Raoul; Nollet, Frans; Hadders-Algra, Mijna

    2007-01-01

    The aim of this study was to determine the interrater reliability and stability over time of the Capacity Profile (CAP). The CAP is a standardized method for classifying additional care needs indicated by current impairments in five domains of body functions: physical health, neuromusculoskeletal

  19. Fuzzy prototype classifier based on items and its application in recommender system

    Directory of Open Access Journals (Sweden)

    Mei Cai

    2017-01-01

    Full Text Available Currently, recommender systems (RS) are incorporating implicit information from social circles on the Internet. The implicit social information in the human mind is not easy to reflect in appropriate decision-making techniques. This paper makes two contributions. First, we develop an item-based prototype classifier (IPC), a pattern classification technique in which a prototype represents a social circle's preferences. We assume that a social circle is distinguished from others by the items its members like. The prototype structure of the classifier is defined by two 2-dimensional matrices. We use information gain and the OWA aggregator to construct a feature space. The item-based classifier assigns a new item to some prototypes with different prototypicalities. We adapt a typical data set, the Iris data set in the UCI Machine Learning Repository, to verify our fuzzy prototype classifier. The second contribution of this paper is the application of IPC in recommender systems to solve new-item cold-start problems. We modify the MovieLens dataset to perform experimental demonstrations of the proposed ideas.

  20. Genetic evidence and integration of various data sources for classifying uncertain variants into a single model.

    NARCIS (Netherlands)

    Goldgar, D.E.; Easton, D.F.; Byrnes, G.B.; Spurdle, A.B.; Iversen, E.S.; Greenblatt, M.S.; Boffetta, P.; Couch, F.J.; Wind, N. de; Eccles, D.; Foulkes, W.D.; Genuardi, M.; Hofstra, R.M.; Hogervorst, F.; Hoogerbrugge-van der Linden, N.; Plon, S.E.; Radice, P.; Rasmussen, L.; Sinilnikova, O.M.; Tavtigian, S.V.

    2008-01-01

    Genetic testing often results in the finding of a variant whose clinical significance is unknown. A number of different approaches have been employed in the attempt to classify such variants. For some variants, case-control, segregation, family history, or other statistical studies can provide