WorldWideScience

Sample records for supervised classification learning

  1. Genetic classification of populations using supervised learning.

    LENUS (Irish Health Repository)

    Bridges, Michael

    2011-01-01

    There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

  2. Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification

    Directory of Open Access Journals (Sweden)

    R. Sathya

    2013-02-01

    Full Text Available This paper presents a comparative account of unsupervised and supervised learning models and their pattern classification evaluations as applied to the higher education scenario. Classification plays a vital role in machine learning algorithms. In the present study, we found that, although the error back-propagation learning algorithm of the supervised learning model is very efficient for a number of non-linear real-time problems, the KSOM of the unsupervised learning model offers an efficient solution and classification for the problem considered here.

  3. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Ardjan Zwartjes

    2016-10-01

    Full Text Available In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network lifetime. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning-based algorithms using sampled data. An important issue, however, is the training phase of these learning-based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning at the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  4. Document Classification Using Expectation Maximization with Semi Supervised Learning

    CERN Document Server

    Nigam, Bhawna; Salve, Sonal; Vamney, Swati

    2011-01-01

    As the number of online documents increases, so does the demand for document classification to aid the analysis and management of documents. Text is cheap, but information, in the form of knowing what classes a document belongs to, is expensive. The main purpose of this paper is to explain the expectation maximization technique of data mining for classifying documents and to learn how to improve the accuracy when using a semi-supervised approach. The expectation maximization algorithm is applied with both supervised and semi-supervised approaches. It is found that the semi-supervised approach is more accurate and effective. The main advantage of the semi-supervised approach is "Dynamic Generation of New Class". The algorithm first trains a classifier using the labeled documents and then probabilistically classifies the unlabeled documents. The car dataset used for evaluation was collected from the UCI repository, and we made some changes to it.
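    The train-then-probabilistically-classify loop described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not name the base classifier, so a multinomial naive Bayes model with Laplace smoothing is assumed here, and the function names (`m_step`, `e_step`, `semi_supervised_em`) and the toy documents are hypothetical.

```python
import math


def m_step(docs, resp, vocab, classes, alpha=1.0):
    """Re-estimate log priors and log word probabilities from weighted documents.

    docs : list of token lists
    resp : list of {class: weight} dicts, one per document
    alpha: Laplace smoothing constant (an assumption of this sketch)
    """
    class_mass = {c: alpha for c in classes}
    word_mass = {c: {w: alpha for w in vocab} for c in classes}
    for tokens, r in zip(docs, resp):
        for c in classes:
            class_mass[c] += r[c]
            for w in tokens:
                word_mass[c][w] += r[c]
    total = sum(class_mass.values())
    log_prior = {c: math.log(class_mass[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word_mass[c].values())
        log_word[c] = {w: math.log(m / denom) for w, m in word_mass[c].items()}
    return log_prior, log_word


def e_step(tokens, log_prior, log_word, classes):
    """Return P(class | tokens) under the current model (tokens must be in the vocabulary)."""
    scores = {c: log_prior[c] + sum(log_word[c][w] for w in tokens) for c in classes}
    top = max(scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def semi_supervised_em(labeled, unlabeled, classes, n_iter=10):
    """Train on labeled docs, then alternate E and M steps over labeled + unlabeled docs."""
    vocab = {w for tokens, _ in labeled for w in tokens}
    vocab |= {w for tokens in unlabeled for w in tokens}
    lab_docs = [tokens for tokens, _ in labeled]
    lab_resp = [{c: float(c == y) for c in classes} for _, y in labeled]
    # Initial model: labeled documents only.
    log_prior, log_word = m_step(lab_docs, lab_resp, vocab, classes)
    for _ in range(n_iter):
        # E-step: soft labels for the unlabeled documents.
        unl_resp = [e_step(t, log_prior, log_word, classes) for t in unlabeled]
        # M-step: labeled docs keep their hard labels, unlabeled docs use soft labels.
        log_prior, log_word = m_step(lab_docs + unlabeled, lab_resp + unl_resp, vocab, classes)
    return log_prior, log_word


# Hypothetical toy corpus: the word "brake" never appears in a labeled document.
classes = ["auto", "sport"]
labeled = [(["engine", "wheel"], "auto"), (["goal", "ball"], "sport")]
unlabeled = [["engine", "brake", "wheel"], ["ball", "team", "goal"],
             ["brake", "engine"], ["team", "ball"]]
model = semi_supervised_em(labeled, unlabeled, classes)
print(e_step(["brake"], *model, classes))
```

    In this toy setting the labeled documents alone say nothing about "brake", so a purely supervised model is undecided on it; after EM, its co-occurrence with "engine" in the unlabeled documents pulls it towards the "auto" class.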

  5. Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

    OpenAIRE

    Kurth, Thorsten; Zhang, Jian; Satish, Nadathur; Mitliagkas, Ioannis; Racah, Evan; Patwary, Mostofa Ali; Malas, Tareq; Sundaram, Narayanan; Bhimji, Wahid; Smorkalov, Mikhail; Deslippe, Jack; Shiryaev, Mikhail; Sridharan, Srinivas; Prabhat; Dubey, Pradeep

    2017-01-01

    This paper presents the first 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation obtains ~2 TFLOP/s on a single Cori Phase-II Xeon-Phi node. We use a hybrid strategy employin...

  6. Semi-supervised Learning for Photometric Supernova Classification

    CERN Document Server

    Richards, Joseph W; Freeman, Peter E; Schafer, Chad M; Poznanski, Dovi

    2011-01-01

    We present a semi-supervised method for photometric supernova typing. Our approach is to first use the nonlinear dimension reduction technique diffusion map to detect structure in a database of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template based methods. Applied to supernova data simulated by Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 96% Type Ia purity and 86% Type Ia efficiency on the spectroscopic sample, but only 56% Type Ia purity and 48% efficiency on the photometric sample due to their spectroscopic followup strategy. To improve the performance on the photometric sample...
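    The purity and efficiency figures quoted in this abstract follow the usual definitions: purity is the fraction of objects classified as Type Ia that really are Type Ia, and efficiency is the fraction of true Type Ia objects that the classifier recovers. A minimal sketch, with a hypothetical function name and made-up labels that are not taken from the paper:

```python
def purity_and_efficiency(true_labels, predicted_labels, target="Ia"):
    """Return (purity, efficiency) for the target class.

    purity     = TP / (TP + FP): how clean the selected sample is
    efficiency = TP / (TP + FN): how much of the true class is recovered
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    purity = tp / (tp + fp) if tp + fp else float("nan")
    efficiency = tp / (tp + fn) if tp + fn else float("nan")
    return purity, efficiency


# Hypothetical example: six supernovae with known and predicted types.
true_types = ["Ia", "Ia", "Ia", "II", "Ibc", "II"]
pred_types = ["Ia", "Ia", "II", "Ia", "Ibc", "II"]
print(purity_and_efficiency(true_types, pred_types))
```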

  7. Phenotype classification of zebrafish embryos by supervised learning.

    Directory of Open Access Journals (Sweden)

    Nathalie Jeanray

    Full Text Available Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  8. Phenotype classification of zebrafish embryos by supervised learning.

    Science.gov (United States)

    Jeanray, Nathalie; Marée, Raphaël; Pruvot, Benoist; Stern, Olivier; Geurts, Pierre; Wehenkel, Louis; Muller, Marc

    2015-01-01

    Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  9. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.

  10. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails invade users' mailboxes without their consent and fill them up. They consume network capacity as well as the time spent checking and deleting them. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most users want to do the right thing to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, it has not yet been eradicated. Also, when the countermeasures are oversensitive, even legitimate emails will be eliminated. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on the more sophisticated classifier-related issues. Recently, machine learning for spam classification has become an important research issue. The proposed work explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms is also presented.

  11. Out-of-Sample Generalizations for Supervised Manifold Learning for Classification

    Science.gov (United States)

    Vural, Elif; Guillemot, Christine

    2016-03-01

    Supervised manifold learning methods for data classification map data samples residing in a high-dimensional ambient space to a lower-dimensional domain in a structure-preserving way, while enhancing the separation between different classes in the learned embedding. Most nonlinear supervised manifold learning methods compute the embedding of the manifolds only at the initially available training points, while the generalization of the embedding to novel points, known as the out-of-sample extension problem in manifold learning, becomes especially important in classification applications. In this work, we propose a semi-supervised method for building an interpolation function that provides an out-of-sample extension for general supervised manifold learning algorithms studied in the context of classification. The proposed algorithm computes a radial basis function (RBF) interpolator that minimizes an objective function consisting of the total embedding error of unlabeled test samples, defined as their distance to the embeddings of the manifolds of their own class, as well as a regularization term that controls the smoothness of the interpolation function in a direction-dependent way. The class labels of test data and the interpolation function parameters are estimated jointly with a progressive procedure. Experimental results on face and object images demonstrate the potential of the proposed out-of-sample extension algorithm for the classification of manifold-modeled data sets.

  12. Towards designing an email classification system using multi-view based semi-supervised learning

    NARCIS (Netherlands)

    Li, Wenjuan; Meng, Weizhi; Tan, Zhiyuan; Xiang, Yang

    2014-01-01

    The goal of email classification is to classify user emails into spam and legitimate ones. Many supervised learning algorithms have been invented in this domain to accomplish the task, and these algorithms require a large number of labeled training data. However, data labeling is a labor intensive t

  13. Semi-supervised Learning for Classification of Polarimetric SAR Images Based on SVM-Wishart

    Directory of Open Access Journals (Sweden)

    Hua Wen-qiang

    2015-02-01

    Full Text Available In this study, we propose a new semi-supervised classification method for Polarimetric SAR (PolSAR) images, aimed at handling the case where the training set is small. First, considering the scattering characteristics of PolSAR data, this method extracts multiple scattering features using a target decomposition approach. Second, a semi-supervised learning model is established based on a co-training framework and Support Vector Machine (SVM). Both labeled and unlabeled data are utilized in this model to obtain high classification accuracy. Third, a recovery scheme based on the Wishart classifier is proposed to improve the classification performance. From the experiments conducted in this study, it is evident that the proposed method performs more effectively than other traditional methods when the training set is small.

  14. Gene classification using parameter-free semi-supervised manifold learning.

    Science.gov (United States)

    Huang, Hong; Feng, Hailiang

    2012-01-01

    A new manifold learning method, called parameter-free semi-supervised local Fisher discriminant analysis (pSELF), is proposed to map the gene expression data into a low-dimensional space for tumor classification. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, which can be computed efficiently by eigen decomposition. Experimental results on synthetic data and SRBCT, DLBCL, and Brain Tumor gene expression data sets demonstrate the effectiveness of the proposed method.

  15. Musical Instrument Classification Based on Nonlinear Recurrence Analysis and Supervised Learning

    Directory of Open Access Journals (Sweden)

    R.Rui

    2013-04-01

    Full Text Available In this paper, the phase space reconstruction of time series produced by different instruments is discussed based on nonlinear dynamic theory. The dense ratio, a novel quantitative recurrence parameter, is proposed to describe the differences among wind instruments, stringed instruments and keyboard instruments in the phase space by analyzing the recursive property of every instrument. Furthermore, a novel supervised learning algorithm for automatic classification of individual musical instrument signals is presented, derived from the idea of the supervised non-negative matrix factorization (NMF) algorithm. In our approach, the orthogonal basis matrix can be obtained without updating the matrix iteratively, which NMF is unable to do. The experimental results indicate that the accuracy of the proposed method is improved by 3% compared with the conventional features in individual instrument classification.

  16. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.

    Science.gov (United States)

    Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie

    2016-07-01

    Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.

  17. A new semi-supervised classification strategy combining active learning and spectral unmixing of hyperspectral data

    Science.gov (United States)

    Sun, Yanli; Zhang, Xia; Plaza, Antonio; Li, Jun; Dópido, Inmaculada; Liu, Yi

    2016-10-01

    Hyperspectral remote sensing allows for the detailed analysis of the surface of the Earth by providing high-dimensional images with hundreds of spectral bands. Hyperspectral image classification plays a significant role in hyperspectral image analysis and has been a very active research area in the last few years. In the context of hyperspectral image classification, supervised techniques (which have achieved wide acceptance) must address a difficult task due to the unbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive, and time-consuming, unlabeled samples can be generated in a much easier way. Semi-supervised learning offers an effective solution that can take advantage of both unlabeled and a small amount of labeled samples. Spectral unmixing is another widely used technique in hyperspectral image analysis, developed to retrieve pure spectral components and determine their abundance fractions in mixed pixels. In this work, we propose a method to perform semi-supervised hyperspectral image classification by combining the information retrieved with spectral unmixing and classification. Two kinds of samples that are highly mixed in nature are automatically selected, aiming at finding the most informative unlabeled samples. One kind is given by the samples minimizing the distance between the first two most probable classes by calculating the difference between the two highest abundances. Another kind is given by the samples minimizing the distance between the most probable class and the least probable class, obtained by calculating the difference between the highest and lowest abundances. The effectiveness of the proposed method is evaluated using a real hyperspectral data set collected by the airborne visible infrared imaging spectrometer (AVIRIS) over the Indian Pines region in Northwestern Indiana. 
In the
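    The two sample-selection criteria described in this abstract (smallest gap between the two highest abundances, and smallest gap between the highest and lowest abundances) can be sketched as below. This is an illustration under stated assumptions: each pixel is represented simply as a list of abundance fractions from some unmixing step, and the function names and example values are hypothetical.

```python
def top_two_gap(abundances):
    """Difference between the two highest abundance fractions of a pixel."""
    ordered = sorted(abundances, reverse=True)
    return ordered[0] - ordered[1]


def max_min_gap(abundances):
    """Difference between the highest and the lowest abundance fraction of a pixel."""
    return max(abundances) - min(abundances)


def select_most_mixed(pixels, gap, k):
    """Return the indices of the k pixels with the smallest gap (the most mixed ones)."""
    ranked = sorted(range(len(pixels)), key=lambda i: gap(pixels[i]))
    return ranked[:k]


# Hypothetical abundance vectors for four unlabeled pixels and three classes.
pixels = [
    [0.90, 0.05, 0.05],  # nearly pure pixel
    [0.40, 0.30, 0.30],
    [0.50, 0.45, 0.05],  # two classes almost tied
    [0.34, 0.33, 0.33],  # all classes almost tied
]
print(select_most_mixed(pixels, top_two_gap, k=2))  # candidates under the first criterion
print(select_most_mixed(pixels, max_min_gap, k=2))  # candidates under the second criterion
```

    A small gap means the pixel is hard to assign to a single class, which is the sense in which such samples are treated as the most informative ones.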

  18. Classification of Autism Spectrum Disorder Using Supervised Learning of Brain Connectivity Measures Extracted from Synchrostates

    CERN Document Server

    Jamal, Wasifa; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-01-01

    Objective. The paper investigates the presence of autism using the functional brain connectivity measures derived from electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states or synchrostates temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimal and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored the discriminant analysis and support vector machine both with polynomial kernels for the classification task. Main results. The leave ...

  19. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates

    Science.gov (United States)

    Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-08-01

    Objective. The paper investigates the presence of autism using the functional brain connectivity measures derived from electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states or synchrostates temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimal and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored the discriminant analysis and support vector machine both with polynomial kernels for the classification task. Main results. The leave one out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance with corresponding sensitivity and specificity values as 85.7% and 100% respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.
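    Leave-one-out cross-validation, as used in this abstract, holds out each subject in turn, trains on the rest, and tallies the predictions into accuracy, sensitivity and specificity. The sketch below shows the procedure only: the paper uses discriminant analysis and support vector machines, whereas a one-nearest-neighbour rule on a single made-up feature is swapped in here to keep the example self-contained. All names and data are hypothetical.

```python
def nearest_neighbour_predict(train, x):
    """Return the label of the training sample whose feature is closest to x."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def leave_one_out(samples, positive=1):
    """Leave-one-out cross-validation over (feature, label) pairs.

    Each sample is classified by a model built from all the other samples.
    """
    tp = fn = tn = fp = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        pred = nearest_neighbour_predict(train, x)
        if y == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return {
        "accuracy": (tp + tn) / len(samples),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical one-feature data set: label 1 is the positive group.
samples = [(0.1, 0), (0.2, 0), (0.3, 0), (0.45, 0),
           (0.5, 1), (0.8, 1), (0.9, 1), (1.0, 1)]
print(leave_one_out(samples))
```

    With only a few dozen subjects, every misclassified case moves these percentages by several points, which is worth keeping in mind when reading figures from small studies.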

  20. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    In this paper, a new supervised clustering and classification method is proposed. First, the application of discriminant partial least squares (DPLS) for the selection of a minimum number of key genes is applied on a gene expression microarray data set. Second, supervised hierarchical clustering ...

  1. Supervised learning classification models for prediction of plant virus encoded RNA silencing suppressors.

    Directory of Open Access Journals (Sweden)

    Zeenia Jagga

    Full Text Available Viral encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, the viral proteins have several basic science implications and biotechnology applications. However, in silico identification of these proteins is limited by their high sequence diversity. In this study we developed supervised learning based classification models for plant viral RNA silencing suppressor proteins. We developed four classifiers based on supervised learning algorithms: J48, Random Forest, LibSVM and Naïve Bayes, with enriched model learning by correlation based feature selection. Structural and physicochemical features calculated for experimentally verified primary protein sequences were used to train the classifiers. The training features include amino acid composition; auto correlation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of predictive models based on 10-fold cross-validation and independent data testing revealed that the Random Forest based model was the best and achieved 86.11% overall accuracy and 86.22% balanced accuracy with a remarkably high area under the Receiver Operating Characteristic curve of 0.95 to predict viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid identification of novel viral RNA silencing suppressors, which will provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. Also, the key subset of identified optimal features may help in determining compositional patterns in the viral proteins which are important determinants for RNA silencing suppressor activities. The best prediction model developed in the study is available as a
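    The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and uses each fold once as the test set. A minimal sketch of the index bookkeeping follows, assuming a hypothetical helper name; it does no shuffling or stratification, which a real evaluation would normally add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Fold sizes differ by at most one; samples are taken in their given order.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 25 samples split into 10 folds.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), test_idx)
```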

  2. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    Science.gov (United States)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.

  3. Supervised Ensemble Classification of Kepler Variable Stars

    CERN Document Server

    Bass, Gideon

    2016-01-01

    Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analyzing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150,000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.

  4. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    CERN Document Server

    Möller, A; Leloup, C; Neveu, J; Palanque-Delabrouille, N; Rich, J; Carlberg, R; Lidman, C; Pritchet, C

    2016-01-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshifts (0.2 ... learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia sa...

  5. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    Science.gov (United States)

    Möller, A.; Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J.; Carlberg, R.; Lidman, C.; Pritchet, C.

    2016-12-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high-redshifts (0.2 Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98. We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high-z SN survey with application to real SN data.
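
    The AUC metric quoted in this abstract can be illustrated with a minimal sketch. It uses the rank (Mann-Whitney) formulation, in which the AUC is the fraction of positive/negative pairs that the classifier orders correctly. The function name `auc_score` and the toy labels and scores are assumptions made for illustration; they are not taken from the SNLS analysis.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: iterable of 0/1, where 1 marks the positive class
    scores: classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives and three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc_score(labels, scores))
```

    A value of 1 means every positive example is scored above every negative one, which matches the abstract's statement that perfect classification is given by 1.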

  6. Manifold regularized multitask learning for semi-supervised multilabel image classification.

    Science.gov (United States)

    Luo, Yong; Tao, Dacheng; Geng, Bo; Xu, Chao; Maybank, Stephen J

    2013-02-01

    It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

  7. Classification models for clear cell renal carcinoma stage progression, based on tumor RNAseq expression trained supervised machine learning algorithms.

    Science.gov (United States)

    Jagga, Zeenia; Gupta, Dinesh

    2014-01-01

    Clear-cell Renal Cell Carcinoma (ccRCC) is the most prevalent, chemotherapy-resistant and lethal adult kidney cancer. There is a need for novel diagnostic and prognostic biomarkers for ccRCC, due to its heterogeneous molecular profiles and asymptomatic early stage. This study aims to develop classification models to distinguish early stage and late stage of ccRCC based on gene expression profiles. We employed the supervised learning algorithms J48, Random Forest, SMO and Naïve Bayes, with model learning enriched by fast correlation-based feature selection, to develop classification models trained on sequencing-based gene expression data from RNAseq experiments, obtained from The Cancer Genome Atlas. The models developed in the study were evaluated on the basis of 10-fold cross-validation and independent dataset testing. The Random Forest based prediction model performed best amongst the models developed in the study, with a sensitivity of 89%, accuracy of 77% and area under the Receiver Operating Characteristic curve of 0.8. We anticipate that the prioritized subset of 62 genes and prediction models developed in this study will aid experimental oncologists to expedite understanding of the molecular mechanisms of stage progression and discovery of prognostic factors for ccRCC tumors.
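
    The 10-fold cross-validation mentioned in this abstract can be sketched as follows. This is only a generic illustration of how samples are partitioned into folds; the helper name `k_fold_indices`, the seed, and the sample count are assumptions, and the study's own tooling is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage: 25 samples, 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

    Each sample is used for testing exactly once, and the model is refit on the remaining folds each time.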

  8. Supervised machine learning on a network scale: application to seismic event classification and detection

    Science.gov (United States)

    Reynen, Andrew; Audet, Pascal

    2017-09-01

    A new method using a machine learning technique is applied to event classification and detection at seismic networks. This method is applicable to a variety of network sizes and settings. The algorithm makes use of a small catalogue of known observations across the entire network. Two attributes, the polarization and frequency content, are used as input to regression. These attributes are extracted at predicted arrival times for P and S waves using only an approximate velocity model, as attributes are calculated over large time spans. This method of waveform characterization is shown to be able to distinguish between blasts and earthquakes with 99 per cent accuracy using a network of 13 stations located in Southern California. The combination of machine learning with generalized waveform features is further applied to event detection in Oklahoma, United States. The event detection algorithm makes use of a pair of unique seismic phases to locate events, with a precision directly related to the sampling rate of the generalized waveform features. Over a week of data from 30 stations in Oklahoma, United States are used to automatically detect 25 times more events than the catalogue of the local geological survey, with a false detection rate of less than 2 per cent. This method provides a highly confident way of detecting and locating events. Furthermore, a large number of seismic events can be automatically detected with low false alarm, allowing for a larger automatic event catalogue with a high degree of trust.

  9. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    Science.gov (United States)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. It is becoming clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, …) of textual opinions. People expressing themselves in a particular domain often use domain-specific expressions; thus, building a classifier that performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain on sentiment classification when using machine learning techniques. In our study, three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K nearest neighbors (KNN), were applied to datasets collected from different domains. Experimental results show that Support Vector Machines outperform the other classifiers in all domains, achieving at least 74.75% accuracy with a standard deviation of 4.08.

  10. Application of supervised machine learning algorithms for the classification of regulatory RNA riboswitches.

    Science.gov (United States)

    Singh, Swadha; Singh, Raghvendra

    2016-04-03

    Riboswitches, the small structured RNA elements, were discovered about a decade ago. It has been the subject of intense interest to identify riboswitches, understand their mechanisms of action and use them in genetic engineering. The accumulation of genome and transcriptome sequence data and comparative genomics provide unprecedented opportunities to identify riboswitches in the genome. In the present study, we have evaluated the following six machine learning algorithms for their efficiency to classify riboswitches: J48, BayesNet, Naïve Bayes, Multilayer Perceptron, sequential minimal optimization, hidden Markov model (HMM). For determining effective classifier, the algorithms were compared on the statistical measures of specificity, sensitivity, accuracy, F-measure and receiver operating characteristic (ROC) plot analysis. The classifier Multilayer Perceptron achieved the best performance, with the highest specificity, sensitivity, F-score and accuracy, and with the largest area under the ROC curve, whereas HMM was the poorest performer. At present, the available tools for the prediction and classification of riboswitches are based on covariance model, support vector machine and HMM. The present study determines Multilayer Perceptron as a better classifier for the genome-wide riboswitch searches.
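
    The statistical measures named in this abstract (specificity, sensitivity, accuracy, F-measure) all follow from the four confusion-matrix counts. The sketch below shows the standard binary definitions; the function name `binary_metrics` and the toy predictions are assumptions for illustration and do not reproduce the riboswitch results.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```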

  11. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited settings.

  12. Projected estimators for robust semi-supervised classification

    DEFF Research Database (Denmark)

    Krijthe, Jesse H.; Loog, Marco

    2017-01-01

    For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure...... proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More...... specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over...

  13. Inductive Supervised Quantum Learning

    Science.gov (United States)

    Monràs, Alex; Sentís, Gael; Wittek, Peter

    2017-05-01

    In supervised learning, an inductive learning algorithm extracts general rules from observed training instances, then the rules are applied to test instances. We show that this splitting of training and application arises naturally, in the classical setting, from a simple independence requirement with a physical interpretation of being nonsignaling. Thus, two seemingly different definitions of inductive learning happen to coincide. This follows from the properties of classical information that break down in the quantum setup. We prove a quantum de Finetti theorem for quantum channels, which shows that in the quantum case, the equivalence holds in the asymptotic setting, that is, for large numbers of test instances. This reveals a natural analogy between classical learning protocols and their quantum counterparts, justifying a similar treatment, and allowing us to inquire about standard elements in computational learning theory, such as structural risk minimization and sample complexity.

  14. EMD-Based Temporal and Spectral Features for the Classification of EEG Signals Using Supervised Learning.

    Science.gov (United States)

    Riaz, Farhan; Hassan, Ali; Rehman, Saad; Niazi, Imran Khan; Dremstrup, Kim

    2016-01-01

    This paper presents a novel method for feature extraction from electroencephalogram (EEG) signals using empirical mode decomposition (EMD). Its use is motivated by the fact that the EMD gives an effective time-frequency analysis of nonstationary signals. The intrinsic mode functions (IMF) obtained as a result of EMD give the decomposition of a signal according to its frequency components. We present the usage of up to third-order temporal moments, and spectral features including spectral centroid, coefficient of variation and the spectral skew of the IMFs for feature extraction from EEG signals. These features are physiologically relevant given that the normal EEG signals have different temporal and spectral centroids, dispersions and symmetries when compared with the pathological EEG signals. The calculated features are fed into the standard support vector machine (SVM) for classification purposes. The performance of the proposed method is studied on a publicly available dataset which is designed to handle various classification problems including the identification of epilepsy patients and detection of seizures. Experiments show that good classification results are obtained using the proposed methodology for the classification of EEG signals. Our proposed method also compares favorably to other state-of-the-art feature extraction methods.

  15. Generative supervised classification using Dirichlet process priors.

    Science.gov (United States)

    Davy, Manuel; Tourneret, Jean-Yves

    2010-10-01

    Choosing the appropriate parameter prior distributions associated with a given Bayesian model is a challenging problem. Conjugate priors can be selected for simplicity. However, conjugate priors can be too restrictive to accurately model the available prior information. This paper studies a new generative supervised classifier which assumes that the parameter prior distributions conditioned on each class are mixtures of Dirichlet processes. The motivation for using mixtures of Dirichlet processes is their known ability to model accurately a large class of probability distributions. A Monte Carlo method allowing one to sample according to the resulting class-conditional posterior distributions is then studied. The parameters appearing in the class-conditional densities can then be estimated using these generated samples (following Bayesian learning). The proposed supervised classifier is applied to the classification of altimetric waveforms backscattered from different surfaces (oceans, ices, forests, and deserts). This classification is a first step before developing tools allowing for the extraction of useful geophysical information from altimetric waveforms backscattered from nonoceanic surfaces.

  16. Automated classification of female facial beauty by image analysis and supervised learning

    Science.gov (United States)

    Gunes, Hatice; Piccardi, Massimo; Jan, Tony

    2004-01-01

    Whether the perception of facial beauty is a universal concept has long been debated amongst psychologists and anthropologists. In this paper, we performed experiments to evaluate the extent of beauty universality by asking a number of diverse human referees to grade the same collection of female facial images. Results obtained show that the different individuals gave similar votes, thus well supporting the concept of beauty universality. We then trained an automated classifier using the human votes as the ground truth and used it to classify an independent test set of facial images. The high accuracy achieved proves that this classifier can be used as a general, automated tool for objective classification of female facial beauty. Potential applications exist in the entertainment industry and plastic surgery.

  17. Supervised Learning Approach for Spam Classification Analysis using Data Mining Tools

    Directory of Open Access Journals (Sweden)

    R.Deepa Lakshmi

    2010-12-01

    E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of today’s Internet, bringing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on the more sophisticated classifier-related issues. Recently, machine learning for spam classification has become an important research issue. This paper explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms is also presented.

  18. Supervised Learning Approach for Spam Classification Analysis using Data Mining Tools

    Directory of Open Access Journals (Sweden)

    R.Deepa Lakshmi

    2010-11-01

    E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of today’s Internet, bringing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on the more sophisticated classifier-related issues. Recently, machine learning for spam classification has become an important research issue. This paper explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms is also presented.

  19. Supervised Learning in Multilayer Spiking Neural Networks

    CERN Document Server

    Sporea, Ioana

    2012-01-01

    The current article introduces a supervised learning algorithm for multilayer spiking neural networks. The algorithm presented here overcomes some limitations of existing learning algorithms as it can be applied to neurons firing multiple spikes and it can in principle be applied to any linearisable neuron model. The algorithm is applied successfully to various benchmarks, such as the XOR problem and the Iris data set, as well as complex classification problems. The simulations also show the flexibility of this supervised learning algorithm, which permits different encodings of the spike timing patterns, including precise spike train encoding.

  20. Supervised Dictionary Learning

    CERN Document Server

    Mairal, Julien; Ponce, Jean; Sapiro, Guillermo; Zisserman, Andrew

    2008-01-01

    It is now well established that sparse signal models are well suited to restoration tasks and can effectively be learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and multiple class-decision functions. The linear variant of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks.

  1. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Fees; classification, Micronaire, and supervision. 27... Classification and Micronaire § 27.80 Fees; classification, Micronaire, and supervision. For services rendered by... classification and Micronaire determination results certified on cotton class certificates.) (e) Supervision,...

  2. Equality of Opportunity in Supervised Learning

    OpenAIRE

    Hardt, Moritz; Price, Eric; Srebro, Nathan

    2016-01-01

    We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group are available, we show how to optimally adjust any learned predictor so as to remove discrimination according to our definition. Our framework also improves incentives by shifting the cost of poor classification from disadvantaged groups to...

  3. Incremental Image Classification Method Based on Semi-Supervised Learning

    Institute of Scientific and Technical Information of China (English)

    梁鹏; 黎绍发; 覃姜维; 罗剑高

    2012-01-01

    In order to use large numbers of unlabeled images effectively, an image classification method based on semi-supervised learning is proposed. The proposed method bridges a large number of unlabeled images and a limited number of labeled images by exploiting their common topics. The classification accuracy is improved by using the must-link and cannot-link constraints of the labeled images. The experimental results on the Caltech-101 and 7-class image datasets demonstrate that the proposed method improves classification accuracy by about 10%. Furthermore, because most present semi-supervised image classification methods lack incremental learning ability, an incremental implementation of our method is proposed. Compared with the non-incremental learning model in the literature, the incremental learning method improves computational efficiency by nearly 90%.

  4. Semi-supervised SVM for individual tree crown species classification

    Science.gov (United States)

    Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik

    2015-12-01

    In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.

  5. Learning Dynamics in Doctoral Supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie

    This doctoral research explores doctoral supervision within life science research in a Danish university. From one angle it investigates doctoral students’ experiences with strengthening the relationship with their supervisors through a structured meeting with the supervisor, prepared as part...... investigates learning opportunities in supervision with multiple supervisors. This was investigated through observations and recording of supervision, and subsequent analysis of transcripts. The analyses used different perspectives on learning; learning as participation, positioning theory and variation theory....... The research illuminates how learning opportunities are created in the interaction through the scientific discussions. It also shows how multiple supervisors can contribute to supervision by providing new perspectives and opinions that have a potential for creating new understandings. The combination...

  6. Supervised Classification Performance of Multispectral Images

    CERN Document Server

    Perumal, K

    2010-01-01

    Nowadays government and private agencies use remote sensing imagery for a wide range of applications, from military applications to farm development. The images may be panchromatic, multispectral, hyperspectral or even ultraspectral, and terabytes in size. Remote sensing image classification is one of the most significant applications of remote sensing. A few image classification algorithms have shown good precision in classifying remote sensing data. Of late, however, due to the increasing spatiotemporal dimensions of the remote sensing data, traditional classification algorithms have exposed weaknesses, necessitating further research in the field of remote sensing image classification. An efficient classifier is therefore needed to classify remote sensing images and extract information. We are experimenting with both supervised and unsupervised classification. Here we compare the different classification methods and their performances. It is found that Mahalanobis classifier performed the best in our...

  7. Current Directions in Supervised Learning Research

    Institute of Scientific and Technical Information of China (English)

    蒋艳凰; 周海芳; 杨学军

    2003-01-01

    Supervised learning is very important in the machine learning area. It has been making great progress in many directions. This article summarizes three of these directions, which are hot problems in the supervised learning field. These three directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, and (c) extracting understandable rules from classifiers.

  8. Enhanced manifold regularization for semi-supervised classification.

    Science.gov (United States)

    Gan, Haitao; Luo, Zhizeng; Fan, Yingle; Sang, Nong

    2016-06-01

    Manifold regularization (MR) has become one of the most widely used approaches in the semi-supervised learning field. It has shown superiority by exploiting the local manifold structure of both labeled and unlabeled data. The manifold structure is modeled by constructing a Laplacian graph and then incorporated in learning through a smoothness regularization term. Hence the labels of labeled and unlabeled data vary smoothly along the geodesics on the manifold. However, MR has ignored the discriminative ability of the labeled and unlabeled data. To address the problem, we propose an enhanced MR framework for semi-supervised classification in which the local discriminative information of the labeled and unlabeled data is explicitly exploited. To make full use of labeled data, we firstly employ a semi-supervised clustering method to discover the underlying data space structure of the whole dataset. Then we construct a local discrimination graph to model the discriminative information of labeled and unlabeled data according to the discovered intrinsic structure. Therefore, the data points that may be from different clusters, though similar on the manifold, are enforced far away from each other. Finally, the discrimination graph is incorporated into the MR framework. In particular, we utilize semi-supervised fuzzy c-means and Laplacian regularized Kernel minimum squared error for semi-supervised clustering and classification, respectively. Experimental results on several benchmark datasets and face recognition demonstrate the effectiveness of our proposed method.
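
    The smoothness regularization term described in this abstract is commonly written as f^T L f, where L = D - W is the graph Laplacian built from a symmetric similarity matrix W. The sketch below checks numerically that this quadratic form equals half the weighted sum of squared differences between connected points. It illustrates only the generic manifold regularizer, not the enhanced discrimination graph the authors propose; the function names and the three-node graph are assumptions.

```python
def laplacian_smoothness(weights, f):
    """Return f^T L f with L = D - W for a symmetric weight matrix W."""
    n = len(f)
    degree = [sum(weights[i]) for i in range(n)]  # diagonal of D
    total = 0.0
    for i in range(n):
        for j in range(n):
            l_ij = (degree[i] if i == j else 0.0) - weights[i][j]
            total += f[i] * l_ij * f[j]
    return total


def pairwise_penalty(weights, f):
    """Return 0.5 * sum_ij W_ij * (f_i - f_j)^2, the same quantity."""
    n = len(f)
    return 0.5 * sum(
        weights[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n)
    )


# Hypothetical chain graph 0 - 1 - 2 with unit edge weights.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
f = [1.0, 1.0, -1.0]  # predicted labels; only the edge 1 - 2 disagrees
print(laplacian_smoothness(W, f), pairwise_penalty(W, f))
```

    The penalty is zero when neighbouring points share a label and grows with each edge whose endpoints disagree, which is why minimizing it makes labels vary smoothly along the graph.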

  9. A new supervised learning algorithm for spiking neurons.

    Science.gov (United States)

    Xu, Yan; Zeng, Xiaoqin; Zhong, Shuiming

    2013-06-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by the precise firing times of spikes. If only running time is considered, the supervised learning for a spiking neuron is equivalent to distinguishing the times of desired output spikes from the other times during the running process of the neuron through adjusting synaptic weights, which can be regarded as a classification problem. Based on this idea, this letter proposes a new supervised learning method for spiking neurons with temporal encoding; it first transforms the supervised learning into a classification problem and then solves the problem by using the perceptron learning rule. The experimental results show that the proposed method has higher learning accuracy and efficiency than the existing learning methods, so it is more powerful for solving complex and real-time problems.
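
    The perceptron learning rule that this abstract relies on can be sketched in its generic form: after each misclassified example, the weights move by the error times the input. The example below trains on a toy AND problem and does not model spike timing or the authors' transformation of spike trains into classification targets; the function names, learning rate, and data are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, targets, lr=1.0, max_epochs=100):
    """Fit weights with the perceptron rule: w += lr * (target - output) * x."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            error = t - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return weights, bias


# Hypothetical linearly separable toy problem (logical AND).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, targets)
print([predict(weights, bias, x) for x in samples])
```

    For linearly separable data the rule is guaranteed to stop after a finite number of updates, which is what makes it attractive once a learning task has been recast as classification.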

  10. Benchmarking protein classification algorithms via supervised cross-validation.

    Science.gov (United States)

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and
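
    The contrast with random cross-validation can be sketched as a leave-one-subtype-out split. This is a simplified illustration, not the benchmark's actual procedure: the function name, the flat (item, subtype) representation, and the toy protein identifiers are assumptions, whereas the abstract describes splits built on hierarchical classification trees.

```python
from collections import defaultdict


def supervised_cv_splits(labeled_items):
    """Yield (held_out_subtype, train_ids, test_ids), holding out one whole subtype at a time.

    labeled_items is a list of (item_id, subtype) pairs. Unlike random k-fold
    splitting, no member of the held-out subtype ever appears in the training set.
    """
    by_subtype = defaultdict(list)
    for item_id, subtype in labeled_items:
        by_subtype[subtype].append(item_id)

    for held_out in sorted(by_subtype):
        test_ids = list(by_subtype[held_out])
        train_ids = [
            item_id
            for subtype in sorted(by_subtype)
            if subtype != held_out
            for item_id in by_subtype[subtype]
        ]
        yield held_out, train_ids, test_ids


# Hypothetical toy data: six proteins of one class, grouped into three subtypes.
proteins = [
    ("p1", "famA"), ("p2", "famA"),
    ("p3", "famB"), ("p4", "famB"),
    ("p5", "famC"), ("p6", "famC"),
]

splits = list(supervised_cv_splits(proteins))
for held_out, train_ids, test_ids in splits:
    print(held_out, train_ids, test_ids)
```

    Because each test set contains only members of an unseen subtype, accuracy measured this way reflects generalization to distantly related groups, which is why such estimates tend to be lower than those from random splits.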

  11. Supervision Learning as Conceptual Threshold Crossing: When Supervision Gets "Medieval"

    Science.gov (United States)

    Carter, Susan

    2016-01-01

    This article presumes that supervision is a category of teaching, and that we all "learn" how to teach better. So it enquires into what novice supervisors need to learn. An anonymised digital questionnaire sought data from supervisors [n226] on their experiences of supervision to find out what was difficult, and supervisor interviews…

  13. Representation learning for cross-modality classification

    NARCIS (Netherlands)

    G. van Tulder (Gijs); M. de Bruijne (Marleen)

    2017-01-01

    Differences in scanning parameters or modalities can complicate image analysis based on supervised classification. This paper presents two representation learning approaches, based on autoencoders, that address this problem by learning representations that are similar across domains. Bot

  14. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    A cluster analysis task has to identify the grouping trends of data, decide on the sound clusters, and validate the resulting structure in some way. Identifying the grouping tendency in a data collection assumes the selection of a framework stated in terms of a mathematical model that expresses the degree of similarity between pairs of objects, together with quasi-metrics expressing the similarity between an object and a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes, which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by a continuous repartitions

  15. Quintic spline smooth semi-supervised support vector classification machine

    Institute of Scientific and Technical Information of China (English)

    Xiaodan Zhang; Jinggai Ma; Aihua Li; Ang Li

    2015-01-01

    A semi-supervised vector machine is a relatively new learning method using both labeled and unlabeled data in classification. Since the objective function of the model for an unconstrained semi-supervised vector machine is not smooth, many fast optimization algorithms cannot be applied to solve the model. In order to overcome the difficulty of dealing with non-smooth objective functions, new methods that can solve the semi-supervised vector machine with desired classification accuracy are in great demand. A quintic spline function with three-times differentiability at the origin is constructed by a general three-moment method, which can be used to approximate the symmetric hinge loss function. The approximation accuracy of the quintic spline function is estimated. Moreover, a quintic spline smooth semi-support vector machine is obtained and the convergence accuracy of the smooth model to the non-smooth one is analyzed. Three experiments are performed to test the efficiency of the model. The experimental results show that the new model outperforms other smooth models in terms of classification performance. Furthermore, the new model is not sensitive to the increasing number of labeled samples, which means that the new model is more efficient.

  16. Supervised Dictionary Learning

    Science.gov (United States)

    2008-11-01

    recently led to state-of-the-art results for numerous low-level image processing tasks such as denoising [2], showing that sparse models are well... denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. IP, 54(12), 2006. [3] K. Huang and S. Aviyente. Sparse...2006. [19] M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representations

  17. Weakly supervised histopathology cancer image segmentation and classification.

    Science.gov (United States)

    Xu, Yan; Zhu, Jun-Yan; Chang, Eric I-Chao; Lai, Maode; Tu, Zhuowen

    2014-04-01

    Labeling a histopathology image as having cancerous regions or not is a critical task in cancer diagnosis; it is also clinically important to segment the cancer tissues and cluster them into various classes. Existing supervised approaches for image classification and segmentation require detailed manual annotations for the cancer pixels, which are time-consuming to obtain. In this paper, we propose a new learning method, multiple clustered instance learning (MCIL) (along the line of weakly supervised learning) for histopathology image segmentation. The proposed MCIL method simultaneously performs image-level classification (cancer vs. non-cancer image), medical image segmentation (cancer vs. non-cancer tissue), and patch-level clustering (different classes). We embed the clustering concept into the multiple instance learning (MIL) setting and derive a principled solution to performing the above three tasks in an integrated framework. In addition, we introduce contextual constraints as a prior for MCIL, which further reduces the ambiguity in MIL. Experimental results on histopathology colon cancer images and cytology images demonstrate the great advantage of MCIL over the competing methods.

  18. Weakly supervised visual dictionary learning by harnessing image attributes.

    Science.gov (United States)

    Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang

    2014-12-01

    Bag-of-features (BoF) representation has been extensively applied to deal with various computer vision applications. To extract a discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping of the observed states, where the supervised information stems from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.

  19. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

    Directory of Open Access Journals (Sweden)

    Polz Martin F

    2009-05-01

    Background: Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This phenomenon suggested that the horizontal transfer of these genes may be involved in increasing phage fitness. To date, a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics. Results: To achieve an accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, to seven different taxonomic classes. Applying the method to a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment. Conclusion: The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.

  20. Extreme Learning Machine for land cover classification

    OpenAIRE

    Pal, Mahesh

    2008-01-01

    This paper explores the potential of an extreme learning machine based supervised classification algorithm for land cover classification. In comparison to a backpropagation neural network, which requires setting several user-defined parameters and may converge to local minima, the extreme learning machine requires setting only one parameter and produces a unique solution. An ETM+ multispectral data set (England) was used to judge the suitability of the extreme learning machine for remote sensing classifications...

  1. Semi-supervised learning for ordinal Kernel Discriminant Analysis.

    Science.gov (United States)

    Pérez-Ortiz, M; Gutiérrez, P A; Carbonero-Ruz, M; Hervás-Martínez, C

    2016-12-01

    Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problems because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme which is referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended for this setting considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification in a battery of 30 datasets, showing (1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and (2) the advantage of computing distances in the feature space induced by the kernel function.

  2. Supervised Classification Methods for Seismic Phase Identification

    Science.gov (United States)

    Schneider, Jeff; Given, Jeff; Le Bras, Ronan; Fisseha, Misrak

    2010-05-01

    The Comprehensive Nuclear Test Ban Treaty Organization (CTBTO) is tasked with monitoring compliance with the CTBT. The organization is installing the International Monitoring System (IMS), a global network of seismic, hydroacoustic, infrasound, and radionuclide sensor stations. The International Data Centre (IDC) receives the data from seismic stations either in real time or on request. These data are first processed on a station-by-station basis. This initial step yields discrete detections which are then assembled on a network basis (with the addition of hydroacoustic and infrasound data) to produce automatic and analyst-reviewed bulletins containing seismic, hydroacoustic, and infrasound detections. The initial station processing step includes the identification of seismic and acoustic phases, which are given a label. Subsequent network processing relies on this preliminary labeling, and as a consequence, the accuracy and reliability of automatic and reviewed bulletins also depend on this initial step. A very large ground truth database containing massive amounts of detections with analyst-reviewed labels is available to improve on the current operational system using machine learning methods. An initial study using a limited amount of data was conducted during the ISS09 project of the CTBTO. Several classification methods were tested: decision tree with bagging; logistic regression; neural networks trained with back-propagation; Bayesian networks as generative class models; naive Bayes classification; support vector machines. The initial assessment was that the phase identification process could be improved by at least 13% over the current operational system and that the method obtaining the best results was the decision tree with bagging. We present the results of a study using a much larger learning dataset and preliminary implementation results.

  3. SLEAS: Supervised Learning using Entropy as Attribute Selection Measure

    Directory of Open Access Journals (Sweden)

    Kishor Kumar Reddy C

    2014-10-01

    It is increasingly important to scale up the widely used decision tree learning algorithms to huge datasets. Although many diverse methodologies have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is still needed. This paper aims at improving the performance of the SLIQ (Supervised Learning in Quest) decision tree algorithm for classification in data mining. In the present research, we adopt entropy as the attribute selection measure, which overcomes the problems encountered with the Gini index. The classification accuracy of the proposed supervised learning using entropy as attribute selection measure (SLEAS) algorithm is compared with that of the existing SLIQ algorithm on twelve datasets taken from the UCI Machine Learning Repository, and the results show that SLEAS outperforms the SLIQ decision tree. Further, the error rate is also computed, and the results clearly show that the SLEAS algorithm gives a lower error rate than the SLIQ decision tree.
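
    The two attribute selection measures contrasted above can be computed as follows. This is a generic sketch using the standard definitions of Shannon entropy, Gini impurity, and information gain; the function names and toy labels are assumptions, and the abstract does not say exactly how SLEAS applies entropy when choosing splits.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: sum of -p * log2(p) over the classes present."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent_labels)
    weighted = sum(len(group) / total * entropy(group) for group in child_label_groups)
    return entropy(parent_labels) - weighted


# Hypothetical toy node with two classes, split perfectly by some attribute.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]

print(entropy(parent), gini(parent), information_gain(parent, children))
```

    Both measures are zero for a pure node and largest for an evenly mixed one; they differ in how strongly they penalize intermediate mixtures, which can lead to different split choices.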

  4. An AdaBoost algorithm for multiclass semi-supervised learning

    NARCIS (Netherlands)

    Tanha, J.; van Someren, M.; Afsarmanesh, H.; Zaki, M.J.; Siebes, A.; Yu, J.X.; Goethals, B.; Webb, G.; Wu, X.

    2012-01-01

    We present an algorithm for multiclass Semi-Supervised learning which is learning from a limited amount of labeled data and plenty of unlabeled data. Existing semi-supervised algorithms use approaches such as one-versus-all to convert the multiclass problem to several binary classification problems

  5. Multi-Instance Learning from Supervised View

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Zhou

    2006-01-01

    In multi-instance learning, the training set comprises labeled bags that are composed of unlabeled instances, and the task is to predict the labels of unseen bags. This paper studies multi-instance learning from the view of supervised learning. First, by analyzing some representative learning algorithms, this paper shows that multi-instance learners can be derived from supervised learners by shifting their focuses from the discrimination on the instances to the discrimination on the bags. Second, considering that ensemble learning paradigms can effectively enhance supervised learners, this paper proposes to build multi-instance ensembles to solve multi-instance problems. Experiments on a real-world benchmark test show that ensemble learning paradigms can significantly enhance multi-instance learners.

  6. A New Method for Solving Supervised Data Classification Problems

    Directory of Open Access Journals (Sweden)

    Parvaneh Shabanzadeh

    2014-01-01

    Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with the cluster analysis. The mathematical formulations for this algorithm are based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized. The new algorithm uses a derivative-free technique, with robustness and efficiency. To improve classification performance and efficiency in generating classification model, a new feature selection algorithm based on techniques of convex programming is suggested. Proposed methods are tested on real-world datasets. Results of numerical experiments have been presented which demonstrate the effectiveness of the proposed algorithms.

  7. Incremental Supervised Subspace Learning for Face Recognition

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Subspace learning algorithms have been well studied in face recognition. Among them, linear discriminant analysis (LDA) is one of the most widely used supervised subspace learning methods. Due to the difficulty of designing an incremental solution of the eigen decomposition on the product of matrices, there is little work on computing LDA incrementally. To avoid this limitation, an incremental supervised subspace learning (ISSL) algorithm was proposed, which incrementally learns an adaptive subspace by optimizing the maximum margin criterion (MMC). With the dynamically added face images, ISSL can effectively constrain the computational cost. Feasibility of the new algorithm has been successfully tested on different face data sets.

  8. Transfer learning improves supervised image segmentation across imaging protocols

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Ikram, M. Arfan; Vernooij, Meike W.;

    2015-01-01

    The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore may improve performance over supervised learning for segmentation across scanners and scan protocols. We present four transfer classifiers that can train a classification scheme with only a small amount of representative training data, in addition to a larger amount of other training data...

  9. Subsampled Hessian Newton Methods for Supervised Learning.

    Science.gov (United States)

    Wang, Chien-Chih; Huang, Chun-Heng; Lin, Chih-Jen

    2015-08-01

    Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.

  10. Learning Dynamics in Doctoral Supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie

    This doctoral research explores doctoral supervision within life science research in a Danish university. From one angle it investigates doctoral students’ experiences with strengthening the relationship with their supervisors through a structured meeting with the supervisor, prepared as part of an introduction course for new doctoral students. This study showed how the course provides an effective way to build supervisee agency and strengthen supervisory relationships through clarification and alignment of expectations and sharing goals about doctoral studies. From the other angle the research...

  11. Study on Web page content classification technology based on semi-supervised learning

    Institute of Scientific and Technical Information of China (English)

    赵夫群

    2016-01-01

    To address the key issue of how to use labeled and unlabeled data for Web classification, a classifier combining a generative model with a discriminative model is explored. Maximum likelihood estimation is applied to the unlabeled training set to construct a semi-supervised classifier with high classification performance. A Dirichlet-multinomial mixture distribution is used to model the text, and a hybrid model suitable for semi-supervised learning is proposed. Since the EM algorithm for semi-supervised learning converges too quickly and easily falls into local optima, two intelligent optimization methods, the simulated annealing algorithm and the genetic algorithm, are introduced for analysis and processing. A new intelligent semi-supervised classification algorithm is formed by combining the two algorithms, and its feasibility is verified.

  12. Semi-Supervised Learning Based on Manifold in BCI

    Institute of Scientific and Technical Information of China (English)

    Ji-Ying Zhong; Xu Lei; De-Zhong Yao

    2009-01-01

    A Laplacian support vector machine (LapSVM) algorithm, a manifold-based semi-supervised learning method, is introduced to brain-computer interface (BCI) to raise the classification precision and reduce the subjects' training complexity. The data are collected from three subjects in a three-task mental imagery experiment. LapSVM and transductive SVM (TSVM) are trained with a few labeled samples and a large number of unlabeled samples. The results confirm that LapSVM achieves much better classification than TSVM.

  13. Action learning in undergraduate engineering thesis supervision

    Directory of Open Access Journals (Sweden)

    Brad Stappenbelt

    2017-03-01

    In the present action learning implementation, twelve action learning sets were conducted over eight years. The action learning sets consisted of students involved in undergraduate engineering research thesis work. The concurrent study accompanying this initiative investigated the influence of the action learning environment on student approaches to learning and any accompanying academic, learning and personal benefits realised. The influence of preferred learning styles on set function and student adoption of the action learning process were also examined. The action learning environment implemented had a measurable, significant positive effect on student academic performance, their ability to cope with the stresses associated with conducting a research thesis, the depth of learning, the development of autonomous learners and student perception of the research thesis experience. The present study acts as an addendum to a smaller scale implementation of this action learning approach, applied to supervision of third and fourth year research projects and theses, published in 2010.

  14. Balancing Design Project Supervision and Learning Facilitation

    DEFF Research Database (Denmark)

    Nielsen, Louise Møller

    2012-01-01

    In design there is a long tradition for apprenticeship, as well as tradition for learning through design projects. Today many design educations are positioned within the University context, and have to be aligned with the learning culture and structure, which they represent. This raises a specific set of demands to the design lecturer. On one hand she is the facilitator of the learning process, where the students are in charge of their own projects, and where learning happens through the students’ own experiences, successes and mistakes and on the other hand she is a supervisor, who uses her experiences and expertise to guide the students’ decisions in relation to the design project. This paper focuses on project supervision in the context of design education – and more specifically on how this supervision is unfolded in a Problem Based Learning culture. The paper explores the supervisor...

  15. Balancing Design Project Supervision and Learning Facilitation

    DEFF Research Database (Denmark)

    Nielsen, Louise Møller

    2012-01-01

    In design there is a long tradition for apprenticeship, as well as tradition for learning through design projects. Today many design educations are positioned within the University context, and have to be aligned with the learning culture and structure, which they represent. This raises a specific... experiences and expertise to guide the students’ decisions in relation to the design project. This paper focuses on project supervision in the context of design education – and more specifically on how this supervision is unfolded in a Problem Based Learning culture. The paper explores the supervisor’s balance between the roles: 1) Design Project Supervisor – and 2) Learning Facilitator – with the aim to understand when to apply the different roles, and what to be aware of when doing so. This paper represents the first pilot-study of a larger research effort. It is based on a Lego Serious Play workshop...

  16. Semi Supervised Weighted K-Means Clustering for Multi Class Data Classification

    Directory of Open Access Journals (Sweden)

    Vijaya Geeta Dharmavaram

    2013-01-01

    Supervised learning techniques require a large number of labeled examples to train a classifier model. Research on semi-supervised learning is motivated by the abundance of unlabeled examples even in domains with a limited number of labeled examples. In such domains, a semi-supervised classifier uses the results of clustering for classifier development, since clustering does not rely only on labeled examples but groups objects based on their similarities. In this paper, the authors propose a new algorithm for semi-supervised classification, namely Semi Supervised Weighted K-Means (SSWKM). In this algorithm, the authors suggest using a weighted Euclidean distance metric, designed for the purpose of clustering, to estimate the proximity between a pair of points, and use it to build the semi-supervised classifier. The authors propose a new approach for estimating the weights of features by appropriately adopting the results of multiple discriminant analysis. The proposed method was then tested on benchmark datasets from the UCI repository with varied percentages of labeled examples and found to be consistent and promising.
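
    A weighted Euclidean distance of the kind described above can be sketched as follows. This is an illustration only: the function names, centroids, and feature weights are hypothetical and hand-picked, whereas the abstract states that the authors estimate the weights from multiple discriminant analysis.

```python
import math


def weighted_euclidean(x, y, weights):
    """Distance in which each squared feature difference is scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def nearest_centroid(point, centroids, weights):
    """Return the index of the centroid closest to the point under the weighted metric."""
    return min(
        range(len(centroids)),
        key=lambda k: weighted_euclidean(point, centroids[k], weights),
    )


# Hypothetical centroids and a point that is close to a different centroid
# depending on which feature the weights emphasize.
centroids = [[0.0, 0.0], [5.0, 9.0]]
point = [1.0, 9.0]

print(nearest_centroid(point, centroids, weights=[1.0, 0.0]))  # only feature 0 counts
print(nearest_centroid(point, centroids, weights=[0.0, 1.0]))  # only feature 1 counts
```

    Giving a feature a larger weight makes differences along it count more, so discriminative features can dominate the cluster assignment step of k-means.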

  17. Active semi-supervised learning method with hybrid deep belief networks.

    Science.gov (United States)

    Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong

    2014-01-01

    In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the first several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, an active learning method is combined with the proposed deep architecture. We conducted several experiments on five sentiment classification datasets, which show that AHD is competitive with previous semi-supervised learning algorithms. Experiments are also conducted to verify the effectiveness of our proposed method with different numbers of labeled and unlabeled reviews.

  18. Research of Plant-Leaves Classification Algorithm Based on Supervised LLE

    Directory of Open Access Journals (Sweden)

    Yan Qing

    2013-06-01

    Full Text Available This paper proposes a new supervised LLE method based on the Fisher projection and combines it with a new classification algorithm based on manifold learning to recognize plant leaves. First, the method uses the Fisher projection distance in place of the sample's geodesic distance, yielding a new supervised LLE algorithm. Then, a classification algorithm is adopted that uses the manifold reconstruction error to determine the sample's class directly. This algorithm makes better use of the category information and effectively improves the recognition rate. At the same time, it has the advantage of easy parameter estimation. Experimental results on real-world plant leaf databases show that its average recognition accuracy reached 95.17%.

  19. SEMI-SUPERVISED RADIO TRANSMITTER CLASSIFICATION BASED ON ELASTIC SPARSITY REGULARIZED SVM

    Institute of Scientific and Technical Information of China (English)

    Hu Guyu; Gong Yong; Chen Yande; Pan Zhisong; Deng Zhantao

    2012-01-01

    Non-collaborative radio transmitter recognition is a significant but challenging issue, since it is hard or costly to obtain labeled training data samples. In order to make effective use of the unlabeled samples, which can be obtained much more easily, a novel semi-supervised classification method named Elastic Sparsity Regularized Support Vector Machine (ESRSVM) is proposed for radio transmitter classification. ESRSVM first constructs an elastic-net graph over data samples to capture the robust and natural discriminating information and then incorporates the information into the manifold learning framework by an elastic sparsity regularization term. Experimental results on 10 GMSK modulated Automatic Identification System radios and 15 FM walkie-talkie radios show that ESRSVM achieves clearly better performance than KNN and SVM, which use only labeled samples for classification, and also outperforms the semi-supervised classifier LapSVM based on manifold regularization.

  20. Missing Data Imputation for Supervised Learning

    OpenAIRE

    Poulos, Jason; Valle, Rafael

    2016-01-01

    This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different degrees of missing-data perturbat...

  1. A review of supervised object-based land-cover image classification

    Science.gov (United States)

    Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue

    2017-08-01

    vehicle) or agricultural sites where it also correlates with the number of targeted classes. More than 95.6% of studies involve an area less than 300 ha, and the spatial resolution of images is predominantly between 0 and 2 m. Furthermore, we identify some methods that may advance supervised object-based image classification. For example, deep learning and type-2 fuzzy techniques may further improve classification accuracy. Lastly, scientists are strongly encouraged to report results of uncertainty studies to further explore the effects of varied factors on supervised object-based image classification.

  2. Supervised and unsupervised classification - The case of IRAS point sources

    Science.gov (United States)

    Adorf, Hans-Martin; Meurs, E. J. A.

    Progress is reported on a project which aims at mapping the extragalactic sky in order to derive the large scale distribution of luminous matter. The approach consists in selecting from the IRAS Point Source Catalog a set of galaxies which is as clean and as complete as possible. The decision and discrimination problems involved lend themselves to a treatment using methods from multivariate statistics, in particular statistical pattern recognition. Two different approaches, one based on supervised Bayesian classification, the other on unsupervised data-driven classification, are presented and some preliminary results are reported.

  3. Extending self-organizing maps for supervised classification of remotely sensed data

    Institute of Scientific and Technical Information of China (English)

    CHEN Yongliang

    2009-01-01

    An extended self-organizing map for supervised classification is proposed in this paper. Unlike other traditional SOMs, the model has an input layer, a Kohonen layer, and an output layer. The number of neurons in the input layer depends on the dimensionality of input patterns. The number of neurons in the output layer equals the number of the desired classes. The number of neurons in the Kohonen layer may range from a few to several thousand, depending on the complexity of the classification problem and the classification precision. Each training sample is expressed by a pair of vectors: an input vector and a class codebook vector. When a training sample is input into the model, Kohonen's competitive learning rule is applied to select the winning neuron from the Kohonen layer. The weight coefficients connecting all the neurons in the input layer with both the winning neuron and its neighbors in the Kohonen layer are modified to be closer to the input vector, and those connecting all the neurons around the winning neuron within a certain diameter in the Kohonen layer with all the neurons in the output layer are adjusted to be closer to the class codebook vector. If the number of training samples is sufficiently large and the learning epochs iterate enough times, the model will be able to serve as a supervised classifier. The model has been tentatively applied to the supervised classification of multispectral remotely sensed data. The author compared the performances of the extended SOM and BPN in remotely sensed data classification. The investigation shows that the extended SOM is feasible for supervised classification.

  4. The Supervised Learning Gaussian Mixture Model

    Institute of Scientific and Technical Information of China (English)

    马继涌; 高文

    1998-01-01

    The traditional Gaussian Mixture Model (GMM) for pattern recognition is an unsupervised learning method. The parameters in the model are derived only from the training samples in one class without taking into account the effect of sample distributions of other classes; hence, its recognition accuracy is sometimes not ideal. This paper introduces an approach for estimating the parameters in GMM in a supervised way. The Supervised Learning Gaussian Mixture Model (SLGMM) improves the recognition accuracy of the GMM. An experimental example has shown its effectiveness. The experimental results have shown that the recognition accuracy derived by the approach is higher than those obtained by the Vector Quantization (VQ) approach, the Radial Basis Function (RBF) network model, the Learning Vector Quantization (LVQ) approach and the GMM. In addition, the training time of the approach is less than that of the Multilayer Perceptron (MLP).

  5. Opportunities to Learn Scientific Thinking in Joint Doctoral Supervision

    Science.gov (United States)

    Kobayashi, Sofie; Grout, Brian W.; Rump, Camilla Østerberg

    2015-01-01

    Research into doctoral supervision has increased rapidly over the last decades, yet our understanding of how doctoral students learn scientific thinking from supervision is limited. Most studies are based on interviews with little work being reported that is based on observation of actual supervision. While joint supervision has become widely…

  6. [RVM supervised feature extraction and Seyfert spectra classification].

    Science.gov (United States)

    Li, Xiang-Ru; Hu, Zhan-Yi; Zhao, Yong-Heng; Li, Xiao-Ming

    2009-06-01

    With recent technological advances in wide field survey astronomy and the implementation of several large-scale astronomical survey proposals (e.g. SDSS, 2dF and LAMOST), celestial spectra are becoming very abundant and rich. Therefore, research on automated classification methods based on celestial spectra has been attracting more and more attention in recent years. Feature extraction is a fundamental problem in automated spectral classification, which not only influences the difficulty and complexity of the problem, but also determines the performance of the designed classifying system. The available methods of feature extraction for spectra classification are usually unsupervised, e.g. principal components analysis (PCA), wavelet transform (WT), artificial neural networks (ANN) and Rough Set theory. These methods extract features not by their capability to classify spectra, but by some kind of power to approximate the original celestial spectra. Therefore, the features extracted by these methods usually are not the best ones for classification. In the present work, the authors point out the necessity of investigating supervised feature extraction by analyzing the characteristics of the spectra classification research in the available literature and the limitations of unsupervised feature extraction methods. The authors also study supervised feature extraction based on the relevance vector machine (RVM) and its application in Seyfert spectra classification. RVM is a recently introduced method based on Bayesian methodology, automatic relevance determination (ARD), regularization techniques and a hierarchical prior structure. With this method, the authors can easily fuse the information in the training data, the authors' prior knowledge and belief in the problem, etc. RVM can also effectively extract features and reduce the data based on classifying capability. Extensive experiments show its superior performance in dimensional reduction and feature extraction for Seyfert

  7. Random forest automated supervised classification of Hipparcos periodic variable stars

    CERN Document Server

    Dubath, P; Süveges, M; Blomme, J; López, M; Sarro, L M; De Ridder, J; Cuypers, J; Guy, L; Lecoeur, I; Nienartowicz, K; Jan, A; Beck, M; Mowlavi, N; De Cat, P; Lebzelter, T; Eyer, L

    2011-01-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation experiments, the rate of correct classification is between 90 and 100%, depending on the variability type. The main mis-classification case...

  8. Learning Apache Mahout classification

    CERN Document Server

    Gupta, Ashish

    2015-01-01

    If you are a data scientist who has some experience with the Hadoop ecosystem and machine learning methods and want to try out classification on large datasets using Mahout, this book is ideal for you. Knowledge of Java is essential.

  9. A Supervised Classification Algorithm for Note Onset Detection

    Directory of Open Access Journals (Sweden)

    Douglas Eck

    2007-01-01

    Full Text Available This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or nononsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
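
    The peak-picking step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-frame onset scores, the window size `half_window`, and the margin `delta` are all hypothetical. A frame is kept as an onset when it is a local maximum and exceeds the moving average of its neighborhood by the margin.

```python
def pick_peaks(scores, half_window=2, delta=0.1):
    """Return indices of frames that are local maxima and exceed the
    moving average of the surrounding window by at least `delta`.

    `scores` is assumed to hold one onset score per spectrogram frame,
    e.g. the output of a classifier; the parameter values are illustrative.
    """
    peaks = []
    n = len(scores)
    for i in range(n):
        # Moving average over a window centered on frame i (clipped at edges).
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        local_mean = sum(scores[lo:hi]) / (hi - lo)

        # Local-maximum test against the immediate neighbors.
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < n - 1 else float("-inf")
        is_local_max = scores[i] > left and scores[i] >= right

        if is_local_max and scores[i] > local_mean + delta:
            peaks.append(i)
    return peaks

# Hypothetical classifier scores for nine consecutive frames.
scores = [0.0, 0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.7, 0.1]
print(pick_peaks(scores))  # frames 2 and 6 stand out from their neighborhoods
```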

  10. Artificial neural network classification using a minimal training set - Comparison to conventional supervised classification

    Science.gov (United States)

    Hepner, George F.; Logan, Thomas; Ritter, Niles; Bryant, Nevin

    1990-01-01

    Recent research has shown an artificial neural network (ANN) to be capable of pattern recognition and the classification of image data. This paper examines the potential for the application of neural network computing to satellite image processing. A second objective is to provide a preliminary comparison and ANN classification. An artificial neural network can be trained to do land-cover classification of satellite imagery using selected sites representative of each class in a manner similar to conventional supervised classification. One of the major problems associated with recognition and classification of patterns from remotely sensed data is the time and cost of developing a set of training sites. This research compares the use of an ANN back propagation classification procedure with a conventional supervised maximum likelihood classification procedure using a minimal training set. When using a minimal training set, the neural network is able to provide a land-cover classification superior to the classification derived from the conventional classification procedure. This research is the foundation for developing application parameters for further prototyping of software and hardware implementations for artificial neural networks in satellite image and geographic information processing.

  11. Graph-based semi-supervised learning

    CERN Document Server

    Subramanya, Amarnag

    2014-01-01

    While labeled data is expensive to prepare, ever increasing amounts of unlabeled data is becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer visi

  12. Automatic age and gender classification using supervised appearance model

    Science.gov (United States)

    Bukar, Ali Maina; Ugail, Hassan; Connah, David

    2016-11-01

    Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.

  13. Random forest automated supervised classification of Hipparcos periodic variable stars

    Science.gov (United States)

    Dubath, P.; Rimoldini, L.; Süveges, M.; Blomme, J.; López, M.; Sarro, L. M.; De Ridder, J.; Cuypers, J.; Guy, L.; Lecoeur, I.; Nienartowicz, K.; Jan, A.; Beck, M.; Mowlavi, N.; De Cat, P.; Lebzelter, T.; Eyer, L.

    2011-07-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation (CV) experiments, the rate of correct classification is between 90 and 100 per cent, depending on the variability type. The main mis-classification cases, up to a rate of about 10 per cent, arise due to confusion between SPB and ACV blue variables and between eclipsing binaries, ellipsoidal variables and other variability types. Our training set and the predicted types for the other Hipparcos periodic stars are available online.

  14. Sentiment Analysis of Twitter tweets using supervised classification technique

    Directory of Open Access Journals (Sweden)

    Pranav Waykar

    2016-05-01

    Full Text Available The use of social media to analyze the perceptions of the masses about a product, event or person has gained momentum in recent times. Out of a wide array of social networks, we chose Twitter for our analysis, as the opinions expressed there are concise and bear a distinctive polarity. Here, we collect the most recent tweets on the user's area of interest and analyze them. The extracted tweets are then segregated as positive, negative and neutral. We do the classification in the following manner: we collect the tweets using the Twitter API; we then process the collected tweets to convert all letters to lowercase, eliminate special characters, etc., which makes the classification more efficient; finally, the processed tweets are classified using a supervised classification technique. We make use of a Naive Bayes classifier to segregate the tweets as positive, negative and neutral. We use a set of sample tweets to train the classifier. The percentage of tweets in each category is then computed and the result is represented graphically. The result can be used further to gain an insight into the views of the people using Twitter about a particular topic that is being searched by the user. It can help corporate houses devise strategies on the basis of the popularity of their product among the masses. It may help consumers make informed choices based on the general sentiment expressed by Twitter users about a product.
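
    Two of the steps named in the abstract above, tweet preprocessing and computing the percentage of tweets per category, can be sketched as follows. This is a minimal illustration, not the authors' code: the exact cleaning rules, the function names, and the example labels are assumptions, and the Naive Bayes classifier and the Twitter API calls are deliberately left out.

```python
import re
from collections import Counter

def preprocess(tweet):
    """Lowercase the tweet and drop special characters.

    Assumed cleaning rule: keep only ASCII letters, digits, and spaces,
    then collapse runs of whitespace.
    """
    text = tweet.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())

def category_percentages(labels):
    """Percentage of items in each category, given one label per tweet."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * count / total for label, count in counts.items()}

print(preprocess("LOVED the new phone!!! #happy"))

# Hypothetical classifier output for four tweets.
print(category_percentages(["positive", "negative", "positive", "neutral"]))
```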

  15. TV-SVM: Total Variation Support Vector Machine for Semi-Supervised Data Classification

    OpenAIRE

    Bresson, Xavier; Zhang, Ruiliang

    2012-01-01

    We introduce semi-supervised data classification algorithms based on total variation (TV), Reproducing Kernel Hilbert Space (RKHS), support vector machine (SVM), Cheeger cut, labeled and unlabeled data points. We design binary and multi-class semi-supervised classification algorithms. We compare the TV-based classification algorithms with the related Laplacian-based algorithms, and show that TV classification performs significantly better when the number of labeled data points is small.

  16. Supervised Cross-Modal Factor Analysis for Multiple Modal Data Classification

    KAUST Repository

    Wang, Jingbin

    2015-10-09

    In this paper we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two different modalities of data to a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving one single objective function. With this objective function, we minimize the distance between the projections of the image and text of the same document, and the classification error of the projection measured by the hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.

  17. Supervised Classification: The Naive Bayesian Returns to the Old Bailey

    Directory of Open Access Journals (Sweden)

    Vilja Hulden

    2014-12-01

    Full Text Available A few years back, William Turkel wrote a series of blog posts called A Naive Bayesian in the Old Bailey, which showed how one could use machine learning to extract interesting documents out of a digital archive. This tutorial is a kind of an update on that blog essay, with roughly the same data but a slightly different version of the machine learner. The idea is to show why machine learning methods are of interest to historians, as well as to present a step-by-step implementation of a supervised machine learner. This learner is then applied to the Old Bailey digital archive, which contains several centuries’ worth of transcripts of trials held at the Old Bailey in London. We will be using Python for the implementation.

  18. Collaborative Supervised Learning for Sensor Networks

    Science.gov (United States)

    Wagstaff, Kiri L.; Rebbapragada, Umaa; Lane, Terran

    2011-01-01

    Collaboration methods for distributed machine-learning algorithms involve the specification of communication protocols for the learners, which can query other learners and/or broadcast their findings preemptively. Each learner incorporates information from its neighbors into its own training set, and they are thereby able to bootstrap each other to higher performance. Each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. After being seeded with an initial labeled training set, each learner proceeds to learn in an iterative fashion. New data is collected and classified. The learner can then either broadcast its most confident classifications for use by other learners, or can query neighbors for their classifications of its least confident items. As such, collaborative learning combines elements of both passive (broadcast) and active (query) learning. It also uses ideas from ensemble learning to combine the multiple responses to a given query into a single useful label. This approach has been evaluated against current non-collaborative alternatives, including training a single classifier and deploying it at all nodes with no further learning possible, and permitting learners to learn from their own most confident judgments, absent interaction with their neighbors. On several data sets, it has been consistently found that active collaboration is the best strategy for a distributed learner network. The main advantages include the ability for learning to take place autonomously by collaboration rather than by requiring intervention from an oracle (usually human), and also the ability to learn in a distributed environment, permitting decisions to be made in situ and to yield faster response time.

  19. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    Directory of Open Access Journals (Sweden)

    Xiang Zhang

    Full Text Available Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
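
    The class-assignment rule stated in the abstract above, where each sample's class is indicated by the position of the maximum entry of its coefficients, can be sketched as follows. This covers only that final read-out step: the coefficient values are hypothetical, and learning the non-negative subspace with the multiplicative update rule is not shown.

```python
def assign_classes(coefficients):
    """For each sample (one row of non-negative coefficients), return the
    index of its largest entry, interpreted here as the predicted class."""
    return [max(range(len(row)), key=row.__getitem__) for row in coefficients]

# Hypothetical coefficients for three samples over three classes.
coefficients = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
]
print(assign_classes(coefficients))
```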

  20. Supervised Filter Learning for Representation Based Face Recognition.

    Directory of Open Access Journals (Sweden)

    Chao Bi

    Full Text Available Representation based classification methods, such as Sparse Representation Classification (SRC) and Linear Regression Classification (LRC), have been developed successfully for the face recognition problem. However, most of these methods use the original face images without any preprocessing for recognition. Thus, their performance may be affected by some problematic factors (such as illumination and expression variances) in the face images. In order to overcome this limitation, a novel supervised filter learning algorithm is proposed for representation based face recognition in this paper. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP) features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation based classifiers. Furthermore, we also extend our algorithm to the heterogeneous face recognition problem. Extensive experiments are carried out on five databases and the experimental results verify the efficacy of the proposed algorithm.

  1. Semi-supervised Learning with Deep Generative Models

    NARCIS (Netherlands)

    Kingma, D.P.; Rezende, D.J.; Mohamed, S.; Welling, M.

    2014-01-01

    The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and

  2. Supervised neural networks for the classification of structures.

    Science.gov (United States)

    Sperduti, A; Starita, A

    1997-01-01

    Standard neural networks and statistical methods are usually believed to be inadequate when dealing with complex structures because of their feature-based approach. In fact, feature-based approaches usually fail to give satisfactory solutions because of the sensitivity of the approach to the a priori selection of the features, and the incapacity to represent any specific information on the relationships among the components of the structures. However, we show that neural networks can, in fact, represent and classify structured patterns. The key idea underpinning our approach is the use of the so called "generalized recursive neuron", which is essentially a generalization to structures of a recurrent neuron. By using generalized recursive neurons, all the supervised networks developed for the classification of sequences, such as backpropagation through time networks, real-time recurrent networks, simple recurrent networks, recurrent cascade correlation networks, and neural trees can, on the whole, be generalized to structures. The results obtained by some of the above networks (with generalized recursive neurons) on the classification of logic terms are presented.

  3. A Novel Classification Algorithm Based on Incremental Semi-Supervised Support Vector Machine

    Science.gov (United States)

    Gao, Fei; Mei, Jingyuan; Sun, Jinping; Wang, Jun; Yang, Erfu; Hussain, Amir

    2015-01-01

    For current computational intelligence techniques, a major challenge is how to learn new concepts in changing environment. Traditional learning schemes could not adequately address this problem due to a lack of dynamic data selection mechanism. In this paper, inspired by human learning process, a novel classification algorithm based on incremental semi-supervised support vector machine (SVM) is proposed. Through the analysis of prediction confidence of samples and data distribution in a changing environment, a “soft-start” approach, a data selection mechanism and a data cleaning mechanism are designed, which complete the construction of our incremental semi-supervised learning system. Noticeably, with the ingenious design procedure of our proposed algorithm, the computation complexity is reduced effectively. In addition, for the possible appearance of some new labeled samples in the learning process, a detailed analysis is also carried out. The results show that our algorithm does not rely on the model of sample distribution, has an extremely low rate of introducing wrong semi-labeled samples and can effectively make use of the unlabeled samples to enrich the knowledge system of classifier and improve the accuracy rate. Moreover, our method also has outstanding generalization performance and the ability to overcome the concept drift in a changing environment. PMID:26275294

  4. A Novel Classification Algorithm Based on Incremental Semi-Supervised Support Vector Machine.

    Directory of Open Access Journals (Sweden)

    Fei Gao

    Full Text Available For current computational intelligence techniques, a major challenge is how to learn new concepts in changing environment. Traditional learning schemes could not adequately address this problem due to a lack of dynamic data selection mechanism. In this paper, inspired by human learning process, a novel classification algorithm based on incremental semi-supervised support vector machine (SVM) is proposed. Through the analysis of prediction confidence of samples and data distribution in a changing environment, a "soft-start" approach, a data selection mechanism and a data cleaning mechanism are designed, which complete the construction of our incremental semi-supervised learning system. Noticeably, with the ingenious design procedure of our proposed algorithm, the computation complexity is reduced effectively. In addition, for the possible appearance of some new labeled samples in the learning process, a detailed analysis is also carried out. The results show that our algorithm does not rely on the model of sample distribution, has an extremely low rate of introducing wrong semi-labeled samples and can effectively make use of the unlabeled samples to enrich the knowledge system of classifier and improve the accuracy rate. Moreover, our method also has outstanding generalization performance and the ability to overcome the concept drift in a changing environment.

  5. Semi-Supervised Classification based on Gaussian Mixture Model for remote imagery

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Semi-Supervised Classification (SSC), which makes use of both labeled and unlabeled data to determine classification borders in feature space, has great advantages in extracting classification information from mass data. In this paper, a novel SSC method based on the Gaussian Mixture Model (GMM) is proposed, in which each class's feature space is described by one GMM. Experiments show that the proposed method can achieve high classification accuracy with a small amount of labeled data. For the same accuracy, however, supervised classification methods such as Support Vector Machine, Object Oriented Classification, etc. need to be provided with much more labeled data.

  6. Establishing a Supervised Classification of Global Blue Carbon Mangrove Ecosystems

    Science.gov (United States)

    Baltezar, P.

    2016-12-01

    Understanding change in mangroves over time will aid forest management systems working to protect them from over exploitation. Mangroves are one of the most carbon dense terrestrial ecosystems on the planet and are therefore a high priority for sustainable forest management. Although they represent 1% of terrestrial cover, they could account for about 10% of global carbon emissions. The foundation of this analysis uses remote sensing to establish a supervised classification of mangrove forests for discrete regions in the Zambezi Delta of Mozambique and the Rufiji Delta of Tanzania. Open-source mapping platforms provided a dynamic space for analyzing satellite imagery in the Google Earth Engine (GEE) coding environment. C-Band Synthetic Aperture Radar data from Sentinel 1 was used in the model as a mask by optimizing SAR parameters. Exclusion metrics identified within Global Land Surface Temperature data from MODIS and the Shuttle Radar Topography Mission were used to accentuate mangrove features. Variance was accounted for in exclusion metrics by statistically calculating thresholds for radar, thermal, and elevation data. Optical imagery from the Landsat 8 archive aided a quality mosaic in extracting the highest spectral index values most appropriate for vegetative mapping. The enhanced radar, thermal, and digital elevation imagery were then incorporated into the quality mosaic. Training sites were selected from Google Earth imagery and used in the classification with a resulting output of four mangrove cover map models for each site. The model was assessed for accuracy by observing the differences between the mangrove classification models to the reference maps. Although the model was over predicting mangroves in non-mangrove regions, it was more accurately classifying mangrove regions established by the references. Future refinements will expand the model with an objective degree of accuracy.

  7. The Learning Alliance: Ethics in Doctoral Supervision

    Science.gov (United States)

    Halse, Christine; Bansel, Peter

    2012-01-01

    This paper is concerned with the ethics of relationships in doctoral supervision. We give an overview of four paradigms of doctoral supervision that have endured over the past 25 years and elucidate some of their strengths and limitations, contextualise them historically and consider their implications for doctoral supervision in the contemporary…

  8. Integrating the Supervised Information into Unsupervised Learning

    Directory of Open Access Journals (Sweden)

    Ping Ling

    2013-01-01

    This paper presents an assembling unsupervised learning framework that adopts the information coming from the supervised learning process and gives the corresponding implementation algorithm. The algorithm consists of two phases: first extracting and clustering data representatives (DRs) to obtain labeled training data, and then classifying non-DRs based on the labeled DRs. The implementation algorithm is called SDSN since it employs the tuning-scaled Support vector domain description to collect DRs, uses a spectrum-based method to cluster DRs, and adopts the nearest neighbor classifier to label non-DRs. The validity of the first-phase clustering procedure is analyzed theoretically. A new data-dependent metric is defined in the second phase to allow the nearest neighbor classifier to work with the informed information. A fast training approach for the extraction of DRs is provided to bring more efficiency. Experimental results on synthetic and real datasets verify the correctness and performance of the proposed idea, and SDSN exhibits higher popularity in practice than the traditional pure clustering procedure.
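
    The second phase described in this abstract, labeling non-DRs from already labeled DRs with a nearest neighbor classifier, can be sketched as follows. This is only an illustration: the abstract says a new data-dependent metric is used, which is not specified there, so plain squared Euclidean distance is substituted here, and the function name is hypothetical.

```python
def label_non_representatives(representatives, rep_labels, others):
    """Give each remaining point the label of its nearest representative.

    Assumption: squared Euclidean distance replaces the data-dependent
    metric mentioned in the abstract.
    """
    labels = []
    for p in others:
        nearest = min(
            range(len(representatives)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, representatives[i])),
        )
        labels.append(rep_labels[nearest])
    return labels
```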

  9. Supervised Speech Separation Based on Deep Learning: An Overview

    OpenAIRE

    Wang, DeLiang; Chen, Jitong

    2017-01-01

    Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning ...

  10. Biomedical data analysis by supervised manifold learning.

    Science.gov (United States)

    Alvarez-Meza, A M; Daza-Santacoloma, G; Castellanos-Dominguez, G

    2012-01-01

    Biomedical data analysis is usually carried out by assuming that the information structure embedded in the biomedical recordings is linear, but that assumption does not actually correspond to the real behavior of the extracted features. In order to improve the accuracy of an automatic diagnostic support system, and to reduce the computational complexity of the employed classifiers, we propose a nonlinear dimensionality reduction methodology based on manifold learning with multiple kernel representations, which learns the underlying data structure of biomedical information. Moreover, our approach can be used as a tool that allows the specialist to carry out a visual analysis and interpretation of the studied variables describing the health condition. The results obtained show how our approach maps the original high-dimensional features into an embedding space where simple and straightforward classification strategies achieve suitable system performance.

  11. Supervised learning of semantic classes for image annotation and retrieval.

    Science.gov (United States)

    Carneiro, Gustavo; Chan, Antoni B; Moreno, Pedro J; Vasconcelos, Nuno

    2007-03-01

    A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.

  12. Opportunities to learn scientific thinking in joint doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Grout, Brian William Wilson; Rump, Camilla Østerberg

    2015-01-01

    Research into doctoral supervision has increased rapidly over the last decades, yet our understanding of how doctoral students learn scientific thinking from supervision is limited. Most studies are based on interviews with little work being reported that is based on observation of actual supervi...

  13. Deep learning classification in asteroseismology

    DEFF Research Database (Denmark)

    Hon, Marc; Stello, Dennis; Yu, Jie

    2017-01-01

    In the power spectra of oscillating red giants, there are visually distinct features defining stars ascending the red giant branch from those that have commenced helium core burning. We train a 1D convolutional neural network by supervised learning to automatically learn these visual features from...

  14. MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING

    Data.gov (United States)

    National Aeronautics and Space Administration — MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI Abstract....

  15. Pulsar Search Using Supervised Machine Learning

    Science.gov (United States)

    Ford, John M.

    2017-05-01

    Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require a large ensemble of pulsars to provide enough statistical information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, there are several large-scale pulsar surveys underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very labor-intensive process, currently requiring skilled people to examine and interpret plots of data output by analysis programs. An automated system for screening the plots will speed up the search for pulsars by a very large factor. Research to date on using machine learning and pattern recognition has not yielded a completely satisfactory system, as systems with the desired near 100% recall have false positive rates that are higher than desired, causing more manual labor in the classification of pulsars. This work proposed to research, identify, propose and develop methods to overcome the barriers to building an improved classification system with a false positive rate of less than 1% and a recall of near 100% that will be useful for the current and next generation of large pulsar surveys. The results show that it is possible to generate classifiers that perform as needed from the available training data. While a false positive rate of 1% was not reached, recall of over 99% was achieved with a false positive rate of less than 2%. Methods of mitigating the imbalanced training and test data were explored and found to be highly effective in enhancing classification accuracy.
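
    The two figures of merit quoted in this abstract, recall and false positive rate, are computed from the confusion counts of a binary classifier. A minimal sketch, assuming labels are encoded as 1 (pulsar) and 0 (non-pulsar); the function name and the encoding are illustrative assumptions, not part of the cited work.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    # Recall: share of true pulsars that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of non-pulsars wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr
```

    With heavily imbalanced data, where non-pulsar candidates vastly outnumber pulsars, even a small false positive rate can translate into many candidates that still need manual inspection, which is why both numbers are reported together.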

  16. Semi-supervised analysis of human brain tumours from partially labeled MRS information, using manifold learning models.

    Science.gov (United States)

    Cruz-Barbosa, Raúl; Vellido, Alfredo

    2011-02-01

    Medical diagnosis can often be understood as a classification problem. In oncology, this typically involves differentiating between tumour types and grades, or some type of discrete outcome prediction. From the viewpoint of computer-based medical decision support, this classification requires the availability of accurate diagnoses of past cases as training target examples. The availability of such labeled databases is scarce in most areas of oncology, and especially so in neuro-oncology. In such context, semi-supervised learning oriented towards classification can be a sensible data modeling choice. In this study, semi-supervised variants of Generative Topographic Mapping, a model of the manifold learning family, are applied to two neuro-oncology problems: the diagnostic discrimination between different brain tumour pathologies, and the prediction of outcomes for a specific type of aggressive brain tumours. Their performance compared favorably with those of the alternative Laplacian Eigenmaps and Semi-Supervised SVM for Manifold Learning models in most of the experiments.

  17. Multiclass Semi-Supervised Boosting and Similarity Learning

    NARCIS (Netherlands)

    Tanha, J.; Saberian, M.J.; van Someren, M.; Xiong, H.; Karypis, G.; Thuraisingham, B.; Cook, D.; Wu, X.

    2013-01-01

    In this paper, we consider the multiclass semi-supervised classification problem. A boosting algorithm is proposed to solve the multiclass problem directly. The proposed multiclass approach uses a new multiclass loss function, which includes two terms. The first term is the cost of the multiclass ma

  18. Semi-supervised vibration-based classification and condition monitoring of compressors

    Science.gov (United States)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  19. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement

    Science.gov (United States)

    Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.

    2011-06-01

    Clinical electroencephalography (EEG) records vast amounts of human complex data yet is still reviewed primarily by human readers. Deep belief nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data but rarely applied to time-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7-103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data (a rarity in automated physiological waveform analysis) with hand-chosen features and find that raw data produce comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques.

  20. Sentiment Classification of Hotel Reviews in Social Media with Decision Tree Learning

    National Research Council Canada - National Science Library

    Stanimira Yordanova; Dorina Kabakchieva

    2017-01-01

    The aim of this paper is to present an approach for prediction of customer opinion, using supervised machine learning approach and Decision tree method for classification of online hotel reviews as positive or negative...

  1. Deep learning for brain tumor classification

    Science.gov (United States)

    Paul, Justin S.; Plassard, Andrew J.; Landman, Bennett A.; Fabbri, Daniel

    2017-03-01

    Recent research has shown that deep learning methods have performed well on supervised machine learning, image classification tasks. The purpose of this study is to apply deep learning methods to classify brain images with different tumor types: meningioma, glioma, and pituitary. A dataset was publicly released containing 3,064 T1-weighted contrast enhanced MRI (CE-MRI) brain images from 233 patients with either meningioma, glioma, or pituitary tumors split across axial, coronal, or sagittal planes. This research focuses on the 989 axial images from 191 patients in order to avoid confusing the neural networks with three different planes containing the same diagnosis. Two types of neural networks were used in classification: fully connected and convolutional neural networks. Within these two categories, further tests were computed via the augmentation of the original 512×512 axial images. Training neural networks over the axial data has proven to be accurate in its classifications with an average five-fold cross validation of 91.43% on the best trained neural network. This result demonstrates that a more general method (i.e. deep learning) can outperform specialized methods that require image dilation and ring-forming subregions on tumors.
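
    The five-fold cross validation figure quoted in this abstract is an average over five train/test splits. A minimal sketch of how such splits and the averaged score could be produced; the `evaluate` callback is a hypothetical stand-in for training and scoring a network, and the sketch splits by sample index because the abstract does not state whether the study split by image or by patient.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_accuracy(n_samples, evaluate, k=5):
    """Average the accuracy returned by `evaluate(train_idx, test_idx)`
    (a hypothetical callback) over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

    In practice the sample order is usually shuffled before splitting; that step is omitted here to keep the sketch deterministic.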

  2. Enhancing Adult Learning in Clinical Supervision

    Science.gov (United States)

    Goldman, Stuart

    2011-01-01

    Objective/Background: For decades, across almost every training site, clinical supervision has been considered "central to the development of skills" in psychiatry. The crucial supervisor/supervisee relationship has been described extensively in the literature, most often framed as a clinical apprenticeship of the novice to the master craftsman.…

  3. Hierarchical Wireless Multimedia Sensor Networks for Collaborative Hybrid Semi-Supervised Classifier Learning

    Directory of Open Access Journals (Sweden)

    Liang Ding

    2007-11-01

    Wireless multimedia sensor networks (WMSN) have recently emerged as one of the most important technologies, driven by the powerful multimedia signal acquisition and processing abilities. Target classification is an important research issue addressed in WMSN, which has strict requirement in robustness, quickness and accuracy. This paper proposes a collaborative semi-supervised classifier learning algorithm to achieve durative online learning for support vector machine (SVM) based robust target classification. The proposed algorithm incrementally carries out the semi-supervised classifier learning process in hierarchical WMSN, with the collaboration of multiple sensor nodes in a hybrid computing paradigm. For decreasing the energy consumption and improving the performance, some metrics are introduced to evaluate the effectiveness of the samples in specific sensor nodes, and a sensor node selection strategy is also proposed to reduce the impact of inevitable missing detection and false detection. With the ant optimization routing, the learning process is implemented with the selected sensor nodes, which can decrease the energy consumption. Experimental results demonstrate that the collaborative hybrid semi-supervised classifier learning algorithm can effectively implement target classification in hierarchical WMSN. It has outstanding performance in terms of energy efficiency and time cost, which verifies the effectiveness of the sensor nodes selection and ant optimization routing.

  4. Semi-supervised hyperspectral classification from a small number of training samples using a co-training approach

    Science.gov (United States)

    Romaszewski, Michał; Głomb, Przemysław; Cholewa, Michał

    2016-11-01

    We present a novel semi-supervised algorithm for classification of hyperspectral data from remote sensors. Our method is inspired by the Tracking-Learning-Detection (TLD) framework, originally applied for tracking objects in a video stream. TLD introduced the co-training approach called P-N learning, making use of two independent 'experts' (or learners) that scored samples in different feature spaces. In a similar fashion, we formulated the hyperspectral classification task as a co-training problem, that can be solved with the P-N learning scheme. Our method uses both spatial and spectral features of data, extending a small set of initial labelled samples during the process of region growing. We show that this approach is stable and achieves very good accuracy even for small training sets. We analyse the algorithm's performance on several publicly available hyperspectral data sets.

  5. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

    Science.gov (United States)

    Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

    2017-06-01

    Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.

  6. Semi-supervised Eigenvectors for Locally-biased Learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2012-01-01

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks "nearby" that pre-specified target region. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes...

  7. Applying Machine Learning to Star Cluster Classification

    Science.gov (United States)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time consuming, thus it creates a bottleneck, and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) through applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data is HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  8. Action Learning in Undergraduate Engineering Thesis Supervision

    Science.gov (United States)

    Stappenbelt, Brad

    2017-01-01

    In the present action learning implementation, twelve action learning sets were conducted over eight years. The action learning sets consisted of students involved in undergraduate engineering research thesis work. The concurrent study accompanying this initiative investigated the influence of the action learning environment on student approaches…

  9. A multi-label, semi-supervised classification approach applied to personality prediction in social media.

    Science.gov (United States)

    Lima, Ana Carolina E S; de Castro, Leandro Nunes

    2014-10-01

    Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others.
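
    The transformation mentioned in this abstract, turning one multi-label task over the Big Five traits into five binary classification problems, is commonly known as binary relevance. A minimal sketch of the data transformation only (no classifier is trained); the function names and the representation of labels as Python sets are assumptions made for illustration.

```python
BIG_FIVE = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
)


def to_binary_problems(samples, label_sets, traits=BIG_FIVE):
    """Return {trait: (samples, 0/1 targets)}, one binary problem per trait."""
    return {
        t: (list(samples), [1 if t in ls else 0 for ls in label_sets])
        for t in traits
    }


def combine_predictions(per_trait_predictions):
    """Merge per-trait 0/1 predictions back into one label set per sample."""
    traits = list(per_trait_predictions)
    n = len(next(iter(per_trait_predictions.values())))
    return [
        {t for t in traits if per_trait_predictions[t][i] == 1}
        for i in range(n)
    ]
```

    Each of the five binary problems can then be handed to any binary learner, and the per-trait outputs are merged back into a set of traits per sample.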

  10. Improving Semi-Supervised Learning with Auxiliary Deep Generative Models

    DEFF Research Database (Denmark)

    Maaløe, Lars; Sønderby, Casper Kaae; Sønderby, Søren Kaae

    Deep generative models based upon continuous variational distributions parameterized by deep networks give state-of-the-art performance. In this paper we propose a framework for extending the latent representation with extra auxiliary variables in order to make the variational distribution more expressive for semi-supervised learning. By utilizing the stochasticity of the auxiliary variable we demonstrate how to train discriminative classifiers resulting in state-of-the-art performance within semi-supervised learning exemplified by an 0.96% error on MNIST using 100 labeled data points. Furthermore...

  11. Polarimetric SAR Image Supervised Classification Method Integrating Eigenvalues

    Directory of Open Access Journals (Sweden)

    Xing Yanxiao

    2016-04-01

    Since classification methods based on H/α space have the drawback of yielding poor classification results for terrains with similar scattering features, in this study, we propose a polarimetric Synthetic Aperture Radar (SAR) image classification method based on eigenvalues. First, we extract eigenvalues and fit their distribution with an adaptive Gaussian mixture model. Then, using the naive Bayesian classifier, we obtain preliminary classification results. The distribution of eigenvalues in two kinds of terrains may be similar, leading to incorrect classification in the preliminary step. So, we calculate the similarity of every terrain pair, and add them to the similarity table if their similarity is greater than a given threshold. We then apply the Wishart distance-based KNN classifier to these similar pairs to obtain further classification results. We used the proposed method on both airborne and spaceborne SAR datasets, and the results show that our method can overcome the shortcoming of the H/α-based unsupervised classification method for eigenvalues usage, and produces comparable results with the Support Vector Machine (SVM)-based classification method.

  12. Use of Sub-Aperture Decomposition for Supervised PolSAR Classification in Urban Area

    Directory of Open Access Journals (Sweden)

    Lei Deng

    2015-01-01

    A novel approach is proposed for classifying polarimetric SAR (PolSAR) data by integrating polarimetric decomposition, sub-aperture decomposition and a decision tree algorithm. It is composed of three key steps: sub-aperture decomposition, feature extraction and combination, and decision tree classification. Feature extraction and combination is the main innovation of the proposed method. Firstly, the full-resolution PolSAR image and its two sub-aperture images are each decomposed to obtain the scattering entropy, average scattering angle and anisotropy. Then, the difference information between the two sub-aperture images is extracted and combined with the target decomposition features from the full-resolution image to form the classification feature set. Finally, the C5.0 decision tree algorithm is used to classify the PolSAR image. The proposed method was compared with the commonly used Wishart supervised classification to verify the improvement in classification. The overall accuracy using the proposed method was 88.39%, much higher than that using the Wishart supervised classification, which exhibited an overall accuracy of 69.82%. The Kappa coefficient was 0.83, whereas that using the Wishart supervised classification was 0.56. The results indicate that the proposed method performed better than Wishart supervised classification for landscape classification in urban areas using PolSAR data. Further investigation was carried out on the contribution of difference information to PolSAR classification. It was found that the sub-aperture decomposition effectively improved the classification accuracy of forest, buildings and grassland in high-density urban areas. Compared with the support vector machine (SVM) and QUEST classifiers, the C5.0 decision tree classifier is more efficient in time consumption, feature selection and construction of decision rules.
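
    The overall accuracy and Kappa coefficient reported in this abstract are both derived from a confusion matrix. A minimal sketch of the standard computation of Cohen's kappa, assuming rows are reference classes and columns are predicted classes; the function name is illustrative, and any matrix passed to it here is hypothetical rather than taken from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    Assumes the chance agreement is below 1, i.e. the matrix is not
    concentrated in a single class.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

    Kappa discounts the agreement that would be expected by chance, which is why it is lower than the overall accuracy for the same classification result.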

  13. Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Nan Zhao

    2014-05-01

    Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such an nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed, resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of the SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the
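
The abstract above reports a weighted f-measure but does not define it. The sketch below assumes the common definition, the per-class F1 score averaged with weights proportional to each class's share of the true labels; this may differ from what the authors computed. The function name and the toy labels are hypothetical.

```python
def weighted_f_measure(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)

        # Weight each class by its share of the true labels (its support).
        score += f1 * (tp + fn) / total
    return score


# Toy 3-class example: detrimental (d), neutral (n), beneficial (b).
truth = ["d", "d", "n", "n", "b", "b"]
guess = ["d", "n", "n", "n", "b", "d"]
print(weighted_f_measure(truth, guess))
```

Weighting by support keeps a large class from being outvoted by a rare one, which matters for a 3-class problem in which the classes are unlikely to be balanced.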

  14. Supervised Classification Processes for the Characterization of Heritage Elements, Case Study: Cuenca-Ecuador

    Science.gov (United States)

    Briones, J. C.; Heras, V.; Abril, C.; Sinchi, E.

    2017-08-01

    The proper control of built heritage entails many challenges related to the complexity of heritage elements and the extent of the area to be managed, for which the available resources must be used efficiently. In this scenario, the preventive conservation approach, based on the concept that prevention is better than cure, emerges as a strategy to avoid the progressive and imminent loss of monuments and heritage sites. Regular monitoring appears as a key tool for the timely identification of changes in heritage assets. This research demonstrates that the supervised learning model (Support Vector Machines, SVM) is an ideal tool that supports the monitoring process by detecting visible elements in aerial images such as roof structures, vegetation and pavements. The linear, Gaussian and polynomial kernel functions were tested; the linear function provided better results than the other functions. It is important to mention that, due to the high level of segmentation generated by the classification procedure, it was necessary to apply a generalization process through opening, a mathematical morphological operation, which simplified the over-classification of the monitored elements.
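
The generalization step mentioned above, morphological opening, is an erosion followed by a dilation. The sketch below applies it to a small binary mask with a square structuring element. The element size, the treatment of pixels outside the image as background, and all function names are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
def _neighbourhood(image, r, c, radius):
    """Yield the values in the square window around (r, c); pixels outside
    the image count as background (0)."""
    rows, cols = len(image), len(image[0])
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr, cc = r + dr, c + dc
            yield image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(image, radius=1):
    """Keep a pixel only if its whole window is foreground."""
    return [[1 if all(_neighbourhood(image, r, c, radius)) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


def dilate(image, radius=1):
    """Set a pixel if any pixel in its window is foreground."""
    return [[1 if any(_neighbourhood(image, r, c, radius)) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]


def opening(image, radius=1):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(image, radius), radius)


# A 3x3 "roof" plus one isolated, over-segmented pixel in the corner.
mask = [
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
for row in opening(mask):
    print(row)
```

Opening removes regions smaller than the structuring element and leaves larger ones roughly unchanged, so it can suppress speckle in a per-pixel classification map.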

  15. Robust Transfer Metric Learning for Image Classification.

    Science.gov (United States)

    Ding, Zhengming; Fu, Yun

    2017-02-01

    Metric learning has attracted increasing attention due to its critical role in image analysis and classification. Conventional metric learning always assumes that the training and test data are sampled from the same or similar distribution. However, to build an effective distance metric, we need abundant supervised knowledge (i.e., side/label information), which is generally inaccessible in practice, because of the expensive labeling cost. In this paper, we develop a robust transfer metric learning (RTML) framework to effectively assist the unlabeled target learning by transferring the knowledge from the well-labeled source domain. Specifically, RTML exploits knowledge transfer to mitigate the domain shift in two directions, i.e., sample space and feature space. In the sample space, domain-wise and class-wise adaption schemes are adopted to bridge the gap of marginal and conditional distribution disparities across two domains. In the feature space, our metric is built in a marginalized denoising fashion and low-rank constraint, which make it more robust to tackle noisy data in reality. Furthermore, we design an explicit rank constraint regularizer to replace the rank minimization NP-hard problem to guide the low-rank metric learning. Experimental results on several standard benchmarks demonstrate the effectiveness of our proposed RTML by comparing it with the state-of-the-art transfer learning and metric learning algorithms.

  16. Classification and Weakly Supervised Pain Localization using Multiple Segment Representation

    Science.gov (United States)

    Sikka, Karan; Dhall, Abhinav; Bartlett, Marian Stewart

    2014-01-01

    Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through ‘concept frames’ to ‘concept segments’ and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Extensive experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of the approach by achieving competitive results on both tasks of pain classification and localization in videos. We also empirically evaluate the contributions of different components of MS-MIL. The paper also includes the visualization of discriminative facial patches, important for pain detection, as discovered by

  17. Classification and Weakly Supervised Pain Localization using Multiple Segment Representation.

    Science.gov (United States)

    Sikka, Karan; Dhall, Abhinav; Bartlett, Marian Stewart

    2014-10-01

    Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Extensive experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of the approach by achieving competitive results on both tasks of pain classification and localization in videos. We also empirically evaluate the contributions of different components of MS-MIL. The paper also includes the visualization of discriminative facial patches, important for pain detection, as discovered by our
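
The two ideas in this abstract, multi-scale temporal segments and a sequence-level decision derived from segment-level evidence, can be sketched as follows. The sketch assumes the standard multiple-instance convention that a bag is scored by its highest-scoring instance, and it uses mean per-frame scores as a stand-in for a trained segment classifier. Neither detail is confirmed by the abstract, and the function names are hypothetical.

```python
def temporal_segments(n_frames, window_sizes, stride=1):
    """Enumerate (start, end) frame ranges from a multi-scale scanning window.

    Assumes n_frames >= 1; windows longer than the sequence are clipped.
    """
    segments = []
    for size in window_sizes:
        last_start = max(n_frames - size, 0)
        for start in range(0, last_start + 1, stride):
            segments.append((start, min(start + size, n_frames)))
    return segments


def score_bag(frame_scores, segments):
    """Score a sequence (bag) by its best segment (instance).

    Returns (bag_score, best_segment). The mean of per-frame scores is a
    placeholder for a learned segment-level classifier.
    """
    best_score, best_segment = None, None
    for start, end in segments:
        segment_score = sum(frame_scores[start:end]) / (end - start)
        if best_score is None or segment_score > best_score:
            best_score, best_segment = segment_score, (start, end)
    return best_score, best_segment


# Toy sequence: the "pain" event spans frames 3 to 5.
scores = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
segs = temporal_segments(len(scores), window_sizes=[2, 4], stride=1)
print(score_bag(scores, segs))
```

Because the bag score comes from one specific segment, the same pass that classifies the sequence also indicates where the event occurs, which is the joint detection and localization property the abstract describes.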

  18. Semi-supervised Eigenvectors for Locally-biased Learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2012-01-01

    of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph...

  19. SPATIALLY ADAPTIVE SEMI-SUPERVISED LEARNING WITH GAUSSIAN PROCESSES FOR HYPERSPECTRAL DATA ANALYSIS

    Data.gov (United States)

    National Aeronautics and Space Administration — SPATIALLY ADAPTIVE SEMI-SUPERVISED LEARNING WITH GAUSSIAN PROCESSES FOR HYPERSPECTRAL DATA ANALYSIS GOO JUN * AND JOYDEEP GHOSH* Abstract. A semi-supervised learning...

  20. Combining Unsupervised and Supervised Learning for Discovering Disease Subclasses

    OpenAIRE

    Tucker, A; Bosoni, P; Bellazzi, R.; Nihtyanova, S; Denton, C.

    2016-01-01

    Diseases are often umbrella terms for many subcategories of disease. The identification of these subcategories is vital if we are to develop personalised treatments that are better focussed on individual patients. In this short paper, we explore the use of a combination of unsupervised learning to identify potential subclasses, and supervised learning to build models for better predicting a number of different health outcomes for patients that suffer from systemic sclerosis, a rare chronic co...

  1. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  2. Effects of coaching supervision, mentoring supervision and abusive supervision on talent development among trainee doctors in public hospitals: moderating role of clinical learning environment.

    Science.gov (United States)

    Subramaniam, Anusuiya; Silong, Abu Daud; Uli, Jegak; Ismail, Ismi Arif

    2015-08-13

    Effective talent development requires robust supervision. However, the effects of supervisory styles (coaching, mentoring and abusive supervision) on talent development and the moderating effects of clinical learning environment in the relationship between supervisory styles and talent development among public hospital trainee doctors have not been thoroughly researched. In this study, we aim to achieve the following, (1) identify the extent to which supervisory styles (coaching, mentoring and abusive supervision) can facilitate talent development among trainee doctors in public hospital and (2) examine whether coaching, mentoring and abusive supervision are moderated by clinical learning environment in predicting talent development among trainee doctors in public hospital. A questionnaire-based critical survey was conducted among trainee doctors undergoing housemanship at six public hospitals in the Klang Valley, Malaysia. Prior permission was obtained from the Ministry of Health Malaysia to conduct the research in the identified public hospitals. The survey yielded 355 responses. The results were analysed using SPSS 20.0 and SEM with AMOS 20.0. The findings of this research indicate that coaching and mentoring supervision are positively associated with talent development, and that there is no significant relationship between abusive supervision and talent development. The findings also support the moderating role of clinical learning environment on the relationships between coaching supervision-talent development, mentoring supervision-talent development and abusive supervision-talent development among public hospital trainee doctors. Overall, the proposed model indicates a 26 % variance in talent development. This study provides an improved understanding on the role of the supervisory styles (coaching and mentoring supervision) on facilitating talent development among public hospital trainee doctors. Furthermore, this study extends the literature to better

  3. Enhancing fieldwork learning using blended learning, GIS and remote supervision

    Science.gov (United States)

    Marra, Wouter A.; Alberti, Koko; Karssenberg, Derek

    2015-04-01

    Fieldwork is an important part of education in geosciences and essential to put theoretical knowledge into an authentic context. Fieldwork as teaching tool can take place in various forms, such as field-tutorial, excursion, or supervised research. Current challenges with fieldwork in education are to incorporate state-of-the art methods for digital data collection, on-site GIS-analysis and providing high-quality feedback to large groups of students in the field. We present a case on first-year earth-sciences fieldwork with approximately 80 students in the French Alps focused on geological and geomorphological mapping. Here, students work in couples and each couple maps their own fieldwork area to reconstruct the formative history. We present several major improvements for this fieldwork using a blended-learning approach, relying on open source software only. An important enhancement to the French Alps fieldwork is improving students' preparation. In a GIS environment, students explore their fieldwork areas using existing remote sensing data, a digital elevation model and derivatives to formulate testable hypotheses before the actual fieldwork. The advantage of this is that the students already know their area when arriving in the field, have started to apply the empirical cycle prior to their field visit, and are therefore eager to investigate their own research questions. During the fieldwork, students store and analyze their field observations in the same GIS environment. This enables them to get a better overview of their own collected data, and to integrate existing data sources also used in the preparation phase. This results in a quicker and enhanced understanding by the students. To enable remote access to observational data collected by students, the students synchronize their data daily with a webserver running a web map application. Supervisors can review students' progress remotely, examine and evaluate their observations in a GIS, and provide

  4. Benchmarking protein classification algorithms via supervised cross-validation

    NARCIS (Netherlands)

    Kertész-Farkas, A.; Dhir, S.; Sonego, P.; Pacurar, M.; Netoteia, S.; Nijveen, H.; Kuzniar, A.; Leunissen, J.A.M.; Kocsor, A.; Pongor, S.

    2008-01-01

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-o

  5. Supervised classification of solar features using prior information

    Directory of Open Access Journals (Sweden)

    De Visscher Ruben

    2015-01-01

    Context: The Sun as seen by Extreme Ultraviolet (EUV) telescopes exhibits a variety of large-scale structures. Of particular interest for space-weather applications is the extraction of active regions (AR) and coronal holes (CH). The next generation of GOES-R satellites will provide continuous monitoring of the solar corona in six EUV bandpasses that are similar to the ones provided by the SDO-AIA EUV telescope since May 2010. Supervised segmentations of EUV images that are consistent with manual segmentations by, for example, space-weather forecasters help in extracting useful information from the raw data. Aims: We present a supervised segmentation method that is based on the Maximum A Posteriori rule. Our method allows integrating both manually segmented images and other types of information. It is applied to SDO-AIA images to segment them into AR, CH, and the remaining Quiet Sun (QS) part. Methods: A Bayesian classifier is applied on training masks provided by the user. The noise structure in EUV images is non-trivial, and this suggests the use of a non-parametric kernel density estimator to fit the intensity distribution within each class. Under the Naive Bayes assumption we can add information such as latitude distribution and total coverage of each class in a consistent manner. This information can be prescribed by an expert or estimated with an Expectation-Maximization algorithm. Results: The segmentation masks are in line with the training masks given as input and show consistency over time. Introduction of additional information besides pixel intensity improves the quality of the final segmentation. Conclusions: Such a tool can aid in building automated segmentations that are consistent with some ‘ground truth’ defined by the users.
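
The Methods part of this abstract combines a kernel density estimate of the intensity distribution in each class with prior information, using the Maximum A Posteriori rule. The sketch below shows that combination for a single intensity feature. The Gaussian kernel, the fixed bandwidth, the toy intensities and the prior values are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x,
    using a Gaussian kernel with a fixed bandwidth (assumed choice)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


def map_classify(x, densities, priors):
    """Maximum A Posteriori rule: pick the class maximising prior * likelihood."""
    return max(densities, key=lambda label: priors[label] * densities[label](x))


# Made-up training intensities per class, as if read from training masks.
training = {
    "CH": [9.0, 10.0, 11.0, 12.0],        # coronal holes: dark
    "QS": [48.0, 50.0, 52.0, 55.0],       # quiet Sun: intermediate
    "AR": [190.0, 200.0, 210.0, 220.0],   # active regions: bright
}
densities = {label: gaussian_kde(values, bandwidth=5.0)
             for label, values in training.items()}

# Hypothetical priors standing in for the total coverage of each class.
coverage_prior = {"CH": 0.1, "QS": 0.8, "AR": 0.1}

for intensity in (11.0, 30.0, 205.0):
    print(intensity, map_classify(intensity, densities, coverage_prior))
```

Under the Naive Bayes assumption named in the abstract, a further feature such as latitude would contribute one more per-class density factor to the product before the maximum is taken.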

  6. Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps

    Science.gov (United States)

    Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.

    2017-07-01

    Context: Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed

  7. Action information from classification learning.

    Science.gov (United States)

    Ross, Brian H; Wang, Ranxiao Frances; Kramer, Arthur F; Simons, Daniel J; Crowell, James A

    2007-06-01

    Much of our learning comes from interacting with objects. Two experiments investigated whether or not arbitrary actions used during category learning with objects might be incorporated into object representations and influence later recognition judgments. In a virtual-reality chamber, participants used distinct arm movements to make different classification responses. During a recognition test phase, these same objects required arm movements that were consistent or inconsistent with the classification movement. In both experiments, consistent movements were facilitated relative to inconsistent movements, suggesting that arbitrary action information is incorporated into the representations.

  8. Facial nerve image enhancement from CBCT using supervised learning technique.

    Science.gov (United States)

    Ping Lu; Barazzetti, Livia; Chandran, Vimal; Gavaghan, Kate; Weber, Stefan; Gerber, Nicolas; Reyes, Mauricio

    2015-08-01

    Facial nerve segmentation plays an important role in surgical planning of cochlear implantation. Clinically available CBCT images are used for surgical planning. However, its relatively low resolution renders the identification of the facial nerve difficult. In this work, we present a supervised learning approach to enhance facial nerve image information from CBCT. A supervised learning approach based on multi-output random forest was employed to learn the mapping between CBCT and micro-CT images. Evaluation was performed qualitatively and quantitatively by using the predicted image as input for a previously published dedicated facial nerve segmentation, and cochlear implantation surgical planning software, OtoPlan. Results show the potential of the proposed approach to improve facial nerve image quality as imaged by CBCT and to leverage its segmentation using OtoPlan.

  9. Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Zhiling Guo

    2016-03-01

    In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE) RGB images to perform the classification in order to examine their suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost) and convolutional neural networks (CNN). To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer networks trained by gradient descent and back propagation, the CNN performs the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area in Savannakhet province, Laos, show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.
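
The abstract mentions using a large number of Haar-like features in a local window. One common way to evaluate such features cheaply is an integral image (summed-area table); the abstract does not say whether the authors used one, so the sketch below is an illustrative assumption. It computes a single two-rectangle feature on one grey-level channel, and the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros, so that
    ii[r][c] is the sum of img[0:r][0:c]."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_left_minus_right(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half.

    `width` is expected to be even so that the halves have equal area.
    """
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


# Toy window: dark on the left, bright on the right (a vertical edge).
window = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
table = integral_image(window)
print(haar_left_minus_right(table, top=0, left=0, height=2, width=4))
```

Each feature costs a fixed number of lookups regardless of rectangle size, which makes it practical to generate the many candidate features that a boosting method then selects from.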

  10. CLASSIFICATION OF LEARNING MANAGEMENT SYSTEMS

    Directory of Open Access Journals (Sweden)

    Yu. B. Popova

    2016-01-01

    The use of information technologies and, in particular, learning management systems increases the opportunities for teachers and students to reach their educational goals. Such systems provide learning content, help organize and monitor training, collect progress statistics and take into account the individual characteristics of each user. Currently, there is a huge range of both paid and free systems, physically located either on college servers or in the cloud, offering different feature sets, licensing schemes and costs. This creates the problem of choosing the best system. This problem is partly due to the lack of a comprehensive classification of such systems. Analysis of more than 30 of the most common automated learning management systems has shown that such systems should be classified according to certain criteria, under which systems of the same type can be compared. The classification criteria proposed by the author are: cost, functionality, modularity, compliance with the customer’s requirements, content integration, physical location of the system, and adaptability of training. Considering learning management systems within these classifications and taking into account current trends in their development, it is possible to identify the main requirements for them: functionality, reliability, ease of use, low cost, support for the SCORM standard or Tin Can API, modularity and adaptability. In accordance with these requirements, the Software Department of FITR BNTU has been developing, using and continuously improving its own learning management system since 2009 under the guidance of the author.

  11. Automated labeling of cancer textures in larynx histopathology slides using quasi-supervised learning.

    Science.gov (United States)

    Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge

    2014-12-01

    To evaluate the performance of a quasi-supervised statistical learning algorithm, operating on datasets having normal and neoplastic tissues, to identify larynx squamous cell carcinomas. Furthermore, cancer texture separability measures against normal tissues are to be developed and compared either for colorectal or larynx tissues. Light microscopic digital images from histopathological sections were obtained from laryngectomy materials including squamous cell carcinoma and nonneoplastic regions. The texture features were calculated by using co-occurrence matrices and local histograms. The texture features were input to the quasi-supervised learning algorithm. Larynx regions containing squamous cell carcinomas were accurately identified, having false and true positive rates up to 21% and 87%, respectively. Larynx squamous cell carcinoma versus normal tissue texture separability measures were higher than colorectal adenocarcinoma versus normal textures for the colorectal database. Furthermore, the resultant labeling performances for all larynx datasets are higher than or equal to that of colorectal datasets. The results in larynx datasets, in comparison with the former colorectal study, suggested that quasi-supervised texture classification is to be a helpful method in histopathological image classification and analysis.

  12. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    Science.gov (United States)

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, and Alzheimer's disease (AD) in particular, is a global problem and a major threat to the aging population. An image-based computer-aided dementia diagnosis method is needed to help doctors during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed, and most of them achieve accurate results. However, most of these methods rely on supervised learning, which requires a fully labeled image dataset that is usually not practical to obtain in a real clinical environment. Using a large amount of unlabeled images can improve dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest, which is then used to regularize the manifold learning objective function. The proposed method, the state-of-the-art Laplacian support vector machine (LapSVM) and a supervised SVM are applied to classify AD and normal controls (NC). The experimental results show that learning with unlabeled images indeed improves the classification performance, and that our method outperforms LapSVM on the same dataset.

  13. Modeling Time Series Data for Supervised Learning

    Science.gov (United States)

    Baydogan, Mustafa Gokce

    2012-01-01

    Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning…

  14. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    Directory of Open Access Journals (Sweden)

    Deborah Galpert

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) is combined in a supervised pairwise ortholog detection approach to improve effectiveness, considering the low ortholog ratios in relation to the possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with the RBH, RSD, and OMA algorithms by using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
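
Random oversampling, named in the abstract as the imbalance-handling step, duplicates minority-class examples until the classes are the same size. The sketch below shows the idea on a single machine with the standard library. It is not the MapReduce implementation the authors used, and the function name, seed handling and toy gene pairs are assumptions made for illustration.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Returns new (samples, labels) lists in which every class has as many
    items as the largest class. All original items are kept.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing number of items at random from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy gene pairs: 8 non-ortholog pairs and only 2 ortholog pairs.
pairs = [("geneA%d" % i, "geneB%d" % i) for i in range(10)]
classes = ["nonortholog"] * 8 + ["ortholog"] * 2
balanced_pairs, balanced_classes = random_oversample(pairs, classes)
print(balanced_classes.count("ortholog"), balanced_classes.count("nonortholog"))
```

Oversampling should be applied to the training split only; duplicating examples before splitting would let copies of the same pair appear in both the training and the test data.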

  15. Semi-supervised classification of emotional pictures based on feature combination

    Science.gov (United States)

    Li, Shuo; Zhang, Yu-Jin

    2011-02-01

    Can the abundant emotions reflected in pictures be classified automatically by computer? Previous research considers only the visual features extracted from images, which have a limited capability to reveal various emotions. In addition, the training database utilized by previous methods is a subset of the International Affective Picture System (IAPS) that has a relatively small scale, which exerts negative effects on the discrimination of emotion classifiers. To solve the above problems, this paper proposes a novel and practical emotional picture classification approach, using a semi-supervised learning scheme with both visual feature and keyword tag information. Besides the IAPS with both emotion labels and keyword tags as part of the training dataset, nearly 2000 pictures with only keyword tags that are downloaded from the website Flickr form an auxiliary training dataset. The visual feature of the latent emotional semantic factors is extracted by a probabilistic Latent Semantic Analysis (pLSA) model, while the text feature is described by binary vectors on the tag vocabulary. A first Linear Programming Boost (LPBoost) classifier, which is trained on the samples from IAPS, combines the above two features and aims to label the other training samples from the internet. Then a second SVM classifier, which is trained on all training images using only the visual feature, focuses on the test images. In the experiment, the categorization performance of our approach is better than the latest methods.

  16. A Hybrid Ensemble Learning Approach to Star-Galaxy Classification

    CERN Document Server

    Kim, Edward J; Kind, Matias Carrasco

    2015-01-01

    There exist a variety of star-galaxy classification techniques, each with their own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template fitting method. Using data from the CFHTLenS survey, we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2, SDSS, VIPERS, and VVDS, and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, s...

  17. Supervised Learning with Complex-valued Neural Networks

    CERN Document Server

    Suresh, Sundaram; Savitha, Ramasamy

    2013-01-01

    Recent advancements in the field of telecommunications, medical imaging and signal processing deal with signals that are inherently time varying, nonlinear and complex-valued. The time varying, nonlinear characteristics of these signals can be effectively analyzed using artificial neural networks. Furthermore, to efficiently preserve the physical characteristics of these complex-valued signals, it is important to develop complex-valued neural networks and derive their learning algorithms to represent these signals at every step of the learning process. This monograph comprises a collection of new supervised learning algorithms along with novel architectures for complex-valued neural networks. The concepts of meta-cognition equipped with a self-regulated learning have been known to be the best human learning strategy. In this monograph, the principles of meta-cognition have been introduced for complex-valued neural networks in both the batch and sequential learning modes. For applications where the computati...

  18. Enhanced low-rank representation via sparse manifold adaption for semi-supervised learning.

    Science.gov (United States)

    Peng, Yong; Lu, Bao-Liang; Wang, Suhang

    2015-05-01

    Constructing an informative and discriminative graph plays an important role in various pattern recognition tasks such as clustering and classification. Among the existing graph-based learning models, low-rank representation (LRR) is a very competitive one, which has been extensively employed in spectral clustering and semi-supervised learning (SSL). In SSL, the graph is composed of both labeled and unlabeled samples, where the edge weights are calculated based on the LRR coefficients. However, most existing LRR-related approaches fail to consider the geometrical structure of data, which has been shown to be beneficial for discriminative tasks. In this paper, we propose an enhanced LRR via sparse manifold adaption, termed manifold low-rank representation (MLRR), to learn low-rank data representation. MLRR can explicitly take the data local manifold structure into consideration, which can be identified by the geometric sparsity idea; specifically, the local tangent space of each data point was sought by solving a sparse representation objective. Therefore, the graph to depict the relationship of data points can be built once the manifold information is obtained. We incorporate a regularizer into LRR to make the learned coefficients preserve the geometric constraints revealed in the data space. As a result, MLRR combines both the global information emphasized by low-rank property and the local information emphasized by the identified manifold structure. Extensive experimental results on semi-supervised classification tasks demonstrate that MLRR is an excellent method in comparison with several state-of-the-art graph construction approaches.

  19. Using Supervised Deep Learning for Human Age Estimation Problem

    Science.gov (United States)

    Drobnyh, K. A.; Polovinkin, A. N.

    2017-05-01

    Automatic facial age estimation is a challenging task that has emerged in recent years. In this paper, we propose using supervised deep learning features to improve the accuracy of existing age estimation algorithms. There are many approaches to the problem; an active appearance model and bio-inspired features are the two that have shown the best accuracy. For experiments we chose the popular, publicly available FG-NET database, which contains 1002 images with a broad variety of light, pose, and expression. The LOPO (leave-one-person-out) method was used to estimate the accuracy. Experiments demonstrated that adding supervised deep learning features improved accuracy for some basic models. For example, adding the features to an active appearance model gave a 4% gain (the error decreased from 4.59 to 4.41).
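
    A minimal sketch of the leave-one-person-out (LOPO) protocol named in this abstract, together with the arithmetic behind the reported gain. The function names and the toy person identifiers are illustrative assumptions and do not come from the paper; the abstract does not say how the error is defined, so only the two quoted figures (4.59 and 4.41) are used.

```python
from collections import defaultdict


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_indices, test_indices), one split per person.

    All images of the held-out person go to the test set, so the model is
    never evaluated on a person it has seen during training.
    """
    by_person = defaultdict(list)
    for index, person in enumerate(person_ids):
        by_person[person].append(index)
    for person, test_indices in by_person.items():
        held_out = set(test_indices)
        train_indices = [i for i in range(len(person_ids)) if i not in held_out]
        yield person, train_indices, test_indices


def relative_gain(old_error, new_error):
    """Relative error reduction, as a fraction of the old error."""
    return (old_error - new_error) / old_error


# Hypothetical toy data: six images belonging to three people.
person_ids = ["a", "a", "b", "c", "c", "c"]
splits = list(leave_one_person_out(person_ids))

# Figures quoted in the abstract.
gain = relative_gain(4.59, 4.41)
```

    With the quoted figures the relative reduction is about 3.9%, which is consistent with the rounded 4% in the abstract.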

  20. Semi-supervised Learning with Density Based Distances

    CERN Document Server

    Bijral, Avleen S; Srebro, Nathan

    2012-01-01

    We present a simple, yet effective, approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distance-based supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also present a novel algorithm which integrates nearest neighbor computations into the shortest path search and can find exact shortest paths even in extremely large dense graphs. Significant runtime improvement over the commonly used Laplacian regularization method is then shown on a large scale dataset.
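
    The idea of estimating a density-based distance with a shortest-path calculation on a graph can be sketched as follows. The abstract does not give the edge-weight formula, so this sketch assumes that each edge is weighted by the Euclidean distance raised to a power q > 1, which makes paths through densely sampled regions short. The names knn_graph, k and q are hypothetical, and plain Dijkstra is used here rather than the paper's algorithm that integrates the nearest-neighbor search into the path search.

```python
import heapq
import math


def knn_graph(points, k, q=2.0):
    """Symmetric k-nearest-neighbour graph.

    Assumption (not from the abstract): edge weight = Euclidean distance ** q.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d ** q
            graph[j][i] = d ** q
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: length of the shortest path from source to each node."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        length, node = heapq.heappop(heap)
        if length > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, weight in graph[node].items():
            candidate = length + weight
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return best


# Eleven densely sampled points on a line (indices 0..10) and one isolated
# point (index 11) that is closer to point 0 in plain Euclidean terms.
points = [(0.1 * i, 0.0) for i in range(11)] + [(0.0, 0.6)]
dbd = shortest_path_lengths(knn_graph(points, k=2, q=2.0), source=0)
```

    In this toy set the far end of the dense chain (Euclidean distance 1.0) ends up closer to point 0 than the isolated point (Euclidean distance 0.6). The resulting distances could then be fed to any distance-based learner, such as a nearest-neighbor classifier.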

  1. Very Short Literature Survey From Supervised Learning To Surrogate Modeling

    CERN Document Server

    Brusan, Altay

    2012-01-01

    The past century was the era of linear systems. Either systems (especially industrial ones) were simple and (quasi)linear, or linear approximations were accurate enough. In addition, computing devices became widely available only in the final decades of the century; before then, the lack of computational resources made it difficult to evaluate the available studies of nonlinear systems. Both conditions have now changed: systems are highly complex, and computational power is pervasive, cheap and easy to obtain. For this recent era, a new branch of supervised learning known as surrogate modeling (meta-modeling, surface modeling) has been devised to answer the new needs of the modeling realm. This short literature survey introduces surrogate modeling to readers who are familiar with the concepts of supervised learning. The necessity, challenges and visions of the topic are considered.

  2. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    Science.gov (United States)

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  3. Deep learning classification in asteroseismology

    Science.gov (United States)

    Hon, Marc; Stello, Dennis; Yu, Jie

    2017-08-01

    In the power spectra of oscillating red giants, there are visually distinct features defining stars ascending the red giant branch from those that have commenced helium core burning. We train a 1D convolutional neural network by supervised learning to automatically learn these visual features from images of folded oscillation spectra. By training and testing on Kepler red giants, we achieve an accuracy of up to 99 per cent in separating helium-burning red giants from those ascending the red giant branch. The convolutional neural network additionally shows capability in accurately predicting the evolutionary states of 5379 previously unclassified Kepler red giants, by which we now have greatly increased the number of classified stars.

  4. Automatic Building Detection based on Supervised Classification using High Resolution Google Earth Images

    OpenAIRE

    Ghaffarian, S.

    2014-01-01

    This paper presents a novel approach to detect the buildings by automization of the training area collecting stage for supervised classification. The method is based on the fact that a 3D building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins with the detection and masking out the shadow areas using luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Furth...

  5. Classification of multiple sclerosis lesions using adaptive dictionary learning.

    Science.gov (United States)

    Deshpande, Hrishikesh; Maurel, Pierre; Barillot, Christian

    2015-12-01

    This paper presents a sparse representation and an adaptive dictionary learning based method for automated classification of multiple sclerosis (MS) lesions in magnetic resonance (MR) images. Manual delineation of MS lesions is a time-consuming task, requiring neuroradiology experts to analyze huge volumes of MR data. This, in addition to the high intra- and inter-observer variability, necessitates automated MS lesion classification methods. Among many image representation models and classification methods that can be used for such purpose, we investigate the use of sparse modeling. In recent years, sparse representation has evolved as a tool in modeling data using a few basis elements of an over-complete dictionary and has found applications in many image processing tasks including classification. We propose a supervised classification approach by learning dictionaries specific to the lesions and individual healthy brain tissues, which include white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). The size of the dictionaries learned for each class plays a major role in data representation but it is an even more crucial element in the case of competitive classification. Our approach adapts the size of the dictionary for each class, depending on the complexity of the underlying data. The algorithm is validated using 52 multi-sequence MR images acquired from 13 MS patients. The results demonstrate the effectiveness of our approach in MS lesion classification.

  6. Recent advances on techniques and theories of feedforward networks with supervised learning

    Science.gov (United States)

    Xu, Lei; Klasa, Stan

    1992-07-01

    The rediscovery and popularization of the back propagation training technique for multilayer perceptrons as well as the invention of the Boltzmann Machine learning algorithm has given a new boost to the study of supervised learning networks. In recent years, besides the widely spread applications and the various further improvements of the classical back propagation technique, many new supervised learning models, techniques as well as theories, have also been proposed in a vast number of publications. This paper tries to give a rather systematical review on the recent advances on supervised learning techniques and theories for static feedforward networks. We summarize a great number of developments into four aspects: (1) Various improvements and variants made on the classical back propagation techniques for multilayer (static) perceptron nets, for speeding up training, avoiding local minima, increasing the generalization ability, as well as for many other interesting purposes. (2) A number of other learning methods for training multilayer (static) perceptron, such as derivative estimation by perturbation, direct weight update by perturbation, genetic algorithms, recursive least square estimate and extended Kalman filter, linear programming, the policy of fixing one layer while updating another, constructing networks by converting decision tree classifiers, and others. (3) Various other feedforward models which are also able to implement function approximation, probability density estimation and classification, including various models of basis function expansion (e.g., radial basis functions, restricted coulomb energy, multivariate adaptive regression splines, trigonometric and polynomial bases, projection pursuit, basis function tree, and many others), and several other supervised learning models. (4) Models with complex structures, e.g., modular architecture, hierarchy architecture, and others. (5) A number of theoretical issues involving the universal

  7. Baccalaureate nursing students' perceptions of learning and supervision in the clinical environment.

    Science.gov (United States)

    Dimitriadou, Maria; Papastavrou, Evridiki; Efstathiou, Georgios; Theodorou, Mamas

    2015-06-01

    This study is an exploration of nursing students' experiences within the clinical learning environment (CLE) and supervision provided in hospital settings. A total of 357 second-year nurse students from all universities in Cyprus participated in the study. Data were collected using the Clinical Learning Environment, Supervision and Nurse Teacher instrument. The dimension "supervisory relationship (mentor)", as well as the frequency of individualized supervision meetings, were found to be important variables in the students' clinical learning. However, no statistically-significant connection was established between successful mentor relationship and team supervision. The majority of students valued their mentor's supervision more highly than a nurse teacher's supervision toward the fulfillment of learning outcomes. The dimensions "premises of nursing care" and "premises of learning" were highly correlated, indicating that a key component of a quality clinical learning environment is the quality of care delivered. The results suggest the need to modify educational strategies that foster desirable learning for students in response to workplace demands.

  8. Function approximation using combined unsupervised and supervised learning.

    Science.gov (United States)

    Andras, Peter

    2014-03-01

    Function approximation is one of the core tasks that are solved using neural networks in the context of many engineering problems. However, good approximation results need good sampling of the data space, which usually requires exponentially increasing volume of data as the dimensionality of the data increases. At the same time, often the high-dimensional data is arranged around a much lower dimensional manifold. Here we propose the breaking of the function approximation task for high-dimensional data into two steps: (1) the mapping of the high-dimensional data onto a lower dimensional space corresponding to the manifold on which the data resides and (2) the approximation of the function using the mapped lower dimensional data. We use over-complete self-organizing maps (SOMs) for the mapping through unsupervised learning, and single hidden layer neural networks for the function approximation through supervised learning. We also extend the two-step procedure by considering support vector machines and Bayesian SOMs for the determination of the best parameters for the nonlinear neurons in the hidden layer of the neural networks used for the function approximation. We compare the approximation performance of the proposed neural networks using a set of functions and show that indeed the neural networks using combined unsupervised and supervised learning outperform in most cases the neural networks that learn the function approximation using the original high-dimensional data.

  9. Classification of damage in structural systems using time series analysis and supervised and unsupervised pattern recognition techniques

    Science.gov (United States)

    Omenzetter, Piotr; de Lautour, Oliver R.

    2010-04-01

    Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3-storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.
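
    The pipeline described in this abstract (fit an AR model to each record, use the coefficients as features, classify with a nearest-neighbor rule) can be sketched as follows. The abstract does not state the model order or the fitting method, so this sketch assumes an AR(2) model estimated from the Yule-Walker equations; the simulated series, their coefficients and the "healthy"/"damaged" labels are toy assumptions standing in for measured acceleration histories.

```python
import math
import random


def ar2_coefficients(series):
    """Estimate AR(2) coefficients (phi1, phi2) from the Yule-Walker equations."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]

    def autocov(lag):
        return sum(centred[t] * centred[t - lag] for t in range(lag, n)) / n

    r0, r1, r2 = autocov(0), autocov(1), autocov(2)
    det = r0 * r0 - r1 * r1
    return (r1 * (r0 - r2) / det, (r0 * r2 - r1 * r1) / det)


def nearest_neighbour_label(feature, labelled_features):
    """Return the label of the training feature closest to `feature`."""
    return min(labelled_features, key=lambda item: math.dist(feature, item[0]))[1]


def simulate_ar2(phi1, phi2, n, rng):
    """Toy stand-in for a measured record: AR(2) process driven by white noise."""
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


rng = random.Random(0)
# Hypothetical training records, one per structural state.
training = [
    (ar2_coefficients(simulate_ar2(0.5, -0.3, 2000, rng)), "healthy"),
    (ar2_coefficients(simulate_ar2(1.2, -0.5, 2000, rng)), "damaged"),
]
# A new record generated with the "damaged" dynamics.
query = ar2_coefficients(simulate_ar2(1.2, -0.5, 2000, rng))
predicted = nearest_neighbour_label(query, training)
```

    A higher model order would need a general linear solve instead of the closed-form 2x2 solution used here.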

  10. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    Science.gov (United States)

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.
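
    The geometric mean (G-mean) used for evaluation here is commonly defined as the square root of sensitivity times specificity. A minimal sketch under that standard definition follows; the label vectors are made-up toy data, and the function assumes both classes are present.

```python
import math


def g_mean(true_labels, predicted_labels, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem.

    Assumes at least one positive and one negative example in true_labels.
    """
    tp = fn = tn = fp = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive:
            if prediction == positive:
                tp += 1
            else:
                fn += 1
        else:
            if prediction == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced data: 2 positives, 8 negatives.
truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

score_trivial = g_mean(truth, always_negative)
score_mixed = g_mean(truth, mixed)
```

    A classifier that always predicts the majority class reaches 80% accuracy on this toy set but a G-mean of zero, which is why the measure suits imbalanced data.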

  11. Exploiting Attribute Correlations: A Novel Trace Lasso-Based Weakly Supervised Dictionary Learning Method.

    Science.gov (United States)

    Wu, Lin; Wang, Yang; Pan, Shirui

    2016-10-04

    It is now well established that sparse representation models are working effectively for many visual recognition tasks, and have pushed forward the success of dictionary learning therein. Recent studies on dictionary learning focus on learning discriminative atoms instead of purely reconstructive ones. However, the existence of intraclass diversities (i.e., data objects within the same category that exhibit large visual dissimilarities), and interclass similarities (i.e., data objects from distinct classes that share many visual similarities), makes it challenging to learn effective recognition models. To this end, a large number of labeled data objects are required to learn models which can effectively characterize these subtle differences. However, access to labeled data objects is always limited, making it difficult to learn a monolithic dictionary that can be discriminative enough. To address the above limitations, in this paper, we propose a weakly-supervised dictionary learning method to automatically learn a discriminative dictionary by fully exploiting visual attribute correlations rather than label priors. In particular, the intrinsic attribute correlations are deployed as a critical cue to guide the process of object categorization, and then a set of subdictionaries are jointly learned with respect to each category. The resulting dictionary is highly discriminative and leads to intraclass diversity aware sparse representations. Extensive experiments on image classification and object recognition are conducted to show the effectiveness of our approach.

  12. Robust head pose estimation via supervised manifold learning.

    Science.gov (United States)

    Wang, Chao; Song, Xubo

    2014-05-01

    Head poses can be automatically estimated using manifold learning algorithms, with the assumption that with the pose being the only variable, the face images should lie in a smooth and low-dimensional manifold. However, this estimation approach is challenging due to other appearance variations related to identity, head location in image, background clutter, facial expression, and illumination. To address the problem, we propose to incorporate supervised information (pose angles of training samples) into the process of manifold learning. The process has three stages: neighborhood construction, graph weight computation and projection learning. For the first two stages, we redefine inter-point distance for neighborhood construction as well as graph weight by constraining them with the pose angle information. For Stage 3, we present a supervised neighborhood-based linear feature transformation algorithm to keep the data points with similar pose angles close together but the data points with dissimilar pose angles far apart. The experimental results show that our method has higher estimation accuracy than the other state-of-the-art algorithms and is robust to identity and illumination variations.

  13. Galaxy Classifications with Deep Learning

    Science.gov (United States)

    Lukic, Vesna; Brüggen, Marcus

    2017-06-01

    Machine learning techniques have proven to be increasingly useful in astronomical applications over the last few years, for example in object classification, estimating redshifts and data mining. One example of object classification is classifying galaxy morphology. This is a tedious task to do manually, especially as the datasets become larger with surveys that have a broader and deeper search-space. The Kaggle Galaxy Zoo competition presented the challenge of writing an algorithm to find the probability that a galaxy belongs in a particular class, based on SDSS optical spectroscopy data. The use of convolutional neural networks (convnets) proved to be a popular solution to the problem, as they have also produced unprecedented classification accuracies in other image databases such as the database of handwritten digits (MNIST) and large database of images (CIFAR). We experiment with the convnets that comprised the winning solution, but using broad classifications. The effect of changing the number of layers is explored, as well as using a different activation function, to help in developing an intuition of how the networks function and to see how they can be applied to radio galaxy images.

  14. Generalization of Supervised Learning for Binary Mask Estimation

    DEFF Research Database (Denmark)

    May, Tobias; Gerkmann, Timo

    2014-01-01

    This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods will be compared, one supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second method...... solely relies on a frame-based speech presence probability (SPP) estimation, and therefore, does not depend on the acoustic condition seen during training. We investigate the influence of mismatches between the acoustic conditions used for training and testing on the IBM estimation performance...

  15. Automatic Building Detection based on Supervised Classification using High Resolution Google Earth Images

    Science.gov (United States)

    Ghaffarian, S.; Ghaffarian, S.

    2014-08-01

    This paper presents a novel approach to detect the buildings by automization of the training area collecting stage for supervised classification. The method is based on the fact that a 3D building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins with the detection and masking out the shadow areas using luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Further, the training areas for supervised classification are selected by automatically determining a buffer zone on each building whose shadow is detected by using the shadow shape and the sun illumination direction. Thereafter, by calculating the statistic values of each buffer zone which is collected from the building areas, the Improved Parallelepiped Supervised Classification is executed to detect the buildings. Standard deviation thresholding is applied to the Parallelepiped classification method to improve its accuracy. Finally, simple morphological operations are conducted to remove noise and increase the accuracy of the results. The experiments were performed on a set of high resolution Google Earth images. The performance of the proposed approach was assessed by comparing its results with the reference data, using well-known quality measurements (Precision, Recall and F1-score) to evaluate its pixel-based and object-based performances. Evaluation of the results illustrates that buildings detected from dense and suburban districts with diverse characteristics and color combinations using our proposed method have 88.4 % and 85.3 % overall pixel-based and object-based precision performances, respectively.
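
    A minimal sketch of a parallelepiped classifier with a standard-deviation threshold, the family of method this abstract names. It shows the classic form, in which each class is a box of mean plus or minus k standard deviations per band; the paper's specific improvements are not described in the abstract and are not reproduced here. The multiplier k, the class names and the RGB samples are hypothetical.

```python
import statistics


def train_parallelepiped(samples_by_class, k=2.0):
    """For each class, build a per-band interval [mean - k*std, mean + k*std].

    samples_by_class maps a label to a list of pixels (tuples of band values).
    k is a hypothetical threshold multiplier, not a value from the paper.
    """
    boxes = {}
    for label, samples in samples_by_class.items():
        box = []
        for band in zip(*samples):
            centre = statistics.mean(band)
            spread = k * statistics.pstdev(band)
            box.append((centre - spread, centre + spread))
        boxes[label] = box
    return boxes


def classify_pixel(pixel, boxes):
    """Return the first class whose box contains the pixel in every band, else None."""
    for label, box in boxes.items():
        if all(low <= value <= high for value, (low, high) in zip(pixel, box)):
            return label
    return None


# Hypothetical RGB training pixels, e.g. sampled from automatically selected zones.
training_pixels = {
    "building": [(200, 190, 180), (210, 200, 190), (205, 195, 185)],
    "vegetation": [(40, 120, 50), (50, 130, 60), (45, 125, 55)],
}
boxes = train_parallelepiped(training_pixels, k=2.0)

label_roof = classify_pixel((207, 196, 183), boxes)
label_tree = classify_pixel((44, 124, 58), boxes)
label_unknown = classify_pixel((100, 100, 100), boxes)
```

    Pixels that fall outside every box stay unclassified (None here), which is one reason a post-processing step such as the morphological clean-up mentioned in the abstract is useful.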

  16. Learning outcomes using video in supervision and peer feedback during clinical skills training

    DEFF Research Database (Denmark)

    Lauridsen, Henrik Hein; Toftgård, Rie Castella; Nørgaard, Cita

    supervision of clinical skills (formative assessment). Demonstrations of these principles will be presented as video podcasts during the session. The learning outcomes of video supervision and peer-feedback were assessed in an online questionnaire survey. Results Results of the supervision showed large self...

  17. Detection and Evaluation of Cheating on College Exams using Supervised Classification

    Directory of Open Access Journals (Sweden)

    Elmano Ramalho CAVALCANTI

    2012-10-01

    Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document classification techniques. Firstly, we propose two classification models for cheating detection by using a decision tree supervised algorithm. Then, both classifiers are compared against the result produced by a domain expert. The results point out that one of the classifiers achieved an excellent quality in detecting and evaluating cheating in exams, making possible its use in real school and college environments.

  18. Extreme learning machine and adaptive sparse representation for image classification.

    Science.gov (United States)

    Cao, Jiuwen; Zhang, Kai; Luo, Minxia; Yin, Chun; Lai, Xiaoping

    2016-09-01

    Recent research has shown the speed advantage of extreme learning machine (ELM) and the accuracy advantage of sparse representation classification (SRC) in the area of image classification. Those two methods, however, have their respective drawbacks, e.g., in general, ELM is known to be less robust to noise while SRC is known to be time-consuming. Consequently, ELM and SRC complement each other in computational complexity and classification accuracy. In order to unify such mutual complementarity and thus further enhance the classification performance, we propose an efficient hybrid classifier to exploit the advantages of ELM and SRC in this paper. More precisely, the proposed classifier consists of two stages: first, an ELM network is trained by supervised learning. Second, a discriminative criterion about the reliability of the obtained ELM output is adopted to decide whether the query image can be correctly classified or not. If the output is reliable, the classification will be performed by ELM; otherwise the query image will be fed to SRC. Meanwhile, in the stage of SRC, a sub-dictionary that is adaptive to the query image instead of the entire dictionary is extracted via the ELM output. The computational burden of SRC thus can be reduced. Extensive experiments on handwritten digit classification, landmark recognition and face recognition demonstrate that the proposed hybrid classifier outperforms ELM and SRC in classification accuracy with outstanding computational efficiency.
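
    The two-stage control flow described in this abstract (trust the fast classifier when its output looks reliable, otherwise hand the query to a slower classifier restricted to a few candidate classes) can be sketched as a confidence-gated cascade. Neither ELM nor SRC is implemented here. The reliability test (gap between the two largest output scores), the margin value and the stand-in fallback function are assumptions, since the abstract does not specify the criterion.

```python
def hybrid_predict(scores, fallback, margin=0.2, n_candidates=2):
    """Confidence-gated cascade.

    scores: dict mapping each label to the fast classifier's output score.
    fallback: callable taking a list of candidate labels and returning one label;
        it stands in for the slower second-stage classifier.
    margin: hypothetical reliability threshold on the top-two score gap.
    Returns (label, stage_used).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= margin:
        return ranked[0], "primary"
    # Unreliable output: pass only the top-ranked classes to the second stage,
    # which plays the role of the query-adaptive sub-dictionary.
    return fallback(ranked[:n_candidates]), "fallback"


def toy_fallback(candidates):
    """Stand-in second stage: here it simply picks the last candidate."""
    return candidates[-1]


confident = hybrid_predict({"3": 0.9, "8": 0.1, "5": 0.0}, toy_fallback)
ambiguous = hybrid_predict({"3": 0.45, "8": 0.40, "5": 0.15}, toy_fallback)
```

    Restricting the second stage to a few candidate classes is what keeps its cost down, in the same spirit as the sub-dictionary extraction the abstract describes.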

  19. ZeitZeiger: supervised learning for high-dimensional data from an oscillatory system

    National Research Council Canada - National Science Library

    Hughey, Jacob J; Hastie, Trevor; Butte, Atul J

    2016-01-01

    Numerous biological systems oscillate over time or space. Despite these oscillators' importance, data from an oscillatory system is problematic for existing methods of regularized supervised learning...

  20. Redesigning Language Learning Strategy Classifications

    Directory of Open Access Journals (Sweden)

    Ag. Bambang Setiyadi

    2004-01-01

    In the current study a total of 79 university students of a 3-month English course participated. This study attempted to explore what learning strategies Indonesian language learners used and how the strategies were classified. To examine the internal consistency of the hypothesized scales, Cronbach's alpha coefficients of internal consistency were computed for each scale of skill-based areas, namely: speaking, listening, reading and writing. Correlation analysis was also conducted to see how the scores for speaking, listening, reading and writing in the language learning strategy questionnaire were correlated. The result shows that each skill-based scale has relatively high reliability, with alpha .73, .67, .69, .80 for listening, speaking, reading, and writing respectively. It was also found that the four scales are significantly and positively correlated. The classification of learning strategies based on the language skills is a new way of learning strategy measurement, which may be worth considering in the Indonesian context in which English is learned as a foreign language.
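
    For readers unfamiliar with the reliability coefficient reported above, Cronbach's alpha for a scale of k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch of that standard formula; the function name and the input layout (one list of item scores per respondent) are assumptions for illustration, and no data from the study is reproduced.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))                 # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

    Items that move together perfectly give an alpha of 1, and items whose scores are unrelated give an alpha near 0.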

  1. Myths and legends in learning classification rules

    Science.gov (United States)

    Buntine, Wray

    1990-01-01

    This paper is a discussion of machine learning theory on empirically learning classification rules. The paper proposes six myths in the machine learning community that address issues of bias, learning as search, computational learning theory, Occam's razor, 'universal' learning algorithms, and interactive learnings. Some of the problems raised are also addressed from a Bayesian perspective. The paper concludes by suggesting questions that machine learning researchers should be addressing both theoretically and experimentally.

  2. Path Control Experiment of Mobile Robot Based on Supervised Learning

    Directory of Open Access Journals (Sweden)

    Gao Chi

    2013-07-01

    To address the weak adaptability and low control accuracy of robots operating under complex working conditions, a path control method based on driving experience and supervised learning is proposed. According to the geometric characteristics of sloped roads, a path control method for ramp pavement and a control structure based on monitoring and self-learning are established. The experiment was carried out using the Global Navigation Satellite System. The test data illustrate that when the running speed is not greater than 5 m/s, the transverse deviation from the straight-line trajectory path stays within ±20 cm, which indicates that the control method is highly feasible.

  3. SUPERVISED LEARNING METHODS FOR BANGLA WEB DOCUMENT CATEGORIZATION

    Directory of Open Access Journals (Sweden)

    Ashis Kumar Mandal

    2014-09-01

    This paper explores the use of machine learning approaches, or more specifically, four supervised learning methods, namely Decision Tree (C4.5), K-Nearest Neighbour (KNN), Naïve Bayes (NB), and Support Vector Machine (SVM), for categorization of Bangla web documents. This is a task of automatically sorting a set of documents into categories from a predefined set. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been conducted on Bangla language text categorization. Hence, we attempt to analyze the efficiency of those four methods for categorization of Bangla documents. For validation, a Bangla corpus drawn from various websites has been developed and used as examples for the experiment. For Bangla, empirical results indicate that all four methods produce satisfactory performance, with SVM attaining good results on high-dimensional and relatively noisy document feature vectors.

  4. Mining visual collocation patterns via self-supervised subspace learning.

    Science.gov (United States)

    Yuan, Junsong; Wu, Ying

    2012-04-01

    Traditional text data mining techniques are not directly applicable to image data which contain spatial information and are characterized by high-dimensional visual features. It is not a trivial task to discover meaningful visual patterns from images because the content variations and spatial dependence in visual data greatly challenge most existing data mining methods. This paper presents a novel approach to coping with these difficulties for mining visual collocation patterns. Specifically, the novelty of this work lies in the following new contributions: 1) a principled solution to the discovery of visual collocation patterns based on frequent itemset mining and 2) a self-supervised subspace learning method to refine the visual codebook by feeding back discovered patterns via subspace learning. The experimental results show that our method can discover semantically meaningful patterns efficiently and effectively.

  5. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.

  6. Multicultural supervision: lessons learned about an ongoing struggle.

    Science.gov (United States)

    Christiansen, Abigail Tolhurst; Thomas, Volker; Kafescioglu, Nilufer; Karakurt, Gunnur; Lowe, Walter; Smith, William; Wittenborn, Andrea

    2011-01-01

    This article examines the experiences of seven diverse therapists in a supervision course as they wrestled with the real-world application of multicultural supervision. Existing literature on multicultural supervision does not address the difficulties that arise in addressing multicultural issues in the context of the supervision relationship. The experiences of six supervisory candidates and one mentoring supervisor in addressing multicultural issues in supervision are explored. Guidelines for conversations regarding multicultural issues are provided.

  7. Supervised pixel classification using a feature space derived from an artificial visual system

    Science.gov (United States)

    Baxter, Lisa C.; Coggins, James M.

    1991-01-01

    Image segmentation involves labelling pixels according to their membership in image regions. This requires the understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. The multiscale structure of regions is investigated, and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not by image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.

  8. Supervised Self-Organizing Classification of Superresolution ISAR Images: An Anechoic Chamber Experiment

    Directory of Open Access Journals (Sweden)

    Radoi Emanuel

    2006-01-01

    The problem of the automatic classification of superresolution ISAR images is addressed in the paper. We describe an anechoic chamber experiment involving ten scale-reduced aircraft models. The radar images of these targets are reconstructed using the MUSIC-2D (multiple signal classification) method coupled with two additional processing steps: phase unwrapping and symmetry enhancement. A feature vector is then proposed including Fourier descriptors and moment invariants, which are calculated from the target shape and the scattering center distribution extracted from each reconstructed image. The classification is finally performed by a new self-organizing neural network called SART (supervised ART), which is compared to two standard classifiers, MLP (multilayer perceptron) and fuzzy KNN (K nearest neighbors). While the classification accuracy is similar, SART is shown to outperform the two other classifiers in terms of training speed and classification speed, especially for large databases. It is also easier to use since it does not require any input parameter related to its structure.

  9. Supervised Self-Organizing Classification of Superresolution ISAR Images: An Anechoic Chamber Experiment

    Science.gov (United States)

    Radoi, Emanuel; Quinquis, André; Totir, Felix

    2006-12-01

    The problem of the automatic classification of superresolution ISAR images is addressed in the paper. We describe an anechoic chamber experiment involving ten scale-reduced aircraft models. The radar images of these targets are reconstructed using the MUSIC-2D (multiple signal classification) method coupled with two additional processing steps: phase unwrapping and symmetry enhancement. A feature vector is then proposed including Fourier descriptors and moment invariants, which are calculated from the target shape and the scattering center distribution extracted from each reconstructed image. The classification is finally performed by a new self-organizing neural network called SART (supervised ART), which is compared to two standard classifiers, MLP (multilayer perceptron) and fuzzy KNN (K nearest neighbors). While the classification accuracy is similar, SART is shown to outperform the two other classifiers in terms of training speed and classification speed, especially for large databases. It is also easier to use since it does not require any input parameter related to its structure.

  10. A Novel Semi-Supervised Electronic Nose Learning Technique: M-Training

    Directory of Open Access Journals (Sweden)

    Pengfei Jia

    2016-03-01

    When an electronic nose (E-nose) is used to distinguish different kinds of gases, the label information of the target gas can be lost due to operator error or some other reason, although this is not expected. Another fact is that the cost of getting labeled samples is usually higher than for unlabeled ones. In most cases, the classification accuracy of an E-nose trained using labeled samples is higher than that of an E-nose trained with unlabeled ones, so gases without label information should not be used to train an E-nose; however, this wastes resources and can even delay the progress of research. In this work a novel multi-class semi-supervised learning technique called M-training is proposed to train E-noses with both labeled and unlabeled samples. We employ M-training to train the E-nose which is used to distinguish three indoor pollutant gases (benzene, toluene and formaldehyde). Data processing results show that the classification accuracy of an E-nose trained by semi-supervised techniques (tri-training and M-training) is higher than that of an E-nose trained only with labeled samples, and the performance of M-training is better than that of tri-training because more base classifiers can be employed by M-training.

  11. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil

    2017-01-05

    Fall incidents are considered the leading cause of disability and even mortality among older adults. To address this problem, the fields of fall detection and prevention have received a lot of attention over the past years and attracted many research efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches, which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated with these widely used algorithms is conducted on two fall detection databases, namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state-of-the-art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.
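
    The three evaluation measures named above have standard definitions, sketched here for the binary case. This is an illustrative sketch only: the function names are hypothetical, the F-measure is taken to be the balanced F1 score, and the AUC is computed as the fraction of positive/negative pairs that the scores rank correctly (ties count one half), which is one common way to obtain it. The abstract does not say how the authors computed these values.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Accuracy and the F-measure depend on a decision threshold applied to the classifier scores, whereas the AUC summarises the ranking quality across all thresholds, which is why the three are often reported together.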

  12. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2016-06-08

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The resulting discriminative yet compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms state-of-the-art algorithms. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  13. Detection of money laundering groups using supervised learning in networks

    CERN Document Server

    Savage, David; Chou, Pauline; Zhang, Xiuzhen; Yu, Xinghuo

    2016-01-01

    Money laundering is a major global problem, enabling criminal organisations to hide their ill-gotten gains and to finance further operations. Prevention of money laundering is seen as a high priority by many governments, however detection of money laundering without prior knowledge of predicate crimes remains a significant challenge. Previous detection systems have tended to focus on individuals, considering transaction histories and applying anomaly detection to identify suspicious behaviour. However, money laundering involves groups of collaborating individuals, and evidence of money laundering may only be apparent when the collective behaviour of these groups is considered. In this paper we describe a detection system that is capable of analysing group behaviour, using a combination of network analysis and supervised learning. This system is designed for real-world application and operates on networks consisting of millions of interacting parties. Evaluation of the system using real-world data indicates th...

  14. Unsupervised/supervised learning concept for 24-hour load forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Djukanovic, M. (Electrical Engineering Inst. 'Nikola Tesla', Belgrade (Yugoslavia)); Babic, B. (Electrical Power Industry of Serbia, Belgrade (Yugoslavia)); Sobajic, D.J.; Pao, Y.-H. (Case Western Reserve Univ., Cleveland, OH (United States). Dept. of Electrical Engineering and Computer Science)

    1993-07-01

    An application of artificial neural networks in short-term load forecasting is described. An algorithm using an unsupervised/supervised learning concept and historical relationship between the load and temperature for a given season, day type and hour of the day to forecast hourly electric load with a lead time of 24 hours is proposed. An additional approach using functional link net, temperature variables, average load and last one-hour load of previous day is introduced and compared with the ANN model with one hidden layer load forecast. In spite of limited available weather variables (maximum, minimum and average temperature for the day) quite acceptable results have been achieved. The 24-hour-ahead forecast errors (absolute average) ranged from 2.78% for Saturdays and 3.12% for working days to 3.54% for Sundays. (Author)

  15. Online Semi-Supervised Learning on Quantized Graphs

    CERN Document Server

    Valko, Michal; Huang, Ling; Ting, Daniel

    2012-01-01

    In this paper, we tackle the problem of online semi-supervised learning (SSL). When data arrive in a stream, the dual problems of computation and data storage arise for any SSL method. We propose a fast approximate online SSL algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local "representative points" that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We apply our algorithm to face recognition and optical character recognition applications to show that we can take advantage of the manifold structure to outperform the previous methods. Unlike previous heuristic approaches, we show that our method yields provable performance bounds.
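
    The "harmonic solution" mentioned above assigns each unlabeled node the weighted average of its neighbours' values while labeled nodes stay fixed. The sketch below shows only that core idea on a small fixed graph, solved by simple repeated averaging. It does not implement the online quantization into representative points or the regularization described in the abstract, and the function name, graph layout and stopping parameters are assumptions.

```python
def harmonic_solution(weights, labels, n_iter=1000, tol=1e-10):
    """weights: {node: {neighbour: edge weight}} for an undirected graph.
    labels:  {node: value} for the labeled nodes (e.g. +1.0 / -1.0).
    Returns {node: value}; unlabeled values are neighbour-weighted averages."""
    f = {v: float(labels.get(v, 0.0)) for v in weights}
    for _ in range(n_iter):
        max_change = 0.0
        for v, nbrs in weights.items():
            if v in labels:
                continue                      # labeled nodes are clamped
            total = sum(nbrs.values())
            if total == 0:
                continue                      # isolated node: leave at 0
            new = sum(w * f[u] for u, w in nbrs.items()) / total
            max_change = max(max_change, abs(new - f[v]))
            f[v] = new
        if max_change < tol:
            break
    return f


# Hypothetical path graph a - b - c - d with unit weights; a and d are labeled.
GRAPH = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
SOLUTION = harmonic_solution(GRAPH, {"a": 1.0, "d": -1.0})
```

    On this path graph the values interpolate linearly between the two labeled ends, so the sign of each unlabeled value gives its predicted class.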

  16. Using Supervised Learning to Improve Monte Carlo Integral Estimation

    CERN Document Server

    Tracey, Brendan; Alonso, Juan J

    2011-01-01

    Monte Carlo (MC) techniques are often used to estimate integrals of a multivariate function using randomly generated samples of the function. In light of the increasing interest in uncertainty quantification and robust design applications in aerospace engineering, the calculation of expected values of such functions (e.g. performance measures) becomes important. However, MC techniques often suffer from high variance and slow convergence as the number of samples increases. In this paper we present Stacked Monte Carlo (StackMC), a new method for post-processing an existing set of MC samples to improve the associated integral estimate. StackMC is based on the supervised learning techniques of fitting functions and cross validation. It should reduce the variance of any type of Monte Carlo integral estimate (simple sampling, importance sampling, quasi-Monte Carlo, MCMC, etc.) without adding bias. We report on an extensive set of experiments confirming that the StackMC estimate of an integral is more accurate than ...
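
    The combination of function fitting and cross validation described above can be illustrated with a one-dimensional sketch: fit a simple surrogate g on the training folds, integrate g exactly, and correct it with the average residual f - g on the held-out fold. Everything here is an assumption made for illustration (uniform samples on [0, 1], a straight-line surrogate, the function names, plain averaging over folds); the published StackMC method may differ in details such as how the fit is weighted.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def stacked_estimate(xs, ys, n_folds=5):
    """Estimate the integral of f over [0, 1] from samples xs ~ U(0, 1), ys = f(xs).

    For each fold: fit g on the other folds, then use
        integral(g) + mean over held-out samples of (f - g).
    The held-out samples are independent of the fit, so the correction
    term does not introduce bias."""
    n = len(xs)
    fold_estimates = []
    for k in range(n_folds):
        test_idx = list(range(k, n, n_folds))
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        g_integral = a + b / 2.0              # exact integral of a + b*x on [0, 1]
        residual = sum(ys[i] - (a + b * xs[i]) for i in test_idx) / len(test_idx)
        fold_estimates.append(g_integral + residual)
    return sum(fold_estimates) / n_folds
```

    The variance reduction comes from the residual f - g usually varying much less than f itself; when f happens to be exactly linear, the residual vanishes and the estimate equals the true integral for any sample set.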

  17. EEG source space analysis of the supervised factor analytic approach for the classification of multi-directional arm movement

    Science.gov (United States)

    Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai

    2017-08-01

    Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.

  18. Supervised dictionary learning for inferring concurrent brain networks.

    Science.gov (United States)

    Zhao, Shijie; Han, Junwei; Lv, Jinglei; Jiang, Xi; Hu, Xintao; Zhao, Yu; Ge, Bao; Guo, Lei; Liu, Tianming

    2015-10-01

    Task-based fMRI (tfMRI) has been widely used to explore functional brain networks via predefined stimulus paradigm in the fMRI scan. Traditionally, the general linear model (GLM) has been a dominant approach to detect task-evoked networks. However, GLM focuses on task-evoked or event-evoked brain responses and possibly ignores the intrinsic brain functions. In comparison, dictionary learning and sparse coding methods have attracted much attention recently, and these methods have shown the promise of automatically and systematically decomposing fMRI signals into meaningful task-evoked and intrinsic concurrent networks. Nevertheless, two notable limitations of current data-driven dictionary learning methods are that the prior knowledge of the task paradigm is not sufficiently utilized and that the establishment of correspondences among dictionary atoms in different brains has been challenging. In this paper, we propose a novel supervised dictionary learning and sparse coding method for inferring functional networks from tfMRI data, which combines the advantages of model-driven and data-driven methods. The basic idea is to fix the task stimulus curves as predefined model-driven dictionary atoms and only optimize the other portion of data-driven dictionary atoms. Application of this novel methodology on the publicly available human connectome project (HCP) tfMRI datasets has achieved promising results.

  19. Generation of a Supervised Classification Algorithm for Time-Series Variable Stars with an Application to the LINEAR Dataset

    CERN Document Server

    Johnston, Kyle B

    2016-01-01

    With the advent of digital astronomy, new benefits and new problems have been presented to the modern-day astronomer. While data can be captured in a more efficient and accurate manner using digital means, the efficiency of data retrieval has led to an overload of scientific data for processing and storage. This paper will focus on the construction and application of a supervised pattern classification algorithm for the identification of variable stars. Given the reduction of a survey of stars into a standard feature space, the problem of using prior patterns to identify new observed patterns can be reduced to time-tested classification methodologies and algorithms. Such supervised methods, so called because the user trains the algorithms prior to application using patterns with known classes or labels, provide a means to probabilistically determine the estimated class type of new observations. This paper will demonstrate the construction and application of a supervised classification algorithm on variable sta...

  20. I’m just thinking - How learning opportunities are created in doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Berge, Maria; Grout, Brian William Wilson;

    With this paper we aim to contribute towards an understanding of learning dynamics in doctoral supervision by analysing how learning opportunities are created in the interaction. We analyse interaction between supervisors and doctoral students using the notion of experiencing variation as a key...... for learning. Earlier research into doctoral supervision has been rather vague on how doctoral students learn to carry out research. Empirically, we have based the study on four cases each with one doctoral student and their supervisors. The supervision sessions were captured on video and audio to provide...

  1. Efficient supervised learning in networks with binary synapses

    CERN Document Server

    Baldassi, Carlo; Brunel, Nicolas; Zecchina, Riccardo

    2007-01-01

    Recent experimental studies indicate that synaptic changes induced by neuronal activity are discrete jumps between a small number of stable states. Learning in systems with discrete synapses is known to be a computationally hard problem. Here, we study a neurobiologically plausible on-line learning algorithm that derives from Belief Propagation algorithms. We show that it performs remarkably well in a model neuron with binary synapses, and a finite number of `hidden' states per synapse, that has to learn a random classification task. Such a system is able to learn a number of associations close to the theoretical limit, in time which is sublinear in system size. This is to our knowledge the first on-line algorithm that is able to achieve efficiently a finite number of patterns learned per binary synapse. Furthermore, we show that performance is optimal for a finite number of hidden states which becomes very small for sparse coding. The algorithm is similar to the standard `perceptron' learning algorithm, with a...

  2. Response monitoring using quantitative ultrasound methods and supervised dictionary learning in locally advanced breast cancer

    Science.gov (United States)

    Gangeh, Mehrdad J.; Fung, Brandon; Tadayyon, Hadi; Tran, William T.; Czarnota, Gregory J.

    2016-03-01

    A non-invasive computer-aided-theragnosis (CAT) system was developed for the early assessment of responses to neoadjuvant chemotherapy in patients with locally advanced breast cancer. The CAT system was based on quantitative ultrasound spectroscopy methods comprising several modules including feature extraction, a metric to measure the dissimilarity between "pre-" and "mid-treatment" scans, and a supervised learning algorithm for the classification of patients to responders/non-responders. One major requirement for the successful design of a high-performance CAT system is to accurately measure the changes in parametric maps before treatment onset and during the course of treatment. To this end, a unified framework based on Hilbert-Schmidt independence criterion (HSIC) was used for the design of feature extraction from parametric maps and the dissimilarity measure between the "pre-" and "mid-treatment" scans. For the feature extraction, HSIC was used to design a supervised dictionary learning (SDL) method by maximizing the dependency between the scans taken from "pre-" and "mid-treatment" with "dummy labels" given to the scans. For the dissimilarity measure, an HSIC-based metric was employed to effectively measure the changes in parametric maps as an indication of treatment effectiveness. The HSIC-based feature extraction and dissimilarity measure used a kernel function to nonlinearly transform input vectors into a higher dimensional feature space and computed the population means in the new space, where enhanced group separability was ideally obtained. The results of the classification using the developed CAT system indicated an improvement of performance compared to a CAT system with basic features using histogram of intensity.

  3. Image Classification Workflow Using Machine Learning Methods

    Science.gov (United States)

    Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.

    2016-12-01

    Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free to use solutions that are currently available come bundled up as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy to use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years of 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification using commercial software available without an expensive license.
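
    As an illustration of one of the supervised classifiers named above, a Mahalanobis-distance rule assigns each pixel to the class whose training mean is closest once distances are scaled by that class's covariance. The sketch below is restricted to two spectral bands so the covariance inverse can be written out by hand. It is not the authors' software (which the abstract says builds on libraries such as GDAL and Spectral Python), and the function names, class names and training values are hypothetical.

```python
def mean_and_cov(samples):
    """samples: list of (band1, band2) training pixels for one class.
    Returns (mean, covariance) with the sample covariance as a 2x2 tuple."""
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    c00 = sum((s[0] - m0) ** 2 for s in samples) / (n - 1)
    c11 = sum((s[1] - m1) ** 2 for s in samples) / (n - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n - 1)
    return (m0, m1), ((c00, c01), (c01, c11))


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of pixel x from a class (2-band case)."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Inverse of [[a, b], [b, d]] is 1/det * [[d, -b], [-b, a]].
    return (d * d0 * d0 - 2 * b * d0 * d1 + a * d1 * d1) / det


def classify_pixel(pixel, class_stats):
    """class_stats: {class name: (mean, cov)}. Returns the closest class."""
    return min(class_stats, key=lambda name: mahalanobis_sq(pixel, *class_stats[name]))


# Hypothetical training pixels (two bands) for two land-use classes.
TRAINING = {
    "water": [(10, 20), (12, 22), (11, 19), (13, 23)],
    "urban": [(40, 35), (42, 33), (41, 38), (43, 36)],
}
STATS = {name: mean_and_cov(pixels) for name, pixels in TRAINING.items()}
```

    Calling `classify_pixel((12, 21), STATS)` then returns the class whose training distribution best explains that pixel; a full raster would be classified by applying the same rule to every pixel.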

  4. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  5. Weakly supervised learning of a classifier for unusual event detection.

    Science.gov (United States)

    Jäger, Mark; Knoll, Christian; Hamprecht, Fred A

    2008-09-01

    In this paper, we present an automatic classification framework combining appearance based features and hidden Markov models (HMM) to detect unusual events in image sequences. One characteristic of the classification task is that anomalies are rare. This reflects the situation in the quality control of industrial processes, where error events are scarce by nature. As an additional restriction, class labels are only available for the complete image sequence, since frame-wise manual scanning of the recorded sequences for anomalies is too expensive and should, therefore, be avoided. The proposed framework reduces the feature space dimension of the image sequences by employing subspace methods and encodes characteristic temporal dynamics using continuous hidden Markov models (CHMMs). The applied learning procedure is as follows. 1) A generative model for the regular sequences is trained (one-class learning). 2) The regular sequence model (RSM) is used to locate potentially unusual segments within error sequences by means of a change detection algorithm (outlier detection). 3) Unusual segments are used to expand the RSM to an error sequence model (ESM). The complexity of the ESM is controlled by means of the Bayesian Information Criterion (BIC). The likelihood ratio of the data given the ESM and the RSM is used for the classification decision. This ratio is close to one for sequences without error events and increases for sequences containing error events. Experimental results are presented for image sequences recorded from industrial laser welding processes. We demonstrate that the learning procedure can significantly reduce the user interaction and that sequences with error events can be found with a small false positive rate. It has also been shown that a modeling of the temporal dynamics is necessary to reach these low error rates.
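
    The likelihood-ratio decision described above can be sketched as follows. This is only an illustration under stated assumptions: it uses small discrete-emission HMMs and the scaled forward algorithm instead of the continuous HMMs of the paper, and the model parameters, the threshold, and the names `forward_loglik`, `log_likelihood_ratio` and `classify` are hypothetical.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Hypothetical regular sequence model (RSM): one state, symbol 0 dominates.
RSM = dict(start=[1.0], trans=[[1.0]], emit=[[0.95, 0.05]])

# Hypothetical error sequence model (ESM): the regular state plus one
# extra state that mostly emits the "unusual" symbol 1.
ESM = dict(
    start=[1.0, 0.0],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    emit=[[0.95, 0.05], [0.1, 0.9]],
)

def log_likelihood_ratio(obs):
    """log p(obs | ESM) - log p(obs | RSM)."""
    return forward_loglik(obs, **ESM) - forward_loglik(obs, **RSM)

def classify(obs, threshold=0.0):
    """Label a sequence as 'error' when the ESM explains it better."""
    return "error" if log_likelihood_ratio(obs) > threshold else "regular"

regular_seq = [0] * 20
error_seq = [0] * 10 + [1] * 5 + [0] * 5
print(classify(regular_seq), classify(error_seq))
```

    In this toy setting the threshold of 0 on the log ratio is an arbitrary choice; in practice it would be tuned to the desired false positive rate.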

  6. Deep Learning in Label-free Cell Classification

    Science.gov (United States)

    Chen, Claire Lifan; Mahjoubfar, Ata; Tai, Li-Chia; Blaby, Ian K.; Huang, Allen; Niazi, Kayvan Reza; Jalali, Bahram

    2016-03-01

    Label-free cell analysis is essential to personalized genomics, cancer diagnostics, and drug development as it avoids adverse effects of staining reagents on cellular viability and cell signaling. However, currently available label-free cell assays mostly rely only on a single feature and lack sufficient differentiation. Also, the sample size analyzed by these assays is limited due to their low throughput. Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification. Our system captures quantitative optical phase and intensity images and extracts multiple biophysical features of individual cells. These biophysical measurements form a hyperdimensional feature space in which supervised learning is performed for cell classification. We compare various learning algorithms including artificial neural network, support vector machine, logistic regression, and a novel deep learning pipeline, which adopts global optimization of receiver operating characteristics. As a validation of the enhanced sensitivity and specificity of our system, we show classification of white blood T-cells against colon cancer cells, as well as lipid accumulating algal strains for biofuel production. This system opens up a new path to data-driven phenotypic diagnosis and better understanding of the heterogeneous gene expressions in cells.

  7. Deep Learning in Label-free Cell Classification.

    Science.gov (United States)

    Chen, Claire Lifan; Mahjoubfar, Ata; Tai, Li-Chia; Blaby, Ian K; Huang, Allen; Niazi, Kayvan Reza; Jalali, Bahram

    2016-03-15

    Label-free cell analysis is essential to personalized genomics, cancer diagnostics, and drug development as it avoids adverse effects of staining reagents on cellular viability and cell signaling. However, currently available label-free cell assays mostly rely only on a single feature and lack sufficient differentiation. Also, the sample size analyzed by these assays is limited due to their low throughput. Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification. Our system captures quantitative optical phase and intensity images and extracts multiple biophysical features of individual cells. These biophysical measurements form a hyperdimensional feature space in which supervised learning is performed for cell classification. We compare various learning algorithms including artificial neural network, support vector machine, logistic regression, and a novel deep learning pipeline, which adopts global optimization of receiver operating characteristics. As a validation of the enhanced sensitivity and specificity of our system, we show classification of white blood T-cells against colon cancer cells, as well as lipid accumulating algal strains for biofuel production. This system opens up a new path to data-driven phenotypic diagnosis and better understanding of the heterogeneous gene expressions in cells.

  8. Supervised Classification of Polarimetric SAR Imagery Using Temporal and Contextual Information

    Science.gov (United States)

    Dargahi, A.; Maghsoudi, Y.; Abkar, A. A.

    2013-09-01

    Using context as a source of ancillary information in the classification process provides a powerful tool for obtaining better class discrimination. A context-based supervised classification method is proposed that models context using Markov Random Fields (MRFs) and combines it with a Bayesian approach. In this framework, to make full use of the statistical a priori knowledge of the data, the spatial relation of neighbouring pixels was used. The proposed context-based algorithm combines a Gaussian-based Wishart distribution of PolSAR images with temporal and contextual information. This combination was done through Bayes decision theory: the class-conditional probability density function and the prior probability are modelled by the Wishart distribution and the MRF model, respectively. Given the complexity and similarity of classes, two PolSAR images from two different seasons (leaf-on and leaf-off) were used simultaneously to enhance class separation. According to the achieved results, the maximum improvement in overall classification accuracy of WMRF (combining Wishart and MRF) over the Wishart classifier was obtained when the leaf-on image was used. The highest accuracy was obtained with the combined datasets. In this case, the overall accuracies of the Wishart and WMRF methods were 72.66% and 78.95%, respectively.

  9. SUPERVISED CLASSIFICATION OF POLARIMETRIC SAR IMAGERY USING TEMPORAL AND CONTEXTUAL INFORMATION

    Directory of Open Access Journals (Sweden)

    A. Dargahi

    2013-09-01

    Using context as a source of ancillary information in the classification process provides a powerful tool for obtaining better class discrimination. A context-based supervised classification method is proposed that models context using Markov Random Fields (MRFs) and combines it with a Bayesian approach. In this framework, to make full use of the statistical a priori knowledge of the data, the spatial relation of neighbouring pixels was used. The proposed context-based algorithm combines a Gaussian-based Wishart distribution of PolSAR images with temporal and contextual information. This combination was done through Bayes decision theory: the class-conditional probability density function and the prior probability are modelled by the Wishart distribution and the MRF model, respectively. Given the complexity and similarity of classes, two PolSAR images from two different seasons (leaf-on and leaf-off) were used simultaneously to enhance class separation. According to the achieved results, the maximum improvement in overall classification accuracy of WMRF (combining Wishart and MRF) over the Wishart classifier was obtained when the leaf-on image was used. The highest accuracy was obtained with the combined datasets. In this case, the overall accuracies of the Wishart and WMRF methods were 72.66% and 78.95%, respectively.

  10. How Supervisor Experience Influences Trust, Supervision, and Trainee Learning: A Qualitative Study.

    Science.gov (United States)

    Sheu, Leslie; Kogan, Jennifer R; Hauer, Karen E

    2017-09-01

    Appropriate trust and supervision facilitate trainees' growth toward unsupervised practice. The authors investigated how supervisor experience influences trust, supervision, and subsequently trainee learning. In a two-phase qualitative inductive content analysis, phase one entailed reviewing 44 internal medicine resident and attending supervisor interviews from two institutions (July 2013 to September 2014) for themes on how supervisor experience influences trust and supervision. Three supervisor exemplars (early, developing, experienced) were developed and shared in phase two focus groups at a single institution, wherein 23 trainees validated the exemplars and discussed how each impacted learning (November 2015). Phase one: Four domains of trust and supervision varying with experience emerged: data, approach, perspective, clinical. Early supervisors were detail oriented and determined trust depending on task completion (data), were rule based (approach), drew on their experiences as trainees to guide supervision (perspective), and felt less confident clinically compared with more experienced supervisors (clinical). Experienced supervisors determined trust holistically (data), checked key aspects of patient care selectively and covertly (approach), reflected on individual experiences supervising (perspective), and felt comfortable managing clinical problems and gauging trainee abilities (clinical). Phase two: Trainees felt the exemplars reflected their experiences, described their preferences and learning needs shifting over time, and emphasized the importance of supervisor flexibility to match their learning needs. With experience, supervisors differ in their approach to trust and supervision. Supervisors need to trust themselves before being able to trust others. Trainees perceive these differences and seek supervision approaches that align with their learning needs.

  11. Compressed classification learning with Markov chain samples.

    Science.gov (United States)

    Cao, Feilong; Dai, Tenghui; Zhang, Yongquan; Tan, Yuanpeng

    2014-02-01

    In this article, we address the problem of compressed classification learning. A generalization bound of the support vector machines (SVMs) compressed classification algorithm with uniformly ergodic Markov chain samples is established. This bound indicates that the accuracy of the SVM classifier in the compressed domain is close to that of the best classifier in the data domain. In a sense, the fact that the compressed learning can avoid the curse of dimensionality in the learning process is shown. In addition, we show that compressed classification learning reduces the learning time at the price of decreasing the classification accuracy, but the decrement can be controlled. The numerical experiments further verify the results claimed in this article.

  12. Weakly Supervised Segmentation-Aided Classification of Urban Scenes from 3d LIDAR Point Clouds

    Science.gov (United States)

    Guinard, S.; Landrieu, L.

    2017-05-01

    We consider the problem of the semantic classification of 3D LiDAR point clouds obtained from urban scenes when the training set is limited. We propose a non-parametric segmentation model for urban scenes composed of anthropic objects of simple shapes, partitioning the scene into geometrically homogeneous segments whose size is determined by the local complexity. This segmentation can be integrated into a conditional random field (CRF) classifier in order to capture the high-level structure of the scene. For each cluster, this allows us to aggregate the noisy predictions of a weakly supervised classifier to produce a higher-confidence data term. We demonstrate the improvement provided by our method on two publicly available large-scale data sets.

  13. Classification of Polarimetric SAR Data Using Dictionary Learning

    DEFF Research Database (Denmark)

    Vestergaard, Jacob Schack; Nielsen, Allan Aasbjerg; Dahl, Anders Lindbjerg

    2012-01-01

    a method for supervised classification. Sparse coding of these image patches aims to maintain a proficient number of typical patches and associated labels. Data is consecutively classified by a nearest neighbor search of the dictionary elements and labeled with probabilities of each class. Each dictionary...... element consists of one or more features, such as spectral measurements, in a neighborhood around each pixel. For polarimetric SAR data these features are the elements of the complex covariance matrix for each pixel. We quantitatively compare the effect of using different representations of the covariance...... matrix as the dictionary element features. Furthermore, we compare the method of dictionary learning, in the context of classifying polarimetric SAR data, with standard classification methods based on single-pixel measurements....

  14. A hybrid ensemble learning approach to star-galaxy classification

    Science.gov (United States)

    Kim, Edward J.; Brunner, Robert J.; Carrasco Kind, Matias

    2015-10-01

    There exist a variety of star-galaxy classification techniques, each with their own strengths and weaknesses. In this paper, we present a novel meta-classification framework that combines and fully exploits different techniques to produce a more robust star-galaxy classification. To demonstrate this hybrid, ensemble approach, we combine a purely morphological classifier, a supervised machine learning method based on random forest, an unsupervised machine learning method based on self-organizing maps, and a hierarchical Bayesian template-fitting method. Using data from the CFHTLenS survey (Canada-France-Hawaii Telescope Lensing Survey), we consider different scenarios: when a high-quality training set is available with spectroscopic labels from DEEP2 (Deep Extragalactic Evolutionary Probe Phase 2), SDSS (Sloan Digital Sky Survey), VIPERS (VIMOS Public Extragalactic Redshift Survey), and VVDS (VIMOS VLT Deep Survey), and when the demographics of sources in a low-quality training set do not match the demographics of objects in the test data set. We demonstrate that our Bayesian combination technique improves the overall performance over any individual classification method in these scenarios. Thus, strategies that combine the predictions of different classifiers may prove to be optimal in currently ongoing and forthcoming photometric surveys, such as the Dark Energy Survey and the Large Synoptic Survey Telescope.

  15. The Practice of Supervision for Professional Learning: The Example of Future Forensic Specialists

    Science.gov (United States)

    Köpsén, Susanne; Nyström, Sofia

    2015-01-01

    Supervision intended to support learning is of great interest in professional knowledge development. No single definition governs the implementation and enactment of supervision because of different conditions, intentions, and pedagogical approaches. Uncertainty exists at a time when knowledge and methods are undergoing constant development. This…

  17. Output Effect Evaluation Based on Input Features in Neural Incremental Attribute Learning for Better Classification Performance

    Directory of Open Access Journals (Sweden)

    Ting Wang

    2015-01-01

    Machine learning is a very important approach to pattern classification. This paper provides better insight into Incremental Attribute Learning (IAL), with further analysis of why it can exhibit better performance than conventional batch training. IAL is a novel supervised machine learning strategy that gradually trains features in one or more chunks. Previous research showed that IAL can obtain lower classification error rates than a conventional batch training approach, yet the reason for this is still not very clear. In this study, the feasibility of IAL is verified by mathematical approaches. Moreover, experimental results obtained with IAL neural networks on benchmarks also confirm the mathematical validation.

  18. Machine-learning methods in the classification of water bodies

    Directory of Open Access Journals (Sweden)

    Sołtysiak Marek

    2016-06-01

    Amphibian species have been considered useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality. Amphibian species are sensitive to changes in the aquatic environment and may therefore form the basis for the classification of water bodies. Water bodies in which there are a large number of amphibian species are especially valuable even if they are located in urban areas. The automation of the classification process allows for a faster evaluation of the presence of amphibian species in the water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm) have been used to classify water bodies in Chorzów, one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data mining method consisting of several stages such as building the model, the testing phase and the prediction. Seven natural and anthropogenic features of water bodies (e.g. the type of water body, aquatic plants, the purpose of the water body (destination), position of the water body in relation to any possible buildings, condition of the water body, the degree of littering, the shore type and fishing activities) have been taken into account in the classification. The data set used in this study involved information about 71 different water bodies and 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.

  19. Discriminative Structured Dictionary Learning for Image Classification

    Institute of Scientific and Technical Information of China (English)

    王萍; 兰俊花; 臧玉卫; 宋占杰

    2016-01-01

    In this paper, a discriminative structured dictionary learning algorithm is presented. To enhance the dictionary’s discriminative power, the reconstruction error, classification error and inhomogeneous representation error are integrated into the objective function. The proposed approach learns a single structured dictionary and a linear classifier jointly. The learned dictionary encourages the samples from the same class to have similar sparse codes, and the samples from different classes to have dissimilar sparse codes. The solution to the objective function is achieved by employing a feature-sign search algorithm and Lagrange dual method. Experimental results on three public databases demonstrate that the proposed approach outperforms several recently proposed dictionary learning techniques for classification.

  20. Entropy-based generation of supervised neural networks for classification of structured patterns.

    Science.gov (United States)

    Tsai, Hsien-Leing; Lee, Shie-Jue

    2004-03-01

    Sperduti and Starita proposed a new type of neural network which consists of generalized recursive neurons for classification of structures. In this paper, we propose an entropy-based approach for constructing such neural networks for classification of acyclic structured patterns. Given a classification problem, the architecture, i.e., the number of hidden layers and the number of neurons in each hidden layer, and all the values of the link weights associated with the corresponding neural network are automatically determined. Experimental results have shown that the networks constructed by our method can have a better performance, with respect to network size, learning speed, or recognition accuracy, than the networks obtained by other methods.

  1. Cortex transform and its application for supervised texture classification of digital images

    Science.gov (United States)

    Bashar, M. K.; Ohnishi, Noboru; Shevgaonkar, R. K.

    2002-02-01

    This paper proposes a localized multi-channel filtering approach of image texture analysis based on the cortical behavior of Human Visual System (HVS). In our efforts, 2D Gaussian function, called Cortex Filter, in the frequency domain is used to model the band pass nature of simple cells in HVS. A block-based iterative method is addressed. In each pass, a square block of data is captured and cortex filters at various directions and radial bands are applied to filter out the available texture information in that block. Such decomposition results in a set of band pass images from a single input image and we call it Cortex Transform (CT). We use filter responses in each pass to compute the representative texture features i.e., the average filtered energies. The procedure is repeated for the subsequent blocks of data until the whole image is scanned. Various energy values calculated above are stored into different arrays or files and are regarded as feature images. Thus the obtained feature images are integrated with minimum distance classifier for supervised texture classification. We demonstrated the algorithm with various real world and synthetic images from various sources. Confusion matrix analysis shows a high average overall classification accuracy (97.01%) of our CT based approach in comparison with that (71.27%) of the popular gray level co-occurrence matrix (GLCM) approach.
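
    The final step described above, assigning feature vectors to texture classes with a minimum distance classifier, can be sketched as follows. This is a minimal illustration, assuming Euclidean distance to per-class mean vectors; the two-dimensional "filtered energy" features, the class labels and the function names are hypothetical and are not taken from the paper.

```python
import math

def class_means(samples):
    """Compute the mean feature vector of each class.

    samples: dict mapping a class label to a list of feature vectors.
    """
    means = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        means[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return means

def min_distance_classify(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))

# Hypothetical training features (e.g. average filtered energies in two bands).
training = {
    "grass": [[0.9, 0.1], [0.8, 0.2]],
    "brick": [[0.2, 0.7], [0.1, 0.9]],
}
means = class_means(training)
print(min_distance_classify([0.7, 0.3], means))
print(min_distance_classify([0.2, 0.6], means))
```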

  2. [Administrative reform thinking on the regulations on the supervision and administration of medical devices].

    Science.gov (United States)

    Yue, Wei

    2014-09-01

    This paper studies and interprets the regulations on the supervision and administration of medical devices, sorts out the thinking behind the administrative reform, and puts forward "ten principles": full supervision, classified supervision, risk classification, safety, effectiveness and economy, encouragement of innovation, simplified licensing, scientific standards, sincerity and self-discipline, clear responsibility, and severe punishment. These principles are then discussed.

  3. Evaluation of Four Supervised Learning Methods for Benthic Habitat Mapping Using Backscatter from Multi-Beam Sonar

    Directory of Open Access Journals (Sweden)

    Jacquomo Monk

    2012-11-01

    An understanding of the distribution and extent of marine habitats is essential for the implementation of ecosystem-based management strategies. Historically this had been difficult in marine environments until the advancement of acoustic sensors. This study demonstrates the applicability of supervised learning techniques for benthic habitat characterization using angular backscatter response data. With the advancement of multibeam echo-sounder (MBES) technology, full coverage datasets of physical structure over vast regions of the seafloor are now achievable. Supervised learning methods typically applied to terrestrial remote sensing provide a cost-effective approach for habitat characterization in marine systems. However, comparison of the relative performance of different classifiers using acoustic data is limited. Characterization of acoustic backscatter data from MBES using four different supervised learning methods to generate benthic habitat maps is presented. Maximum Likelihood Classifier (MLC), Quick, Unbiased, Efficient Statistical Tree (QUEST), Random Forest (RF) and Support Vector Machine (SVM) were evaluated to classify angular backscatter response into habitat classes using training data acquired from underwater video observations. Results for biota classifications indicated that SVM and RF produced the highest accuracies, followed by QUEST and MLC, respectively. The most important backscatter data were from the moderate incidence angles between 30° and 50°. This study presents initial results for understanding how acoustic backscatter from MBES can be optimized for the characterization of marine benthic biological habitats.

  4. Combining theories to reach multi-faceted insights into learning opportunities in doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Rump, Camilla Østerberg

    The aim of this paper is to illustrate how theories can be combined to explore opportunities for learning in doctoral supervision. While our earlier research into learning dynamics in doctoral supervision in life science research (Kobayashi, 2014) has focused on illustrating learning opportunities...... this paper focuses on the methodological advantages and potential criticism of combining theories. Learning in doctoral education, as in classroom learning, can be analysed from different perspectives. Zembylas (2005) suggests three perspectives with the aim of linking the cognitive and the emotional...

  5. Metric learning for automatic sleep stage classification.

    Science.gov (United States)

    Phan, Huy; Do, Quan; Do, The-Luan; Vu, Duc-Lung

    2013-01-01

    We introduce in this paper a metric learning approach for automatic sleep stage classification based on single-channel EEG data. We show that, by learning a global metric from training data instead of using the default Euclidean metric, the k-nearest neighbor classification rule outperforms state-of-the-art methods on the Sleep-EDF dataset in various classification settings. The overall accuracies for the Awake/Sleep and 4-class classification settings are 98.32% and 94.49%, respectively. Furthermore, the superior accuracy is achieved by performing classification on a low-dimensional feature space derived from the time and frequency domains, without the need for artifact removal as a preprocessing step.
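
    The effect of replacing the Euclidean metric in a k-nearest neighbor rule can be sketched as follows. This is an illustration only: the matrix `M` below is hand-picked, not learned, the paper's metric learning algorithm is not reproduced, and the two-dimensional features and labels are hypothetical. The sketch uses the squared distance (x - y)^T M (x - y), which reduces to the squared Euclidean distance when M is the identity.

```python
from collections import Counter

def metric_dist_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) for a symmetric PSD matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def knn_predict(x, train, M, k=3):
    """Majority vote among the k training points nearest to x under M."""
    nearest = sorted(train, key=lambda item: metric_dist_sq(x, item[0], M))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: feature 0 separates the classes, feature 1 is noise
# on a much larger scale.
train = [
    ([0.0, 5.0], "wake"), ([0.1, -4.0], "wake"), ([0.2, 0.5], "wake"),
    ([1.0, 0.2], "sleep"), ([1.1, -5.0], "sleep"), ([0.9, 4.0], "sleep"),
]
query = [0.8, 4.5]

euclidean = [[1.0, 0.0], [0.0, 1.0]]
# Hand-picked stand-in for a learned metric: down-weights the noisy feature.
reweighted = [[1.0, 0.0], [0.0, 0.01]]

print(knn_predict(query, train, euclidean))   # dominated by the noisy feature
print(knn_predict(query, train, reweighted))  # dominated by the informative one
```

    With the identity matrix the noisy second feature drives the neighbor ranking; down-weighting it changes which training points count as nearest, which is the kind of effect a learned global metric is meant to produce.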

  6. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning.

    Science.gov (United States)

    Cole-Lewis, Heather; Varghese, Arun; Sanders, Amy; Schwarz, Mary; Pugatch, Jillian; Augustson, Erik

    2015-08-25

    Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public's knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess Twitter data for a range of factors related to e-cigarettes. Manual content analysis was conducted for 17,098 tweets. These tweets were coded for five categories: e-cigarette relevance, sentiment, user description, genre, and theme. Machine learning classification models were then built for each of these five categories, and word groupings (n-grams) were used to define the feature space for each classifier. Predictive performance scores for classification models indicated that the models correctly labeled the tweets with the appropriate variables between 68.40% and 99.34% of the time, and the percentage of maximum possible improvement over a random baseline that was achieved by the classification models ranged from 41.59% to 80.62%. Classifiers with the highest performance scores that also achieved the highest percentage of the maximum possible improvement over a random baseline were Policy/Government (performance: 0.94; % improvement: 80.62%), Relevance (performance: 0.94; % improvement: 75.26%), Ad or Promotion (performance: 0.89; % improvement: 72.69%), and Marketing (performance: 0.91; % improvement: 72.56%). The most appropriate word-grouping unit (n-gram) was 1 for the majority of classifiers. Performance continued to marginally increase with the size of the training dataset of manually annotated data, but eventually leveled off. Even at low dataset sizes of 4000 observations, performance characteristics were fairly sound. Social media outlets like Twitter can uncover real-time snapshots of
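
    The word-grouping (n-gram) features mentioned above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and lower-casing; the study's actual preprocessing is not described here, and the function names and example text are hypothetical.

```python
from collections import Counter

def word_ngrams(text, n=1):
    """Return the list of word n-grams in a text (whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, n=1):
    """Bag-of-n-grams feature vector as a mapping from n-gram to count."""
    return Counter(word_ngrams(text, n))

example = "Vaping is not smoking"
print(word_ngrams(example, 1))
print(word_ngrams(example, 2))
```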

  7. Semi-Supervised Learning Techniques in AO Applications: A Novel Approach To Drift Counteraction

    Science.gov (United States)

    De Vito, S.; Fattoruso, G.; Pardo, M.; Tortorella, F.; Di Francia, G.

    2011-11-01

    In this work we proposed and tested the use of semi-supervised learning (SSL) techniques in the artificial olfaction (AO) domain. The characteristics of SSL have been exploited to reduce the need for costly supervised samples and to reduce the effects of time-dependent drift on state-of-the-art statistical learning approaches. For this purpose, a one-year-long atmospheric pollution dataset recorded in the field has been used. The semi-supervised approach benefitted from the use of updated unlabeled samples, adapting its knowledge to the slowly changing drift effects. We expect that semi-supervised learning can provide significant advantages to the performance of sensor fusion subsystems in artificial olfaction, exhibiting an interesting drift counteraction effect.

  8. Supervised learning of short and high-dimensional temporal sequences for life science measurements

    CERN Document Server

    Schleif, F -M; Hammer, B

    2011-01-01

    Physiological processes over time are often captured by spectrometric or gene expression profiles with only a few time points but a large number of measured variables. The analysis of such temporal sequences is challenging and only a few methods have been proposed. The information can be encoded time-independently, by means of classical expression differences for a single time point, or in expression profiles over time. Available methods are limited to unsupervised and semi-supervised settings. The predictive variables can be identified only by means of wrapper or post-processing techniques. This is complicated by the small number of samples in such studies. Here, we present a supervised learning approach, termed Supervised Topographic Mapping Through Time (SGTM-TT). It learns a supervised mapping of the temporal sequences onto a low-dimensional grid. We utilize a hidden Markov model (HMM) to account for the time domain and relevance learning to identify the relevant feature dimensions mo...

  9. Supervised pixel classification for segmenting geographic atrophy in fundus autofluorescence images

    Science.gov (United States)

    Hu, Zhihong; Medioni, Gerard G.; Hernandez, Matthias; Sadda, SriniVas R.

    2014-03-01

    Age-related macular degeneration (AMD) is the leading cause of blindness in people over the age of 65. Geographic atrophy (GA) is a manifestation of the advanced or late stage of AMD, which may result in severe vision loss and blindness. Techniques to rapidly and precisely detect and quantify GA lesions would be valuable in advancing the understanding of the pathogenesis of GA and the management of GA progression. The purpose of this study is to develop an automated supervised pixel classification approach for segmenting GA, including uni-focal and multi-focal patches, in fundus autofluorescence (FAF) images. The image features include region-wise intensity (mean and variance) measures, gray level co-occurrence matrix measures (angular second moment, entropy, and inverse difference moment), and Gaussian filter banks. A k-nearest-neighbor (k-NN) pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. A voting binary iterative hole filling filter is then applied to fill in the small holes. Sixteen randomly chosen FAF images were obtained from sixteen subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by certified graders. Two-fold cross-validation is applied for the evaluation of the classification performance. The mean Dice similarity coefficients (DSC) between the algorithm- and manually-defined GA regions are 0.84 +/- 0.06 for one test and 0.83 +/- 0.07 for the other test, and the area correlations between them are 0.99 (p < 0.05) and 0.94 (p < 0.05), respectively.
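
    The Dice similarity coefficient used above to compare algorithm-defined and manually defined regions is DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming binary masks stored as nested lists; the tiny masks and the function name `dice` are hypothetical, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the abstract.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    intersection = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(a) + sum(b)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

algorithm_mask = [[0, 1, 1],
                  [0, 1, 0]]
manual_mask = [[0, 1, 1],
               [0, 0, 0]]
print(dice(algorithm_mask, manual_mask))  # 2 * 2 / (3 + 2) = 0.8
```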

  10. Combining theories to reach multi-faceted insights into learning opportunities in doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Rump, Camilla Østerberg

    The aim of this paper is to illustrate how theories can be combined to explore opportunities for learning in doctoral supervision. While our earlier research into learning dynamics in doctoral supervision in life science research (Kobayashi, 2014) has focused on illustrating learning opportunities... in science learning; conceptual change, socio-constructivism and post-structuralism. In the present study we employ variation theory (Marton & Tsui, 2004) to study the individual acquisition perspective, what Zembylas terms conceptual change. As for the post-structural perspective we employ positioning... -another when intertwining the analyses to get a multi-faceted insight into the phenomenon of learning to be a life science researcher. The data was derived from four observations of supervision of doctoral students in life science, each with a doctoral student and two supervisors. The storylines hypothesized...

  11. Automated segmentation of geographic atrophy in fundus autofluorescence images using supervised pixel classification.

    Science.gov (United States)

    Hu, Zhihong; Medioni, Gerard G; Hernandez, Matthias; Sadda, Srinivas R

    2015-01-01

    Geographic atrophy (GA) is a manifestation of the advanced or late stage of age-related macular degeneration (AMD). AMD is the leading cause of blindness in people over the age of 65 in the western world. The purpose of this study is to develop a fully automated supervised pixel classification approach for segmenting GA, including uni- and multifocal patches in fundus autofluorescence (FAF) images. The image features include region-wise intensity measures, gray-level co-occurrence matrix measures, and Gaussian filter banks. A [Formula: see text]-nearest-neighbor pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. Sixteen randomly chosen FAF images were obtained from 16 subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by a certified image reading center grader. Eight-fold cross-validation is applied to evaluate the algorithm performance. The mean overlap ratio (OR), area correlation (Pearson's [Formula: see text]), accuracy (ACC), true positive rate (TPR), specificity (SPC), positive predictive value (PPV), and false discovery rate (FDR) between the algorithm- and manually defined GA regions are [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text], respectively.

  12. Improving Landsat and IRS Image Classification: Evaluation of Unsupervised and Supervised Classification through Band Ratios and DEM in a Mountainous Landscape in Nepal

    Directory of Open Access Journals (Sweden)

    Krishna Bahadur K.C.

    2009-12-01

    Modification of the original bands and integration of ancillary data in digital image classification have been shown to improve land use land cover classification accuracy. There are not many studies demonstrating such techniques in the context of the mountains of Nepal. The objective of this study was to explore and evaluate the use of modified bands and ancillary data in Landsat and IRS image classification, and to produce a land use land cover map of the Galaudu watershed of Nepal. Classification of land uses was explored using supervised and unsupervised classification for 12 feature sets containing the Landsat MSS, TM and IRS original bands, ratios, normalized difference vegetation index, principal components and a digital elevation model. Overall, the supervised classification method produced higher accuracy than the unsupervised approach. The result from the combination of band ratios 4/3, 5/4 and 5/7 ranked the highest in terms of accuracy (82.86%), while the combination of bands 2, 3 and 4 ranked the lowest (45.29%). Inclusion of the DEM as a component band shows promising results.
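    The derived features mentioned above (band ratios and the normalized difference vegetation index) are simple per-pixel computations. The sketch below assumes Landsat TM numbering, where band 4 is near-infrared and band 3 is red; the function names and the reflectance values are hypothetical and serve only as an illustration.

```python
def band_ratio(numerator_band, denominator_band):
    """Per-pixel ratio of two bands; pixels with a zero denominator become NaN."""
    return [n / d if d != 0 else float("nan")
            for n, d in zip(numerator_band, denominator_band)]


def ndvi(nir_band, red_band):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    return [(nir - red) / (nir + red) if (nir + red) != 0 else float("nan")
            for nir, red in zip(nir_band, red_band)]


# Hypothetical reflectance values for three pixels (flattened image).
band3_red = [0.10, 0.20, 0.30]
band4_nir = [0.50, 0.40, 0.30]

ratio_4_3 = band_ratio(band4_nir, band3_red)  # the 4/3 ratio named in the abstract
ndvi_values = ndvi(band4_nir, band3_red)
print(ratio_4_3, ndvi_values)
```

    Ratios and NDVI computed this way can be stacked with the original bands to form the feature sets passed to a classifier.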

  13. SU-E-J-107: Supervised Learning Model of Aligned Collagen for Human Breast Carcinoma Prognosis

    Energy Technology Data Exchange (ETDEWEB)

    Bredfeldt, J; Liu, Y; Conklin, M; Keely, P; Eliceiri, K; Mackie, T [University of Wisconsin, Madison, WI (United States)

    2014-06-01

    Purpose: Our goal is to develop and apply a set of optical and computational tools to enable large-scale investigations of the interaction between collagen and tumor cells. Methods: We have built a novel imaging system for automating the capture of whole-slide second harmonic generation (SHG) images of collagen in registry with bright field (BF) images of hematoxylin and eosin stained tissue. To analyze our images, we have integrated a suite of supervised learning tools that semi-automatically model and score collagen interactions with tumor cells via a variety of metrics, a method we call Electronic Tumor Associated Collagen Signatures (eTACS). This group of tools first segments regions of epithelial cells and collagen fibers from BF and SHG images respectively. We then associate fibers with groups of epithelial cells and finally compute features based on the angle of interaction and density of the collagen surrounding the epithelial cell clusters. These features are then processed with a support vector machine to separate cancer patients into high and low risk groups. Results: We validated our model by showing that eTACS produces classifications that have statistically significant correlation with manual classifications. In addition, our system generated classification scores that accurately predicted breast cancer patient survival in a cohort of 196 patients. Feature rank analysis revealed that TACS positive fibers are better aligned with each other, generally of lower density, and terminate within or near groups of epithelial cells. Conclusion: We are working to apply our model to predict survival in larger cohorts of breast cancer patients with a diversity of breast cancer types, predict response to treatments such as COX2 inhibitors, and to study collagen architecture changes in other cancer types. In the future, our system may be used to provide metastatic potential information to cancer patients to augment existing clinical assays.

  14. Robust Latent Subspace Learning for Image Classification.

    Science.gov (United States)

    Fang, Xiaozhao; Teng, Shaohua; Lai, Zhihui; He, Zhaoshui; Xie, Shengli; Wong, Wai Keung

    2017-05-10

    This paper proposes a novel method, called robust latent subspace learning (RLSL), for image classification. We formulate an RLSL problem as a joint optimization problem over both the latent SL and classification model parameter prediction, which simultaneously minimizes: 1) the regression loss between the learned data representation and objective outputs and 2) the reconstruction error between the learned data representation and original inputs. The latent subspace can be used as a bridge that is expected to seamlessly connect the original visual features and their class labels and hence improve the overall prediction performance. RLSL combines feature learning with classification so that the learned data representation in the latent subspace is more discriminative for classification. To learn a robust latent subspace, we use a sparse item to compensate error, which helps suppress the interference of noise via weakening its response during regression. An efficient optimization algorithm is designed to solve the proposed optimization problem. To validate the effectiveness of the proposed RLSL method, we conduct experiments on diverse databases and encouraging recognition results are achieved compared with many state-of-the-art methods.

  15. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    BACKGROUND: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. RESULTS: In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. CONCLUSIONS: The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
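    The general idea of spreading labels over a graph that contains both labeled and unlabeled nodes can be sketched with a basic label propagation loop. This is a deliberately simplified stand-in and not the authors' graph regularization algorithm; the function name `propagate_labels`, the four-node chain graph and the node names are all hypothetical.

```python
def propagate_labels(adjacency, labels, iterations=200):
    """Simple label propagation on an undirected graph.

    adjacency: dict mapping each node to a list of neighbouring nodes.
    labels:    dict mapping labeled nodes to 1.0 (positive) or 0.0 (negative).
    Labeled nodes stay fixed; each unlabeled node repeatedly takes the
    average score of its neighbours.
    """
    scores = {node: labels.get(node, 0.5) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in labels or not neighbours:
                updated[node] = scores[node]  # clamp labeled or isolated nodes
            else:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


# Hypothetical chain of four nodes: s1 - s2 - s3 - s4.
graph = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2", "s4"], "s4": ["s3"]}
known = {"s1": 1.0, "s4": 0.0}  # s1 labeled positive, s4 labeled negative
scores = propagate_labels(graph, known)
print({node: round(value, 3) for node, value in scores.items()})
```

    Unlabeled nodes closer to a positive node end up with higher scores, which is the smoothness assumption that graph-based semi-supervised methods rely on.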

  16. Supervised pre-processing approaches in multiple class variables classification for fish recruitment forecasting

    KAUST Repository

    Fernandes, José Antonio

    2013-02-01

    A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting of the fish species. Recent advances in Bayesian networks direct the learning of models with several interrelated variables to be forecasted simultaneously. These models are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the posterior learning of the model in these kinds of domains. Therefore, in the present study, a set of 'state-of-the-art' uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization and feature subset selection, are adapted to be used with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with an MDBN classifier, is tested with synthetic datasets and the real domain of fish recruitment forecasting. The correct forecasting of three fish species (anchovy, sardine and hake) simultaneously is nearly doubled (from 17.3% to 29.5%) using the multi-dimensional approach in comparison to mono-species models. The probability assessments also show high improvement, reducing the average error (estimated by means of the Brier score) from 0.35 to 0.27. Finally, these differences are superior to the forecasting of species by pairs. © 2012 Elsevier Ltd.
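    The Brier score mentioned above measures the error of probabilistic forecasts. The sketch below uses the common binary formulation (mean squared difference between forecast probability and observed outcome), which is an assumption; the study may use a multi-class variant. The function name and the forecast values are hypothetical.

```python
def brier_score(forecast_probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecast_probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecast_probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts of "high recruitment" for three years, with what happened.
forecasts = [0.9, 0.2, 0.6]
observed = [1, 0, 0]
score = brier_score(forecasts, observed)
print(round(score, 4))
```

    Lower values are better: 0 corresponds to perfectly confident and correct forecasts.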

  17. Detection of facilities in satellite imagery using semi-supervised image classification and auxiliary contextual observables

    Energy Technology Data Exchange (ETDEWEB)

    Harvey, Neal R [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory; Pawley, Norma H [Los Alamos National Laboratory; Brumby, Steven P [Los Alamos National Laboratory; Macdonald, Brian [Los Alamos National Laboratory; Balick, Lee [Los Alamos National Laboratory; Oyer, Alden [Los Alamos National Laboratory

    2009-01-01

    Detecting complex targets, such as facilities, in commercially available satellite imagery is a difficult problem that human analysts try to solve by applying world knowledge. Often there are known observables that can be extracted by pixel-level feature detectors that can assist in the facility detection process. Individually, each of these observables is not sufficient for an accurate and reliable detection, but in combination, these auxiliary observables may provide sufficient context for detection by a machine learning algorithm. We describe an approach for automatic detection of facilities that uses an automated feature extraction algorithm to extract auxiliary observables, and a semi-supervised assisted target recognition algorithm to then identify facilities of interest. We illustrate the approach using an example of finding schools in Quickbird image data of Albuquerque, New Mexico. We use Los Alamos National Laboratory's Genie Pro automated feature extraction algorithm to find a set of auxiliary features that should be useful in the search for schools, such as parking lots, large buildings, sports fields and residential areas and then combine these features using Genie Pro's assisted target recognition algorithm to learn a classifier that finds schools in the image data.

  18. Pre-trained Convolutional Networks and generative statistical models: a study in semi-supervised learning

    OpenAIRE

    John Michael Salgado Cebola

    2016-01-01

    Comparative study between the performance of Convolutional Networks using pretrained models and statistical generative models on tasks of image classification in semi-supervised environments. Study of multiple ensembles using these techniques and generated data from estimated PDFs. Pretrained ConvNets, LDA, pLSA, Fisher Vectors, Sparse-coded SPMs, TSVMs being the key models worked upon.

  19. Network traffic classification based on ensemble learning and co-training

    Institute of Scientific and Technical Information of China (English)

    HE HaiTao; LUO XiaoNan; MA FeiTeng; CHE ChunHui; WANG JianMin

    2009-01-01

    Classification of network traffic is an essential step for many network research efforts. However, with the rapid evolution of Internet applications, the effectiveness of port-based or payload-based identification approaches has greatly diminished in recent years, and many researchers have begun to turn their attention to an alternative, machine learning-based method. This paper presents a novel machine learning-based classification model, which combines the ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which employed only a single classifier, our method applies multiple classifiers and semi-supervised learning, which mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and a huge demand for labeled training data. In this paper, statistical characteristics of IP flows are extracted from packet-level traces to establish the feature set; the classification model is then created and tested, and the empirical results demonstrate its feasibility and effectiveness.

  20. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  1. Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

    CERN Document Server

    Yan, Yan; Fung, Glenn; Dy, Jennifer

    2012-01-01

    Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances ground-truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) annotator variable expertise for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed ...

  2. Generation of a supervised classification algorithm for time-series variable stars with an application to the LINEAR dataset

    Science.gov (United States)

    Johnston, K. B.; Oluseyi, H. M.

    2017-04-01

    With the advent of digital astronomy, new benefits and new problems have been presented to the modern day astronomer. While data can be captured in a more efficient and accurate manner using digital means, the efficiency of data retrieval has led to an overload of scientific data for processing and storage. This paper will focus on the construction and application of a supervised pattern classification algorithm for the identification of variable stars. Given the reduction of a survey of stars into a standard feature space, the problem of using prior patterns to identify new observed patterns can be reduced to time-tested classification methodologies and algorithms. Such supervised methods, so called because the user trains the algorithms prior to application using patterns with known classes or labels, provide a means to probabilistically determine the estimated class type of new observations. This paper will demonstrate the construction and application of a supervised classification algorithm on variable star data. The classifier is applied to a set of 192,744 LINEAR data points. Of the original samples, 34,451 unique stars were classified with high confidence (high level of probability of being the true class).

  3. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    Science.gov (United States)

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of two Random Forests features: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.

  4. Benchmarking Deep Learning Frameworks for the Classification of Very High Resolution Satellite Multispectral Data

    Science.gov (United States)

    Papadomanolaki, M.; Vakalopoulou, M.; Zagoruyko, S.; Karantzalos, K.

    2016-06-01

    In this paper we evaluated deep-learning frameworks based on Convolutional Neural Networks for the accurate classification of multispectral remote sensing data. Certain state-of-the-art models have been tested on the publicly available SAT-4 and SAT-6 high resolution satellite multispectral datasets. In particular, the performed benchmark included the AlexNet, AlexNet-small and VGG models, which had been trained and applied to both datasets exploiting all the available spectral information. Deep Belief Networks, Autoencoders and other semi-supervised frameworks have also been compared. The high level features that were calculated from the tested models managed to classify the different land cover classes with significantly high accuracy rates, i.e., above 99.9%. The experimental results demonstrate the great potential of advanced deep-learning frameworks for the supervised classification of high resolution multispectral remote sensing data.

  5. Learning to Teach: Teaching Internships in Counselor Education and Supervision

    Science.gov (United States)

    Hunt, Brandon; Gilmore, Genevieve Weber

    2011-01-01

    In an effort to ensure the efficacy of preparing emerging counselors in the field, CACREP standards require that by 2013 all core faculty at accredited universities have a doctorate in Counselor Education and Supervision. However, literature suggests that a disparity may exist in the preparation of counselor educators and the actual…

  6. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain...

  7. Supervised, Multivariate, Whole-brain Reduction Did Not Help to Achieve High Classification Performance in Schizophrenia Research

    Directory of Open Access Journals (Sweden)

    Eva Janousova

    2016-08-01

    We examined how penalized linear discriminant analysis with resampling, which is a supervised, multivariate, whole-brain reduction technique, can help schizophrenia diagnostics and research. In an experiment with magnetic resonance brain images of 52 first-episode schizophrenia patients and 52 healthy controls, this method allowed us to select brain areas relevant to schizophrenia, such as the left prefrontal cortex, the anterior cingulum, the right anterior insula, the thalamus and the hippocampus. Nevertheless, the classification performance based on such reduced data was not significantly better than the classification of data reduced by mass univariate selection using a t-test or unsupervised multivariate reduction using principal component analysis. Moreover, we found no important influence of the type of imaging features, namely local deformations or grey matter volumes, and the classification method, specifically linear discriminant analysis or linear support vector machines, on the classification results. However, we ascertained a significant effect of the cross-validation setting on classification performance, as classification results were overestimated even though the resampling was performed during the selection of brain imaging features. Therefore, it is critically important to perform cross-validation in all steps of the analysis (not only during classification) in case there is no external validation set, to avoid optimistically biasing the results of classification studies.
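    The point about performing cross-validation in all steps of the analysis can be illustrated with a minimal sketch in which feature selection is repeated inside every fold, using training rows only. Everything here is a hypothetical illustration and not the study's pipeline: the selection criterion (absolute difference of class means), the nearest-centroid classifier, the function names and the pure-noise data are all assumptions.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def select_features(X, y, rows, n_keep):
    """Rank features by absolute difference of class means over the given rows."""
    scores = []
    for j in range(len(X[0])):
        mean0 = mean(X[i][j] for i in rows if y[i] == 0)
        mean1 = mean(X[i][j] for i in rows if y[i] == 1)
        scores.append((abs(mean1 - mean0), j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def nearest_centroid_predict(X, y, train, test, features):
    """Assign each test row to the class with the closest training centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [i for i in train if y[i] == label]
        centroids[label] = [mean(X[i][j] for i in rows) for j in features]
    predictions = []
    for i in test:
        distance = {label: sum((X[i][j] - centroids[label][t]) ** 2
                               for t, j in enumerate(features))
                    for label in (0, 1)}
        predictions.append(min(distance, key=distance.get))
    return predictions


def cv_accuracy(X, y, n_folds=5, n_keep=5, select_inside=True):
    """Cross-validated accuracy; select_inside=False leaks test data into selection."""
    all_rows = list(range(len(X)))
    leaky_features = select_features(X, y, all_rows, n_keep)  # uses every row
    correct = 0
    for test in kfold_indices(len(X), n_folds):
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        features = (select_features(X, y, train, n_keep)
                    if select_inside else leaky_features)
        predictions = nearest_centroid_predict(X, y, train, test, features)
        correct += sum(p == y[i] for p, i in zip(predictions, test))
    return correct / len(X)


# Pure-noise data: 40 samples, 200 features, labels unrelated to the features.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(40)]
print("selection inside folds:", cv_accuracy(X, y, select_inside=True))
print("selection before CV   :", cv_accuracy(X, y, select_inside=False))
```

    On data like this, where no real signal exists, selecting features before splitting typically yields an optimistic accuracy estimate, while selecting inside each fold tends to stay near chance level. The sketch assumes both classes are present in every training fold.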

  8. Photometric Supernova Classification With Machine Learning

    CERN Document Server

    Lochner, Michelle; Peiris, Hiranya V; Lahav, Ofer; Winter, Max K

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope (LSST), given that spectroscopic confirmation of type for all supernovae discovered with these surveys will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques fitting parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks and boosted decision trees. We test the pipeline on simulated multi-ba...

  9. Stellar classification from single-band imaging using machine learning

    CERN Document Server

    Kuntzer, T; Courbin, F

    2016-01-01

    Information on the spectral types of stars is of great interest in view of the exploitation of space-based imaging surveys. In this article, we investigate the classification of stars into spectral types using only the shape of their diffraction pattern in a single broad-band image. We propose a supervised machine learning approach to this endeavour, based on principal component analysis (PCA) for dimensionality reduction, followed by artificial neural networks (ANNs) estimating the spectral type. Our analysis is performed with image simulations mimicking the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) in the F606W and F814W bands, as well as the Euclid VIS imager. We first demonstrate this classification in a simple context, assuming perfect knowledge of the point spread function (PSF) model and the possibility of accurately generating mock training data for the machine learning. We then analyse its performance in a fully data-driven situation, in which the training would be performed with...

  10. A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation

    Science.gov (United States)

    Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio

    2015-01-01

    Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building-blocks that better represent the information in it. Digitized histopathology images represent a very good testbed for representation learning since they involve large amounts of highly complex visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on which type of learning architecture (deep or shallow) and which type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances and that topographic constraints provide useful invariant features in scale and rotations for efficient tumor differentiation.

  11. Mapping of riparian invasive species with supervised classification of Unmanned Aerial System (UAS) imagery

    Science.gov (United States)

    Michez, Adrien; Piégay, Hervé; Jonathan, Lisein; Claessens, Hugues; Lejeune, Philippe

    2016-02-01

    Riparian zones are key landscape features, representing the interface between terrestrial and aquatic ecosystems. Although they have been influenced by human activities for centuries, their degradation has increased during the 20th century. Concomitant with (or as consequences of) these disturbances, the invasion of exotic species has increased throughout the world's riparian zones. In our study, we propose an easily reproducible methodological framework to map three riparian invasive taxa using Unmanned Aerial Systems (UAS) imagery: Impatiens glandulifera Royle, Heracleum mantegazzianum Sommier and Levier, and Japanese knotweed (Fallopia sachalinensis (F. Schmidt Petrop.), Fallopia japonica (Houtt.) and hybrids). Based on visible and near-infrared UAS orthophoto, we derived simple spectral and texture image metrics computed at various scales of image segmentation (10, 30, 45, and 60, using eCognition software). Supervised classification based on the random forests algorithm was used to identify the most relevant variable (or combination of variables) derived from UAS imagery for mapping riparian invasive plant species. The models were built using 20% of the dataset, the rest of the dataset being used as a test set (80%). Except for H. mantegazzianum, the best results in terms of global accuracy were achieved with the finest scale of analysis (segmentation scale parameter = 10). The best values of overall accuracies reached 72%, 68%, and 97% for I. glandulifera, Japanese knotweed, and H. mantegazzianum respectively. In terms of selected metrics, simple spectral metrics (layer mean/camera brightness) were the most used. Our results also confirm the added value of texture metrics (GLCM derivatives) for mapping riparian invasive species. The results obtained for I. glandulifera and Japanese knotweed do not reach sufficient accuracies for operational applications. However, the results achieved for H. mantegazzianum are encouraging. The high accuracy values combined to

  12. Deep transfer learning for automatic target classification: MWIR to LWIR

    Science.gov (United States)

    Ding, Zhengming; Nasrabadi, Nasser; Fu, Yun

    2016-05-01

    Publisher's Note: This paper, originally published on 5/12/2016, was replaced with a corrected/revised version on 5/18/2016. If you downloaded the original PDF but are unable to access the revision, please contact SPIE Digital Library Customer Service for assistance. When dealing with sparse or no labeled data in the target domain, transfer learning shows its appealing performance by borrowing the supervised knowledge from external domains. Recently, deep structure learning has been exploited in transfer learning due to its attractive power in extracting effective knowledge through a multi-layer strategy, so that deep transfer learning is promising for addressing the cross-domain mismatch. In general, cross-domain disparity can result from the difference between source and target distributions or from different modalities, e.g., Midwave IR (MWIR) and Longwave IR (LWIR). In this paper, we propose a Weighted Deep Transfer Learning framework for automatic target classification in a task-driven fashion. Specifically, deep features and classifier parameters are obtained simultaneously for optimal classification performance. In this way, the proposed deep structures can extract more effective features with the guidance of the classifier performance; on the other hand, the classifier performance is further improved since it is optimized on more discriminative features. Furthermore, we build a weighted scheme to couple source and target output by assigning pseudo labels to target data, so that we can transfer knowledge from source (i.e., MWIR) to target (i.e., LWIR). Experimental results on real databases demonstrate the superiority of the proposed algorithm compared with others.

  13. Cloud detection in all-sky images via multi-scale neighborhood features and multiple supervised learning techniques

    Science.gov (United States)

    Cheng, Hsu-Yung; Lin, Chih-Lung

    2017-01-01

    Cloud detection is important for providing necessary information such as cloud cover in many applications. Existing cloud detection methods include red-to-blue ratio thresholding and other classification-based techniques. In this paper, we propose to perform cloud detection using supervised learning techniques with multi-resolution features. One of the major contributions of this work is that the features are extracted from local image patches with different sizes to include local structure and multi-resolution information. The cloud models are learned through the training process. We consider classifiers including random forest, support vector machine, and Bayesian classifier. To take advantage of the clues provided by multiple classifiers and various levels of patch sizes, we employ a voting scheme to combine the results to further increase the detection accuracy. In the experiments, we have shown that the proposed method can distinguish cloud and non-cloud pixels more accurately compared with existing works.
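
    The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the voting rule, so a plain majority vote is assumed, and the label lists (`rf`, `svm`, `bayes`) are hypothetical per-pixel outputs.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that occurs most often among the votes for one pixel."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def combine(per_classifier_labels):
    """Combine equally long label lists, one per classifier (or patch size)."""
    # zip(*...) regroups the labels so that each tuple holds all votes for one pixel.
    return [majority_vote(votes) for votes in zip(*per_classifier_labels)]


# Hypothetical per-pixel predictions from three classifiers (assumed values).
rf = ["cloud", "sky", "cloud", "sky"]
svm = ["cloud", "cloud", "cloud", "sky"]
bayes = ["sky", "sky", "cloud", "sky"]

combined = combine([rf, svm, bayes])
print(combined)
```

    With an odd number of voters and two classes, ties cannot occur; other settings would need an explicit tie-breaking rule.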

  14. Semi-supervised prediction of gene regulatory networks using machine learning algorithms

    Indian Academy of Sciences (India)

    Nihir Patel; T L Wang

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  15. Semi-supervised eigenvectors for large-scale locally-biased learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2014-01-01

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks nearby that prespecified target region. For example, one might... -based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing... a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned...

  16. Customers Behavior Modeling by Semi-Supervised Learning in Customer Relationship Management

    CERN Document Server

    Emtiyaz, Siavash; 10.4156/AISS.vol3.issue9.31

    2012-01-01

    Leveraging the power of increasing amounts of data to analyze customer base for attracting and retaining the most valuable customers is a major problem facing companies in this information age. Data mining technologies extract hidden information and knowledge from large data stored in databases or data warehouses, thereby supporting the corporate decision making process. CRM uses data mining (one of the elements of CRM) techniques to interact with customers. This study investigates the use of a technique, semi-supervised learning, for the management and analysis of customer-related data warehouse and information. The idea of semi-supervised learning is to learn not only from the labeled training data, but to exploit also the structural information in additionally available unlabeled data. The proposed semi-supervised method is a model by means of a feed-forward neural network trained by a back propagation algorithm (multi-layer perceptron) in order to predict the category of an unknown customer (potential cus...

  17. Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification

    Science.gov (United States)

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-07-01

    The linear synthesis model-based dictionary learning framework has achieved remarkable performance in image classification over the last decade. As a generative feature model, however, it suffers from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector will be extracted much more efficiently. Additionally, we derive a deep insight to demonstrate that NACM is capable of simultaneously learning the task-adapted feature transformation and regularization to encode our preferences, domain prior knowledge and task-oriented supervised information into the features. The proposed NACM is devoted to the classification task as a discriminative feature model and yields a novel discriminative nonlinear analysis operator learning framework (DNAOL). The theoretical analysis and experimental results clearly demonstrate that DNAOL not only achieves better or at least competitive classification accuracies compared with state-of-the-art algorithms but can also dramatically reduce the time complexity of both the training and testing phases.

  18. Regional manifold learning for disease classification.

    Science.gov (United States)

    Ye, Dong Hye; Desjardins, Benoit; Hamm, Jihun; Litt, Harold; Pohl, Kilian M

    2014-06-01

    While manifold learning from images itself has become widely used in medical image analysis, the accuracy of existing implementations suffers from viewing each image as a single data point. To address this issue, we parcellate images into regions and then separately learn the manifold for each region. We use the regional manifolds as low-dimensional descriptors of high-dimensional morphological image features, which are then fed into a classifier to identify regions affected by disease. We produce a single ensemble decision for each scan by the weighted combination of these regional classification results. Each weight is determined by the regional accuracy of detecting the disease. When applied to cardiac magnetic resonance imaging of 50 normal controls and 50 patients with reconstructive surgery of Tetralogy of Fallot, our method achieves significantly better classification accuracy than approaches learning a single manifold across the entire image domain.
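
    The weighted combination of regional results can be illustrated with a short sketch. It assumes binary regional decisions (1 = diseased, 0 = normal), weights normalized to sum to one, and a 0.5 decision threshold; the abstract states only that each weight is determined by regional accuracy, so these details and the example numbers are assumptions.

```python
def ensemble_decision(region_predictions, region_accuracies, threshold=0.5):
    """Combine per-region binary predictions into one score and decision per scan.

    Each region's vote is weighted by its (assumed known) detection accuracy.
    """
    total = sum(region_accuracies)
    if total <= 0:
        raise ValueError("at least one region must have a positive accuracy")
    score = sum(a * p for a, p in zip(region_accuracies, region_predictions)) / total
    return score, score >= threshold


# Hypothetical scan with three regions: predictions and regional accuracies.
score, is_diseased = ensemble_decision([1, 0, 1], [0.9, 0.6, 0.5])
print(round(score, 3), is_diseased)
```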

  19. Contributions to unsupervised and supervised learning with applications in digital image processing

    OpenAIRE

    2012-01-01

    311 p. : il. [EN] This thesis covers a broad period of research activities with a common thread: learning processes and their application to image processing. The two main categories of learning algorithms, supervised and unsupervised, have been touched across these years. The main body of initial works was devoted to unsupervised learning neural architectures, especially the Self Organizing Map. Our aim was to study its convergence properties from empirical and analytical viewpoints. From the digita...

  20. Contributions to unsupervised and supervised learning with applications in digital image processing

    OpenAIRE

    González Acuña, Ana Isabel

    2014-01-01

    311 p. : il. [EN] This thesis covers a broad period of research activities with a common thread: learning processes and their application to image processing. The two main categories of learning algorithms, supervised and unsupervised, have been touched across these years. The main body of initial works was devoted to unsupervised learning neural architectures, especially the Self Organizing Map. Our aim was to study its convergence properties from empirical and analytical viewpoints. From the digita...

  1. Semi-supervised learning and domain adaptation in natural language processing

    CERN Document Server

    Søgaard, Anders

    2013-01-01

    This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the marginal distribution of large amounts of unlabeled data. One reason for that is data sparsity, i.e., the limited amounts of data we have available in NLP. However, in most real-world NLP applications our labeled data is also heavily biased. This book introduces extensions of supervised learning algorithms to cope with data sparsity and different kinds of sampling bias. This book is intended to be both

  2. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    Science.gov (United States)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithm to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using the CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units, using them to execute general purpose computing tasks such as image classification algorithms. As well as CUDA, we use other parallel libraries such as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool concurrently. The experimental results indicate that this new algorithm widely outperforms the previous unsupervised algorithms implemented in Hypergim, both in runtime and in precision of the actual classification of the images.

  3. Scalable active learning for multiclass image classification.

    Science.gov (United States)

    Joshi, Ajay J; Porikli, Fatih; Papanikolopoulos, Nikolaos P

    2012-11-01

    Machine learning techniques for computer vision applications like object recognition, scene classification, etc., require a large number of training samples for satisfactory performance. Especially when classification is to be performed over many categories, providing enough training samples for each category is infeasible. This paper describes new ideas in multiclass active learning to deal with the training bottleneck, making it easier to train large multiclass image classification systems. First, we propose a new interaction modality for training which requires only yes-no type binary feedback instead of a precise category label. The modality is especially powerful in the presence of hundreds of categories. For the proposed modality, we develop a Value-of-Information (VOI) algorithm that chooses informative queries while also considering user annotation cost. Second, we propose an active selection measure that works with many categories and is extremely fast to compute. This measure is employed to perform a fast seed search before computing VOI, resulting in an algorithm that scales linearly with dataset size. Third, we use locality sensitive hashing to provide a very fast approximation to active learning, which gives sublinear time scaling, allowing application to very large datasets. The approximation provides up to two orders of magnitude speedups with little loss in accuracy. Thorough empirical evaluation of classification accuracy, noise sensitivity, imbalanced data, and computational performance on a diverse set of image datasets demonstrates the strengths of the proposed algorithms.

  4. Galaxy Classification using Machine Learning

    Science.gov (United States)

    Fowler, Lucas; Schawinski, Kevin; Brandt, Ben-Elias; Widmer, Nicole

    2017-01-01

    We present our current research into the use of machine learning to classify galaxy imaging data with various convolutional neural network configurations in TensorFlow. We are investigating how five-band Sloan Digital Sky Survey imaging data can be used to train on physical properties such as redshift, star formation rate, mass and morphology. We also investigate the performance of artificially redshifted images in recovering physical properties as image quality degrades.

  5. Photometric Supernova Classification with Machine Learning

    Science.gov (United States)

    Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
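
    As a small illustration of the metric used above, the area under the ROC curve can be computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical and unrelated to the paper's data; the quadratic pairwise loop is chosen for clarity, not speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = class of interest, 0 = other.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(auc)
```

    An AUC of 1 corresponds to perfect separation of the two classes, and 0.5 to random ranking.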

  6. Classification

    Data.gov (United States)

    National Aeronautics and Space Administration — A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training...

  7. Binary classification of ¹⁸F-flutemetamol PET using machine learning

    DEFF Research Database (Denmark)

    Vandenberghe, Rik; Nelissen, Natalie; Salmon, Eric

    2013-01-01

    (18)F-flutemetamol is a positron emission tomography (PET) tracer for in vivo amyloid imaging. The ability to classify amyloid scans in a binary manner as 'normal' versus 'Alzheimer-like', is of high clinical relevance. We evaluated whether a supervised machine learning technique, support vector...... machines (SVM), can replicate the assignments made by visual readers blind to the clinical diagnosis, which image components have highest diagnostic value according to SVM and how (18)F-flutemetamol-based classification using SVM relates to structural MRI-based classification using SVM within the same...

  8. Supervised Learning of Logical Operations in Layered Spiking Neural Networks with Spike Train Encoding

    CERN Document Server

    Grüning, André

    2011-01-01

    Few algorithms for supervised training of spiking neural networks exist that can deal with patterns of multiple spikes, and their computational properties are largely unexplored. We demonstrate in a set of simulations that the ReSuMe learning algorithm can be successfully applied to layered neural networks. Input and output patterns are encoded as spike trains of multiple precisely timed spikes, and the network learns to transform the input trains into target output trains. This is done by combining the ReSuMe learning algorithm with multiplicative scaling of the connections of downstream neurons. We show in particular that layered networks with one hidden layer can learn the basic logical operations, including Exclusive-Or, while networks without hidden layer cannot, mirroring an analogous result for layered networks of rate neurons. While supervised learning in spiking neural networks is not yet fit for technical purposes, exploring computational properties of spiking neural networks advances our understand...

  9. Extension of Companion Modeling Using Classification Learning

    Science.gov (United States)

    Torii, Daisuke; Bousquet, François; Ishida, Toru

    Companion Modeling is a methodology of refining initial models for understanding reality through a role-playing game (RPG) and a multiagent simulation. In this research, we propose a novel agent model construction methodology in which classification learning is applied to the RPG log data in Companion Modeling. This methodology enables a systematic model construction that handles multiple parameters, independent of the modeler's ability. There are three problems in applying classification learning to the RPG log data: 1) It is difficult to gather enough data for the number of features because the cost of gathering data is high. 2) Noisy data can affect the learning results because the amount of data may be insufficient. 3) The learning results should be explained as a human decision making model and should be recognized by the expert as being the result that reflects reality. We realized an agent model construction system using the following two approaches: 1) Using a feature selection method, the feature subset that has the best prediction accuracy is identified. In this process, the important features chosen by the expert are always included. 2) The expert eliminates irrelevant features from the learning results after evaluating the learning model through a visualization of the results. Finally, using the RPG log data from the Companion Modeling of agricultural economics in northeastern Thailand, we confirm the capability of this methodology.

  10. Stellar classification from single-band imaging using machine learning

    Science.gov (United States)

    Kuntzer, T.; Tewes, M.; Courbin, F.

    2016-06-01

    Information on the spectral types of stars is of great interest in view of the exploitation of space-based imaging surveys. In this article, we investigate the classification of stars into spectral types using only the shape of their diffraction pattern in a single broad-band image. We propose a supervised machine learning approach to this endeavour, based on principal component analysis (PCA) for dimensionality reduction, followed by artificial neural networks (ANNs) estimating the spectral type. Our analysis is performed with image simulations mimicking the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) in the F606W and F814W bands, as well as the Euclid VIS imager. We first demonstrate this classification in a simple context, assuming perfect knowledge of the point spread function (PSF) model and the possibility of accurately generating mock training data for the machine learning. We then analyse its performance in a fully data-driven situation, in which the training would be performed with a limited subset of bright stars from a survey, and an unknown PSF with spatial variations across the detector. We use simulations of main-sequence stars with flat distributions in spectral type and in signal-to-noise ratio, and classify these stars into 13 spectral subclasses, from O5 to M5. Under these conditions, the algorithm achieves a high success rate both for Euclid and HST images, with typical errors of half a spectral class. Although more detailed simulations would be needed to assess the performance of the algorithm on a specific survey, this shows that stellar classification from single-band images is well possible.

  11. Improvements on coronal hole detection in SDO/AIA images using supervised classification

    Science.gov (United States)

    Reiss, Martin A.; Hofmeister, Stefan J.; De Visscher, Ruben; Temmer, Manuela; Veronig, Astrid M.; Delouille, Véronique; Mampaey, Benjamin; Ahammer, Helmut

    2015-07-01

    We demonstrate the use of machine learning algorithms in combination with segmentation techniques in order to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding, SPoCA), we prepared datasets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011-2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first order and second order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used for data mining investigations to identify the most performant rule to differentiate between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine (SVM), Linear Support Vector Machine, Decision Tree, and Random Forest, and found that all classification rules achieve good results in general, with linear SVM providing the best performances (with a true skill statistic of ≈ 0.90). Additional information from magnetic field data systematically improves the performance across all four classifiers for the SPoCA detection. Since the calculation is inexpensive in computing time, this approach is well suited for applications on real-time data. This study demonstrates how a machine learning approach may help improve upon an unsupervised feature extraction method.
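
    The true skill statistic (TSS) quoted above is commonly defined as the true positive rate minus the false positive rate. The sketch below assumes that standard definition; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """TSS = TP / (TP + FN) - FP / (FP + TN); ranges from -1 to 1."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both classes must be present")
    true_positive_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return true_positive_rate - false_positive_rate


# Hypothetical counts: coronal holes as positives, filament channels as negatives.
tss = true_skill_statistic(tp=45, fn=5, fp=5, tn=45)
print(round(tss, 3))
```

    Because each rate is normalized within its own class, the TSS does not depend on the ratio of positive to negative examples.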

  12. African Journal of Science and Technology (AJST) SUPERVISED ...

    African Journals Online (AJOL)

    NORBERT OPIYO AKECH

    ABSTRACT: This paper proposes a new method for supervised color image classification by the ... learning quantisation vector (LVQ), is constructed and compared to the K-means clustering ..... colored scanned maps, Machine Vision and.

  13. Visual feature learning in artificial grammar classification.

    Science.gov (United States)

    Chang, Grace Y; Knowlton, Barbara J

    2004-05-01

    The Artificial Grammar Learning task has been used extensively to assess individuals' implicit learning capabilities. Previous work suggests that participants implicitly acquire rule-based knowledge as well as exemplar-specific knowledge in this task. This study investigated whether exemplar-specific knowledge acquired in this task is based on the visual features of the exemplars. When a change in the font and case occurred between study and test, there was no effect on sensitivity to grammatical rules in classification judgments. However, such a change did virtually eliminate sensitivity to training frequencies of letter bigrams and trigrams (chunk strength) in classification judgments. Performance of a secondary task during study eliminated this font sensitivity and generally reduced the contribution of chunk strength knowledge. The results are consistent with the idea that perceptual fluency makes a contribution to artificial grammar judgments.

  14. Machine Learning for Biological Trajectory Classification Applications

    Science.gov (United States)

    Sbalzarini, Ivo F.; Theriot, Julie; Koumoutsakos, Petros

    2002-01-01

    Machine-learning techniques, including clustering algorithms, support vector machines and hidden Markov models, are applied to the task of classifying trajectories of moving keratocyte cells. The different algorithms are compared to each other as well as to expert and non-expert test persons, using concepts from signal-detection theory. The algorithms performed very well as compared to humans, suggesting a robust tool for trajectory classification in biological applications.

  15. Machine Learning for Galaxy Morphology Classification

    CERN Document Server

    Gauci, Adam; Abela, John; Magro, Alessio

    2010-01-01

    In this work, decision tree learning algorithms and fuzzy inferencing systems are applied for galaxy morphology classification. In particular, the CART, the C4.5, the Random Forest and fuzzy logic algorithms are studied and reliable classifiers are developed to distinguish between spiral galaxies, elliptical galaxies or star/unknown galactic objects. Morphology information for the training and testing datasets is obtained from the Galaxy Zoo project while the corresponding photometric and spectra parameters are downloaded from the SDSS DR7 catalogue.

  16. Supervised learning with decision tree-based methods in computational and systems biology.

    Science.gov (United States)

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.

  17. Re/Learning Student Teaching Supervision: A Co/Autoethnographic Self-Study

    Science.gov (United States)

    Butler, Brandon M.; Diacopoulos, Mark M.

    2016-01-01

    This article documents the critical friendship of an experienced teacher educator and a doctoral student through our joint exploration of student teaching supervision. By adopting a co/autoethnographic approach, we learned from biographical and contemporaneous critical incidents that informed short- and long-term practices. In particular, we…

  18. Undergraduate Internship Supervision in Psychology Departments: Use of Experiential Learning Best Practices

    Science.gov (United States)

    Bailey, Sarah F.; Barber, Larissa K.; Nelson, Videl L.

    2017-01-01

    This study examined trends in how psychology internships are supervised compared to current experiential learning best practices in the literature. We sent a brief online survey to relevant contact persons for colleges/universities with psychology departments throughout the United States (n = 149 responded). Overall, the majority of institutions…

  19. Multiclass semi-supervised learning for animal behavior recognition from accelerometer data

    NARCIS (Netherlands)

    Tanha, J.; van Someren, M.; de Bakker, M.; Bouten, W.; Shamoun-Baranes, J.; Afsarmanesh, H.

    2012-01-01

    In this paper we present a new Multiclass semi-supervised learning algorithm that uses a base classifier in combination with a similarity function applied to all data to find a classifier that maximizes the margin and consistency over all data. A novel multiclass loss function is presented and used

  20. Social media research: The application of supervised machine learning in organizational communication research

    NARCIS (Netherlands)

    van Zoonen, W.; van der Meer, T.G.L.A.

    2016-01-01

    Despite the online availability of data, analysis of this information in academic research is arduous. This article explores the application of supervised machine learning (SML) to overcome challenges associated with online data analysis. In SML classifiers are used to categorize and code binary dat

  1. Learning features for tissue classification with the classification restricted Boltzmann machine

    DEFF Research Database (Denmark)

    van Tulder, Gijs; de Bruijne, Marleen

    2014-01-01

    Performance of automated tissue classification in medical imaging depends on the choice of descriptive features. In this paper, we show how restricted Boltzmann machines (RBMs) can be used to learn features that are especially suited for texture-based tissue classification. We introduce...... the convolutional classification RBM, a combination of the existing convolutional RBM and classification RBM, and use it for discriminative feature learning. We evaluate the classification accuracy of convolutional and non-convolutional classification RBMs on two lung CT problems. We find that RBM-learned features...... outperform conventional RBM-based feature learning, which is unsupervised and uses only a generative learning objective, as well as often-used filter banks. We show that a mixture of generative and discriminative learning can produce filters that give a higher classification accuracy....

  2. Kernel-based machine learning techniques for infrasound signal classification

    Science.gov (United States)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this purpose, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. 

  3. Supervised learning for neural manifold using spatiotemporal brain activity

    Science.gov (United States)

    Kuo, Po-Chih; Chen, Yong-Sheng; Chen, Li-Fen

    2015-12-01

    Objective. Determining the means by which perceived stimuli are compactly represented in the human brain is a difficult task. This study aimed to develop techniques for the construction of the neural manifold as a representation of visual stimuli. Approach. We propose a supervised locally linear embedding method to construct the embedded manifold from brain activity, taking into account similarities between corresponding stimuli. In our experiments, photographic portraits were used as visual stimuli and brain activity was calculated from magnetoencephalographic data using a source localization method. Main results. The results of 10 × 10-fold cross-validation revealed a strong correlation between manifolds of brain activity and the orientation of faces in the presented images, suggesting that high-level information related to image content can be revealed in the brain responses represented in the manifold. Significance. Our experiments demonstrate that the proposed method is applicable to investigation into the inherent patterns of brain activity.
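
The "10 × 10-fold cross-validation" mentioned in the abstract is commonly read as 10-fold cross-validation repeated 10 times with a fresh shuffle each time; that reading is an assumption here, not something the record states. A minimal sketch of how such index splits might be generated, using a hypothetical helper `repeated_kfold`:

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repetition
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms fold f
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, train, test

splits = list(repeated_kfold(n_samples=50))
```

Each repetition partitions the samples into 10 disjoint test folds, so every sample is tested exactly once per repetition and 10 times overall.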

  4. Multilabel image classification via high-order label correlation driven active learning.

    Science.gov (United States)

    Zhang, Bang; Wang, Yang; Chen, Fang

    2014-03-01

    Supervised machine learning techniques have been applied to multilabel image classification problems with tremendous success. Despite disparate learning mechanisms, their performance relies heavily on the quality of training images. However, the acquisition of training images requires significant effort from human annotators. This hinders the application of supervised learning techniques to large-scale problems. In this paper, we propose a high-order label correlation driven active learning (HoAL) approach that allows the iterative learning algorithm itself to select the informative example-label pairs from which it learns, so as to learn an accurate classifier with less annotation effort. Four crucial issues are considered by the proposed HoAL: 1) unlike binary cases, the selection granularity for multilabel active learning needs to be refined from example to example-label pair; 2) different labels are seldom independent, and label correlations provide critical information for efficient learning; 3) in addition to pair-wise label correlations, high-order label correlations are also informative for multilabel active learning; and 4) since the number of label combinations increases exponentially with respect to the number of labels, an efficient mining method is required to discover informative label correlations. The proposed approach is tested on public data sets, and the empirical results demonstrate its effectiveness.
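
The idea of selecting at the granularity of example-label pairs, rather than whole examples, can be illustrated with plain uncertainty sampling. The sketch below is not HoAL: it ignores label correlations entirely and simply queries the pairs whose predicted probability is closest to 0.5. The function name `most_uncertain_pairs` and the toy probabilities are assumptions made for illustration.

```python
def most_uncertain_pairs(probs, labeled, budget=2):
    """Pick the unlabeled (example, label) pairs whose predicted
    probability is closest to 0.5.

    probs   : dict mapping example id -> list of per-label probabilities
    labeled : set of (example id, label index) pairs already annotated
    """
    candidates = [
        (abs(p - 0.5), ex, lab)
        for ex, row in probs.items()
        for lab, p in enumerate(row)
        if (ex, lab) not in labeled
    ]
    candidates.sort()  # smallest distance to 0.5 first
    return [(ex, lab) for _, ex, lab in candidates[:budget]]

# Hypothetical classifier outputs for two images and three labels.
probs = {"img1": [0.95, 0.52, 0.10], "img2": [0.40, 0.85, 0.49]}
queries = most_uncertain_pairs(probs, labeled={("img2", 2)})
```

Here the annotator would be asked about label 1 of `img1` and label 0 of `img2`, while the confident pairs are left alone.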

  5. A Generalized Image Scene Decomposition-Based System for Supervised Classification of Very High Resolution Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    ZhiYong Lv

    2016-09-01

    Very high resolution (VHR) remote sensing images are widely used for land cover classification. However, to the best of our knowledge, few approaches have been shown to improve classification accuracies through image scene decomposition. In this paper, a simple yet powerful observational scene scale decomposition (OSSD)-based system is proposed for the classification of VHR images. Unlike traditional methods, the OSSD-based system aims to improve classification performance by decomposing the complexity of an image's content. First, an image scene is divided into sub-image blocks through segmentation to decompose the image content. Subsequently, each sub-image block is either classified directly or first processed through an image filter or spectral–spatial feature extraction method, after which each processed segment is taken as the feature input of a classifier. Finally, the classified sub-maps are fused together for accuracy evaluation. The effectiveness of our proposed approach was investigated through experiments performed on different images with different supervised classifiers, namely, support vector machine, k-nearest neighbor, naive Bayes classifier, and maximum likelihood classifier. Compared with the accuracy achieved without OSSD processing, the accuracy of each classifier improved significantly, and our proposed approach shows outstanding performance in terms of classification accuracy.

  6. Cost-conscious comparison of supervised learning algorithms over multiple data sets

    OpenAIRE

    Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem

    2012-01-01

    In the literature, there exist statistical tests to compare supervised learning algorithms on multiple data sets in terms of accuracy but they do not always generate an ordering. We propose Multi2Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst" where our goodness measure is composed of a prior cost term additional to generalization error. Our simulations show that Multi2Test generates orderings using pairwise...

  7. Supervised classification of aerial imagery and multi-source data fusion for flood assessment

    Science.gov (United States)

    Sava, E.; Harding, L.; Cervone, G.

    2015-12-01

    Floods are among the most devastating natural hazards and the ability to produce an accurate and timely flood assessment before, during, and after an event is critical for their mitigation and response. Remote sensing technologies have become the de-facto approach for observing the Earth and its environment. However, satellite remote sensing data are not always available. For these reasons, it is crucial to develop new techniques in order to produce flood assessments during and after an event. Recent advancements in data fusion techniques of remote sensing with near real time heterogeneous datasets have allowed emergency responders to more efficiently extract increasingly precise and relevant knowledge from the available information. This research presents a fusion technique using satellite remote sensing imagery coupled with non-authoritative data such as Civil Air Patrol (CAP) and tweets. A new computational methodology is proposed based on machine learning algorithms to automatically identify water pixels in CAP imagery. Specifically, wavelet transformations are paired with multiple classifiers, run in parallel, to build models discriminating water and non-water regions. The learned classification models are first tested against a set of control cases, and then used to automatically classify each image separately. A measure of uncertainty is computed for each pixel in an image proportional to the number of models classifying the pixel as water. Geo-tagged tweets are continuously harvested and stored on a MongoDB and queried in real time. They are fused with CAP classified data, and with satellite remote sensing derived flood extent results to produce comprehensive flood assessment maps. The final maps are then compared with FEMA generated flood extents to assess their accuracy. The proposed methodology is applied on two test cases, relative to the 2013 floods in Boulder CO, and the 2015 floods in Texas.
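
The per-pixel uncertainty measure described above, proportional to the number of models that classify a pixel as water, can be sketched as a simple vote fraction. The function name and the tiny 2 × 2 masks are illustrative assumptions; the record does not say how the authors normalise the count.

```python
def water_vote_fraction(masks):
    """Per-pixel fraction of models that labelled the pixel as water.

    masks: list of equally sized 2-D lists of 0/1, one per model.
    """
    n_models = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[r][c] for m in masks) / n_models for c in range(cols)]
        for r in range(rows)
    ]

# Four hypothetical classifiers voting on a 2 x 2 image patch.
masks = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
]
fraction = water_vote_fraction(masks)
```

Values near 0 or 1 indicate agreement among the models, while intermediate values flag pixels where the ensemble disagrees.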

  8. Developing a practice of supervision in university as a collective learning process

    DEFF Research Database (Denmark)

    Lund, Birthe; Jensen, Annie Aarup

    2009-01-01

    of the framework surrounding the supervision process, both as regards the students and the teachers; to de-privatize the problems encountered by the individual teacher during the supervision; to ensure that students would be able to graduate within the timeframe of the education (the institutional economic...... of creating a transformation in the sense that it may change from being a top-down project (instigated by the Faculty) and develop into being a bottom-up project. It may hold the potential for developing collective learning processes assuming that good structures and frameworks can be created, as well...

  9. Using Supervised Learning Techniques for Diagnosis of Dynamic Systems

    Science.gov (United States)

    2002-05-04

    classification systems [11]. Neural network techniques have recently been applied in diverse fields, such as medicine [12] or power supply [13]. Machine...partially financed by the Comisión Interministerial de Ciencia y Tecnología (DPI2000-0666-C02-02) and the

  10. Discriminative Bayesian Dictionary Learning for Classification.

    Science.gov (United States)

    Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal

    2016-12-01

    We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of the Beta Process. It also computes sets of Bernoulli distributions that associate class labels with the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along with the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition, and for object and scene-category classification, using five public datasets, and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.

  11. Local learning semi-supervised multi-class classifier

    Institute of Scientific and Technical Information of China (English)

    吕佳; 邓乃扬; 田英杰; 邵元海; 杨新民

    2013-01-01

    Semi-supervised multi-class classification is an active research topic in machine learning and pattern recognition. Most current multi-class algorithms decompose the problem into a set of binary classification problems. To avoid solving multiple binary problems, two class-label representations are proposed: one based on the unit circle and one based on binary strings. Building on the good learning behavior of local learning in semi-supervised binary classification, a local learning semi-supervised multi-class classifier is presented. Experiments on benchmark datasets, compared against related algorithms, show that the proposed classifier achieves a lower misclassification rate and higher stability.

  12. Sparse extreme learning machine for classification.

    Science.gov (United States)

    Bai, Zuo; Huang, Guang-Bin; Wang, Danwei; Wang, Han; Westover, M Brandon

    2014-10-01

    Extreme learning machine (ELM) was initially proposed for single-hidden-layer feedforward neural networks (SLFNs). In the hidden layer (feature mapping), nodes are randomly generated independently of training data. Furthermore, a unified ELM was proposed, providing a single framework to simplify and unify different learning methods, such as SLFNs, least square support vector machines, proximal support vector machines, and so on. However, the solution of unified ELM is dense, and thus plenty of storage space and testing time are usually required for large-scale applications. In this paper, a sparse ELM is proposed as an alternative solution for classification, reducing storage space and testing time. In addition, unified ELM obtains the solution by matrix inversion, whose computational complexity is between quadratic and cubic with respect to the training size. It still requires plenty of training time for large-scale problems, even though it is much faster than many other traditional methods. In this paper, an efficient training algorithm is specifically developed for sparse ELM. The quadratic programming problem involved in sparse ELM is divided into a series of smallest possible sub-problems, each of which is solved analytically. Compared with SVM, sparse ELM obtains better generalization performance with much faster training speed. Compared with unified ELM, sparse ELM achieves similar generalization performance for binary classification applications, and when dealing with large-scale binary classification problems, sparse ELM realizes even faster training speed than unified ELM.

  13. Supervised orthogonal discriminant subspace projects learning for face recognition.

    Science.gov (United States)

    Chen, Yu; Xu, Xiao-Hong

    2014-02-01

    In this paper, a new linear dimension reduction method called supervised orthogonal discriminant subspace projection (SODSP) is proposed, which addresses the high dimensionality of data and the small sample size problem. More specifically, given a set of data points in the ambient space, a novel weight matrix that describes the relationship between the data points is first built. To model the manifold structure, the class information is incorporated into the weight matrix. Based on the novel weight matrix, the local scatter matrix as well as the non-local scatter matrix is defined such that the neighborhood structure can be preserved. To enhance the recognition ability, we impose an orthogonal constraint on a graph-based maximum margin analysis, seeking a projection that maximizes the difference, rather than the ratio, between the non-local scatter and the local scatter. In this way, SODSP naturally avoids the singularity problem. Further, we develop an efficient and stable algorithm for implementing SODSP, especially on high-dimensional data sets. Moreover, the theoretical analysis shows that LPP is a special instance of SODSP obtained by imposing some constraints. Experiments on the ORL, Yale, Extended Yale face database B and FERET face database are performed to test and evaluate the proposed algorithm. The results demonstrate the effectiveness of SODSP.
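
The contrast between a ratio criterion and a difference criterion can be written out explicitly. The symbols below are notation assumed here for illustration (the record gives no formulas): S_N is the non-local scatter matrix, S_L the local scatter matrix, and W the projection matrix.

```latex
% Ratio form (Fisher-style), usually solved through a generalized
% eigenproblem that involves S_L^{-1}:
\max_{W}\; \frac{\operatorname{tr}\!\left(W^{\top} S_N W\right)}
                {\operatorname{tr}\!\left(W^{\top} S_L W\right)}

% Difference form with an orthogonality constraint, as described in the abstract:
\max_{W}\; \operatorname{tr}\!\left(W^{\top}\,(S_N - S_L)\,W\right)
\quad \text{subject to} \quad W^{\top} W = I
```

Under the difference form, a maximizer is given by the eigenvectors of the symmetric matrix S_N − S_L that belong to its largest eigenvalues. No inverse of S_L is required, which is one way to see why the singularity problem of small-sample settings does not arise.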

  14. Developing a practice of supervision in university as a collective learning process

    DEFF Research Database (Denmark)

    Lund, Birthe; Jensen, Annie Aarup

    2009-01-01

    of the framework surrounding the supervision process, both as regards the students and the teachers; to de-privatize the problems encountered by the individual teacher during the supervision; to ensure that students would be able to graduate within the timeframe of the education (the institutional economic......The point of departure of the paper is a university pedagogical course established with the purpose of strengthening the university teachers’ competence regarding the supervision of students working on their master’s thesis. The purpose of the course is furthermore to ensure the improvement...... of creating a transformation in the sense that it may change from being a top-down project (instigated by the Faculty) and develop into being a bottom-up project. It may hold the potential for developing collective learning processes assuming that good structures and frameworks can be created, as well...

  15. Soft supervised self-organizing mapping (3SOM) for improving land cover classification with MODIS time-series

    Science.gov (United States)

    Lawawirojwong, Siam

    Classification of remote sensing data has long been a fundamental technique for studying vegetation and land cover. Furthermore, land use and land cover maps are a basic need for environmental science. These maps are important for crop system monitoring and are also valuable resources for decision makers. Therefore, an up-to-date and highly accurate land cover map with detailed and timely information is required for the global environmental change research community to support natural resource management, environmental protection, and policy making. However, there appear to be a number of limitations associated with data utilization, such as weather conditions, data availability, cost, and the time needed for acquiring and processing large numbers of images. Additionally, improving classification accuracy and reducing classification time have long been goals of remote sensing research, and they still require further study. To manage these challenges, the primary goal of this research is to improve classification algorithms that utilize MODIS-EVI time-series images. A supervised self-organizing map (SSOM) and a soft supervised self-organizing map (3SOM) are modified and improved to increase classification efficiency and accuracy. To accomplish the main goal, the performance of the proposed methods is investigated using synthetic and real landscape data derived from MODIS-EVI time-series images. Two study areas are selected based on a difference of land cover characteristics: one in Thailand and one in the Midwestern U.S. The results indicate that time-series imagery is a potentially useful input dataset for land cover classification. Moreover, the SSOM with time-series data significantly outperforms the conventional classification techniques of the Gaussian maximum likelihood classifier (GMLC) and backpropagation neural network (BPNN). In addition, the 3SOM employed as a soft classifier delivers a more accurate classification than the SSOM applied as

  16. No Free Lunch versus Occam's Razor in Supervised Learning

    CERN Document Server

    Lattimore, Tor

    2011-01-01

    The No Free Lunch theorems are often used to argue that domain specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias allowing an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing misclassification rates.

  17. A Novel Approach to Developing a Supervised Spatial Decision Support System for Image Classification: A Study of Paddy Rice Investigation

    Directory of Open Access Journals (Sweden)

    Shih-Hsun Chang

    2014-01-01

    Paddy rice area estimation via remote sensing techniques has been well established in recent years. Texture information and vegetation indicators are widely used to improve the classification accuracy of satellite images. Accordingly, this study employs texture information and vegetation indicators as ancillary information for classifying paddy rice through remote sensing images. In the first stage, the images are obtained using a remote sensing technique and ancillary information is employed to increase the accuracy of classification. In the second stage, we construct an efficient supervised classifier, which is used to evaluate the ancillary information. In the third stage, linear discriminant analysis (LDA) is introduced. LDA is a well-known method for classifying images into various categories. Also, the particle swarm optimization (PSO) algorithm is employed to optimize the LDA classification outcomes and increase classification performance. In the fourth stage, we discuss the strategy of selecting different window sizes and analyze particle numbers and iteration numbers with corresponding accuracy. Accordingly, a rational strategy for the combination of ancillary information is introduced. Afterwards, the PSO algorithm improves the accuracy rate from 82.26% to 89.31%. The improved accuracy results in a much lower salt-and-pepper effect in the thematic map.
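
The record does not say how PSO is coupled to LDA, so the following is only a generic sketch of global-best particle swarm optimisation, showing the roles of the particle number and iteration number discussed above. The inertia and acceleration values (`w`, `c1`, `c2`), the search bounds, and the toy objective standing in for a classification-error surface are all assumptions.

```python
import random

def pso_minimize(objective, dim, n_particles=20, iters=100, seed=0,
                 w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Basic global-best particle swarm optimisation (minimisation)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Toy objective standing in for a classification-error surface.
sphere = lambda x: sum((xi - 1.0) ** 2 for xi in x)
best_x, best_err = pso_minimize(sphere, dim=2)
```

In a setting like the one described, the objective would instead return the validation error of a classifier for a given parameter vector, and `n_particles` and `iters` would be the quantities traded off against accuracy.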

  18. Supervised Classification of Benthic Reflectance in Shallow Subtropical Waters Using a Generalized Pixel-Based Classifier across a Time Series

    Directory of Open Access Journals (Sweden)

    Tara Blakey

    2015-04-01

    We tested a supervised classification approach with Landsat 5 Thematic Mapper (TM) data for time-series mapping of seagrass in a subtropical lagoon. Seagrass meadows are an integral link between marine and inland ecosystems and are at risk from upstream processes such as runoff and erosion. Despite the prevalence of image-specific approaches, the classification accuracies we achieved show that pixel-based spectral classes may be generalized and applied to a time series of images that were not included in the classifier training. We employed in-situ data on seagrass abundance from 2007 to 2011 to train and validate a classification model. We created depth-invariant bands from TM bands 1, 2, and 3 to correct for variations in water column depth prior to building the classification model. In-situ data showed mean total seagrass cover remained relatively stable over the study area and period, with seagrass cover generally denser in the west than the east. Our approach achieved mapping accuracies (67% and 76% for two validation years) comparable with those attained using spectral libraries, but was simpler to implement. We produced a series of annual maps illustrating inter-annual variability in seagrass occurrence. Accuracies may be improved in future work by better addressing the spatial mismatch between pixel size of remotely sensed data and footprint of field data and by employing atmospheric correction techniques that normalize reflectances across images.

  19. SAR Target Recognition via Supervised Discriminative Dictionary Learning and Sparse Representation of the SAR-HOG Feature

    Directory of Open Access Journals (Sweden)

    Shengli Song

    2016-08-01

    Automatic target recognition (ATR) in synthetic aperture radar (SAR) images plays an important role in both national defense and civil applications. Although many methods have been proposed, SAR ATR is still very challenging due to the complex application environment. Feature extraction and classification are key points in SAR ATR. In this paper, we first design a novel feature, which is a histogram of oriented gradients (HOG)-like feature for SAR ATR (called SAR-HOG). Then, we propose a supervised discriminative dictionary learning (SDDL) method to learn a discriminative dictionary for SAR ATR and propose a strategy to simplify the optimization problem. Finally, we propose a SAR ATR classifier based on SDDL and sparse representation (called SDDLSR), in which both the reconstruction error and the classification error are considered. Extensive experiments are performed on the MSTAR database under standard operating conditions and extended operating conditions. The experimental results show that SAR-HOG can reliably capture the structures of targets in SAR images, and SDDL can further capture subtle differences among the different classes. By virtue of the SAR-HOG feature and SDDLSR, the proposed method achieves state-of-the-art performance on the MSTAR database. Especially for the extended operating conditions (EOC) scenario “Training 17°, Testing 45°”, the proposed method improves remarkably with respect to previous works.

  20. A Supervised Learning Approach to Search of Definitions

    Institute of Scientific and Technical Information of China (English)

    Jun Xu; Yun-Bo Cao; Hang Li; Min Zhao; Ya-Lou Huang

    2006-01-01

    This paper addresses the issue of search of definitions. Specifically, for a given term, we are to find its definition candidates and rank the candidates according to their likelihood of being good definitions. This is in contrast to the traditional methods of either generating a single combined definition or outputting all retrieved definitions. Definition ranking is essential for the task. A specification for judging the goodness of a definition is given. In the specification, a definition is categorized into one of three levels: good definition, indifferent definition, or bad definition. Methods of performing definition ranking are also proposed in this paper, which formalize the problem as either classification or ordinal regression. We employ SVM (Support Vector Machines) as the classification model and Ranking SVM as the ordinal regression model, respectively, and thus they rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined, which represent the characteristics of the term, the definition candidate, and their relationship. Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform baseline methods such as heuristic rules, conventional information retrieval (Okapi), or SVM regression. This holds both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
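
Ranking SVM is usually trained on pairwise preferences derived from graded items. As a hedged sketch of that data preparation step only (no SVM is fitted here, and the two-feature candidates are invented for illustration), the three levels good, indifferent and bad can be encoded as 2, 1 and 0 and turned into difference vectors:

```python
def pairwise_differences(features, levels):
    """Turn graded candidates into pairwise training examples.

    For every pair whose grades differ, emit (x_i - x_j, +1) when
    candidate i is graded higher than j, and the mirrored example with -1.
    """
    pairs = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if levels[i] == levels[j]:
                continue  # equal grades carry no ordering information
            diff = [a - b for a, b in zip(features[i], features[j])]
            sign = 1 if levels[i] > levels[j] else -1
            pairs.append((diff, sign))
            pairs.append(([-d for d in diff], -sign))
    return pairs

# Hypothetical 2-feature candidates graded good (2), indifferent (1), bad (0).
features = [[3, 1], [2, 2], [0, 1]]
levels = [2, 1, 0]
pairs = pairwise_differences(features, levels)
```

A linear binary classifier trained on such pairs yields a weight vector whose dot product with a candidate's features can then be used as its ranking score.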

  1. Facilitating the Learning Process in Design-Based Learning Practices: An Investigation of Teachers' Actions in Supervising Students

    Science.gov (United States)

    Gómez Puente, S. M.; van Eijck, M.; Jochems, W.

    2013-01-01

    Background: In research on design-based learning (DBL), inadequate attention is paid to the role the teacher plays in supervising students in gathering and applying knowledge to design artifacts, systems, and innovative solutions in higher education. Purpose: In this study, we examine whether teacher actions we previously identified in the DBL…

  2. Emotional Literacy Support Assistants' Views on Supervision Provided by Educational Psychologists: What EPs Can Learn from Group Supervision

    Science.gov (United States)

    Osborne, Cara; Burton, Sheila

    2014-01-01

    The Educational Psychology Service in this study has responsibility for providing group supervision to Emotional Literacy Support Assistants (ELSAs) working in schools. To date, little research has examined this type of inter-professional supervision arrangement. The current study used a questionnaire to examine ELSAs' views on the supervision…

  3. Online semi-supervised learning: algorithm and application in metagenomics

    NARCIS (Netherlands)

    S. Imangaliyev; B. Keijser; W. Crielaard; E. Tsivtsivadze

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled, that is, the dataset contains partial information about the problem of interest. This work presents an algorithm an

  4. Online Semi-Supervised Learning: Algorithm and Application in Metagenomics

    NARCIS (Netherlands)

    Imangaliyev, S.; Keijser, B.J.F.; Crielaard, W.; Tsivtsivadze, E.

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled, that is, the dataset contains partial information about the problem of interest. This work presents an algorithm and

  5. Generating a Spanish Affective Dictionary with Supervised Learning Techniques

    Science.gov (United States)

    Bermudez-Gonzalez, Daniel; Miranda-Jiménez, Sabino; García-Moreno, Raúl-Ulises; Calderón-Nepamuceno, Dora

    2016-01-01

    Nowadays, machine learning techniques are being used in several Natural Language Processing (NLP) tasks such as Opinion Mining (OM). OM is used to analyse and determine the affective orientation of texts. Usually, OM approaches use affective dictionaries in order to conduct sentiment analysis. These lexicons are labeled manually with affective…

  6. Extended apprenticeship learning in doctoral training and supervision - moving beyond 'cookbook recipes'

    DEFF Research Database (Denmark)

    Tanggaard, Lene; Wegener, Charlotte

    An apprenticeship perspective on learning in academia sheds light on the potential for mutual learning and production, and also reveals the diverse range of learning resources beyond the formal novice–expert relationship. Although apprenticeship is a well-known concept in educational research......, in this case apprenticeship offers an innovative perspective on future practice and research in academia allowing more students access to high-quality research training and giving supervisors a chance to combine their own research with their supervision obligations....

  7. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    In this paper we use machine learning algorithms such as SVM, KNN and GIS to perform a behavior comparison on the web page classification problem. From the experiment we see that the SVM with a small number of negative documents to build the centroids has the smallest storage requirement and the least online test computation cost, whereas almost all GIS variants with different numbers of nearest neighbors have an even higher storage requirement and online test computation cost than KNN. This suggests that future work should try to reduce the storage requirement and online test cost of GIS.

  8. A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

    Science.gov (United States)

    2016-07-01

    closed-form expression for the class of each node is derived. Moreover, the authors of [50] describe a semi-supervised method for classifying data using...manifold smoothing and image denoising. In addition to image processing, methods involving spectral graph theory [17,56], based on a graphical setting...pagerank and Section 3 presents a model using heat kernel pagerank directly as a classifier. Section 4 formulates the new algorithm as well as provides

  9. Assessing Miniaturized Sensor Performance using Supervised Learning, with Application to Drug and Explosive Detection

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne

    of sensors, as the sensors are designed to provide robust and reliable measurements. That means, the sensors are designed to have repeated measurement clusters. Sensor fusion is presented for the sensor based on chemoselective compounds. An array of color changing compounds is handled and in unity they make......This Ph.D. thesis titled “Assessing Miniaturized Sensor Performance using Supervised Learning, with Application to Drug and Explosive Detection” is a part of the strategic research project “Miniaturized sensors for explosives detection in air” funded by the Danish Agency for Science and Technology...... before the sensor responses can be applied to supervised learning algorithms. The technologies used for sensing consist of Calorimetry, Cantilevers, Chemoselective compounds, Quartz Crystal Microbalance and Surface Enhanced Raman Scattering. Each of the sensors has its own strengths and weaknesses...

  10. Supervised Machine Learning Methods Applied to Predict Ligand-Binding Affinity.

    Science.gov (United States)

    Heck, Gabriela S; Pintro, Val O; Pereira, Richard R; de Ávila, Mauricio B; Levin, Nayara M B; de Azevedo, Walter F

    2017-01-01

    Calculation of ligand-binding affinity is an open problem in computational medicinal chemistry. The ability to computationally predict affinities has a beneficial impact in the early stages of drug development, since it allows a mathematical model to assess protein-ligand interactions. Due to the availability of structural and binding information, machine learning methods have been applied to generate scoring functions with good predictive power. Our goal here is to review recent developments in the application of machine learning methods to predict ligand-binding affinity. We focus our review on the application of computational methods to predict binding affinity for protein targets. In addition, we also describe the major available databases for experimental binding constants and protein structures. Furthermore, we explain the most successful methods to evaluate the predictive power of scoring functions. Association of structural information with ligand-binding affinity makes it possible to generate scoring functions targeted to a specific biological system. Through regression analysis, this data can be used as a base to generate mathematical models to predict ligand-binding affinities, such as inhibition constant, dissociation constant and binding energy. Experimental biophysical techniques were able to determine the structures of over 120,000 macromolecules. Considering also the evolution of binding affinity information, we may say that we have a promising scenario for development of scoring functions, making use of machine learning techniques. Recent developments in this area indicate that building scoring functions targeted to the biological systems of interest shows superior predictive performance, when compared with other approaches.

  11. Computer-Vision-Assisted Palm Rehabilitation With Supervised Learning.

    Science.gov (United States)

    Vamsikrishna, K M; Dogra, Debi Prosad; Desarkar, Maunendra Sankar

    2016-05-01

    Physical rehabilitation supported by computer-assisted interfaces is gaining popularity among the health-care fraternity. In this paper, we propose a computer-vision-assisted contactless methodology to facilitate palm and finger rehabilitation. A Leap Motion controller has been interfaced with a computing device to record parameters describing 3-D movements of the palm of a user undergoing rehabilitation. We have built an interface using the Unity3D development platform. Our interface is capable of analyzing intermediate steps of rehabilitation without the help of an expert, and it can provide online feedback to the user. Isolated gestures are classified using linear discriminant analysis (DA) and support vector machines (SVM). Finally, a set of discrete hidden Markov models (HMM) is used to classify the gesture sequences performed during rehabilitation. Experimental validation using a large number of samples collected from healthy volunteers reveals that DA and SVM perform similarly when applied to isolated gesture recognition. We have compared the results of HMM-based sequence classification with CRF-based techniques. Our results confirm that both HMM and CRF perform quite similarly when tested on gesture sequences. The proposed system can be used for home-based palm or finger rehabilitation in the absence of experts.

  12. Efficient dynamic graph construction for inductive semi-supervised learning.

    Science.gov (United States)

    Dornaika, F; Dahbi, R; Bosaghzadeh, A; Ruichek, Y

    2017-10-01

    Most graph construction techniques assume a transductive setting in which the whole data collection is available at construction time. Addressing graph construction for the inductive setting, in which data arrive sequentially, has received much less attention. For inductive settings, constructing the graph from scratch can be very time consuming. This paper introduces a generic framework that is able to make any graph construction method incremental. This framework yields an efficient and dynamic graph construction method that adds new samples (labeled or unlabeled) to a previously constructed graph. As a case study, we use the recently proposed Two Phase Weighted Regularized Least Square (TPWRLS) graph construction method. The paper has two main contributions. First, we use the TPWRLS coding scheme to represent new sample(s) with respect to an existing database. The representative coefficients are then used to update the graph affinity matrix. The proposed method not only appends the new samples to the graph but also updates the whole graph structure by discovering which nodes are affected by the introduction of new samples and by updating their edge weights. The second contribution of the article is the application of the proposed framework to the problem of graph-based label propagation using multiple observations for vision-based recognition tasks. Experiments on several image databases show that, without any significant loss in the accuracy of the final classification, the proposed dynamic graph construction is more efficient than the batch graph construction. Copyright © 2017 Elsevier Ltd. All rights reserved.
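
    The idea of adding a sample to an existing graph without rebuilding it can be sketched as follows. This is a minimal illustration only: it uses a Gaussian-kernel k-nearest-neighbour rule rather than the TPWRLS coding scheme described in the abstract, it does not re-weight existing edges, and the names `gaussian_weight`, `add_sample`, `k` and `sigma` are assumptions introduced here.

```python
import math


def gaussian_weight(a, b, sigma=1.0):
    """Gaussian-kernel similarity between two points (illustrative choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def add_sample(points, affinity, new_point, k=2, sigma=1.0):
    """Append new_point to a symmetric affinity matrix, in place.

    Only one new row and one new column are computed; entries between
    existing nodes are left untouched (a simplification of the paper's
    method, which also updates affected edges).
    """
    weights = [gaussian_weight(p, new_point, sigma) for p in points]
    # Indices of the k existing points most similar to the new one.
    nearest = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)[:k]
    # New column: connect only to the k nearest neighbours.
    for i, row in enumerate(affinity):
        row.append(weights[i] if i in nearest else 0.0)
    # New row mirrors the new column; no self-loop on the diagonal.
    affinity.append([row[-1] for row in affinity] + [0.0])
    points.append(new_point)


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
affinity = [[0.0] * 3 for _ in range(3)]  # placeholder for a previously built graph
add_sample(points, affinity, (0.0, 1.0), k=2)
```

    The cost of this update grows with the number of existing nodes, whereas rebuilding the graph from scratch recomputes every pairwise weight.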

  13. Integrating learning assessment and supervision in a competency framework for clinical workplace education.

    Science.gov (United States)

    Embo, M; Driessen, E; Valcke, M; van der Vleuten, C P M

    2015-02-01

    Although competency-based education is well established in health care education, research shows that the competencies do not always match the reality of clinical workplaces. Therefore, there is a need to design feasible and evidence-based competency frameworks that fit the workplace reality. This theoretical paper outlines a competency-based framework, designed to facilitate learning, assessment and supervision in clinical workplace education. Integration is the cornerstone of this holistic competency framework.

  14. Clinical learning environment, supervision and nurse teacher evaluation scale: psychometric evaluation of the Swedish version.

    Science.gov (United States)

    Johansson, Unn-Britt; Kaila, Päivi; Ahlner-Elmqvist, Marianne; Leksell, Janeth; Isoaho, Hannu; Saarikoski, Mikko

    2010-09-01

    This article is a report of the development and psychometric testing of the Swedish version of the Clinical Learning Environment, Supervision and Nurse Teacher evaluation scale. To achieve quality assurance, collaboration between the healthcare and nursing systems is a pre-requisite. Therefore, it is important to develop a tool that can measure the quality of clinical education. The Clinical Learning Environment, Supervision and Nurse Teacher evaluation scale is a previously validated instrument, currently used in several universities across Europe. The instrument has been suggested for use as part of quality assessment and evaluation of nursing education. The scale was translated into Swedish from the English version. Data were collected between March 2008 and May 2009 among nursing students from three university colleges, with 324 students completing the questionnaire. Exploratory factor analysis was performed on the 34-item scale to determine construct validity and Cronbach's alpha was used to measure the internal consistency. The five sub-dimensions identified in the original scale were replicated in the exploratory factor analysis. The five factors had explanation percentages of 60.2%, which is deemed sufficient. Cronbach's alpha coefficient for the total scale was 0.95, and varied between 0.96 and 0.75 within the five sub-dimensions. The Swedish version of Clinical Learning Environment, Supervision and Nurse Teacher evaluation scale has satisfactory psychometric properties and could be a useful quality instrument in nursing education. However, further investigation is required to develop and evaluate the questionnaire.
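
    The internal-consistency measure reported above can be illustrated with a minimal Python sketch that computes Cronbach's alpha from a respondents-by-items score table. The function name `cronbach_alpha` and the toy scores are assumptions made for illustration; they are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item (column) across respondents.
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data (hypothetical): three respondents, two items.
toy_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_scores)
```

    Values closer to 1 indicate that the items vary together, which is the sense in which the abstract's 0.95 for the total scale points to high internal consistency.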

  15. Evaluation of supervised machine-learning algorithms to distinguish between inflammatory bowel disease and alimentary lymphoma in cats.

    Science.gov (United States)

    Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L

    2016-11-01

    Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).

  16. Clinical learning environment and supervision of international nursing students: A cross-sectional study.

    Science.gov (United States)

    Mikkonen, Kristina; Elo, Satu; Miettunen, Jouko; Saarikoski, Mikko; Kääriäinen, Maria

    2017-05-01

    Previously, it has been shown that the clinical learning environment causes challenges for international nursing students, but there is a lack of empirical evidence relating to the background factors explaining and influencing the outcomes. To describe international and national students' perceptions of their clinical learning environment and supervision, and explain the related background factors. An explorative cross-sectional design was used in a study conducted in eight universities of applied sciences in Finland during September 2015-May 2016. All nursing students studying English language degree programs were invited to answer a self-administered questionnaire based on both the clinical learning environment, supervision and nurse teacher scale and Cultural and Linguistic Diversity scale with additional background questions. Participants (n=329) included international (n=231) and Finnish (n=98) nursing students. Binary logistic regression was used to identify background factors relating to the clinical learning environment and supervision. International students at a beginner level in Finnish perceived the pedagogical atmosphere as worse than native speakers. In comparison to native speakers, these international students generally needed greater support from the nurse teacher at their university. Students at an intermediate level in Finnish reported two times fewer negative encounters in cultural diversity at their clinical placement than the beginners. To facilitate a successful learning experience, international nursing students require a sufficient level of competence in the native language when conducting clinical placements. Educational interventions in language education are required to test causal effects on students' success in the clinical learning environment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Deep Extreme Learning Machine and Its Application in EEG Classification

    Directory of Open Access Journals (Sweden)

    Shifei Ding

    2015-01-01

    Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating the complicated function and alleviating the optimization difficulty associated with deep models. Multilayer extreme learning machine (MLELM) is a learning algorithm of an artificial neural network which takes advantage of both deep learning and the extreme learning machine. Not only does MLELM approximate the complicated function, but it also does not need to iterate during the training process. Combining MLELM with the extreme learning machine with kernel (KELM), we put forward the deep extreme learning machine (DELM) and apply it to EEG classification in this paper. This paper focuses on the application of DELM to the classification of the visual feedback experiment, using MATLAB and the second brain-computer interface (BCI) competition datasets. By simulating and analyzing the results of the experiments, the effectiveness of applying DELM to EEG classification is confirmed.

  18. Semi-Supervised Bayesian Classification of Materials with Impact-Echo Signals

    Directory of Open Access Journals (Sweden)

    Jorge Igual

    2015-05-01

    The detection and identification of internal defects in a material require the use of some technology that translates the hidden interior damages into observable signals with different signature-defect correspondences. We apply impact-echo techniques for this purpose. The materials are classified according to their defective status (homogeneous, one defect or multiple defects) and kind of defect (hole or crack, passing through or not). Every specimen is impacted by a hammer, and the spectrum of the propagated wave is recorded. This spectrum is the input data to a Bayesian classifier that is based on the modeling of the conditional probabilities with a mixture of Gaussians. The parameters of the Gaussian mixtures and the class probabilities are estimated using an extended expectation-maximization algorithm. The advantage of our proposal is that it is flexible, since it obtains good results for a wide range of models even under little supervision; e.g., it obtains a harmonic average of precision and recall value of 92.38% given only a 10% supervision ratio. We test the method with real specimens made of aluminum alloy. The results show that the algorithm works very well. This technique could be applied in many industrial problems, such as the optimization of the marble cutting process.

  19. Semi-supervised Bayesian classification of materials with impact-echo signals.

    Science.gov (United States)

    Igual, Jorge; Salazar, Addisson; Safont, Gonzalo; Vergara, Luis

    2015-05-19

    The detection and identification of internal defects in a material require the use of some technology that translates the hidden interior damages into observable signals with different signature-defect correspondences. We apply impact-echo techniques for this purpose. The materials are classified according to their defective status (homogeneous, one defect or multiple defects) and kind of defect (hole or crack, passing through or not). Every specimen is impacted by a hammer, and the spectrum of the propagated wave is recorded. This spectrum is the input data to a Bayesian classifier that is based on the modeling of the conditional probabilities with a mixture of Gaussians. The parameters of the Gaussian mixtures and the class probabilities are estimated using an extended expectation-maximization algorithm. The advantage of our proposal is that it is flexible, since it obtains good results for a wide range of models even under little supervision; e.g., it obtains a harmonic average of precision and recall value of 92.38% given only a 10% supervision ratio. We test the method with real specimens made of aluminum alloy. The results show that the algorithm works very well. This technique could be applied in many industrial problems, such as the optimization of the marble cutting process.
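
    The "harmonic average of precision and recall" quoted above is commonly called the F1 score. A minimal sketch of the calculation follows; the function name `f1_score` and the counts in the example are hypothetical and are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.75, recall = 0.5.
example_f1 = f1_score(tp=6, fp=2, fn=6)
```

    Because the harmonic mean is pulled toward the smaller of the two values, a high score requires both precision and recall to be high.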

  20. Automated cell analysis tool for a genome-wide RNAi screen with support vector machine based supervised learning

    Science.gov (United States)

    Remmele, Steffen; Ritzerfeld, Julia; Nickel, Walter; Hesser, Jürgen

    2011-03-01

    RNAi-based high-throughput microscopy screens have become an important tool in biological sciences in order to decrypt mostly unknown biological functions of human genes. However, manual analysis is impossible for such screens since the amount of image data sets can often be in the hundred thousands. Reliable automated tools are thus required to analyse the fluorescence microscopy image data sets usually containing two or more reaction channels. The herein presented image analysis tool is designed to analyse an RNAi screen investigating the intracellular trafficking and targeting of acylated Src kinases. In this specific screen, a data set consists of three reaction channels and the investigated cells can appear in different phenotypes. The main issue of the image processing task is an automatic cell segmentation which has to be robust and accurate for all different phenotypes and a successive phenotype classification. The cell segmentation is done in two steps by segmenting the cell nuclei first and then using a classifier-enhanced region growing on basis of the cell nuclei to segment the cells. The classification of the cells is realized by a support vector machine which has to be trained manually using supervised learning. Furthermore, the tool is brightness invariant allowing different staining quality and it provides a quality control that copes with typical defects during preparation and acquisition. A first version of the tool has already been successfully applied for an RNAi-screen containing three hundred thousand image data sets and the SVM extended version is designed for additional screens.

  1. Multiclass Semi-Supervised Learning on Graphs using Ginzburg-Landau Functional Minimization

    CERN Document Server

    Garcia-Cardona, Cristina; Percus, Allon G

    2013-01-01

    We present a graph-based variational algorithm for classification of high-dimensional data, generalizing the binary diffuse interface model to the case of multiple classes. Motivated by total variation techniques, the method involves minimizing an energy functional made up of three terms. The first two terms promote a stepwise continuous classification function with sharp transitions between classes, while preserving symmetry among the class labels. The third term is a data fidelity term, allowing us to incorporate prior information into the model in a semi-supervised framework. The performance of the algorithm on synthetic data, as well as on the COIL and MNIST benchmark datasets, is competitive with state-of-the-art graph-based multiclass segmentation methods.

  2. A convolutional learning system for object classification in 3-D Lidar data.

    Science.gov (United States)

    Prokhorov, Danil

    2010-05-01

    In this brief, a convolutional learning system for classification of segmented objects represented in 3-D as point clouds of laser reflections is proposed. Several novelties are discussed: (1) extension of the existing convolutional neural network (CNN) framework to direct processing of 3-D data in a multiview setting which may be helpful for rotation-invariant consideration, (2) improvement of CNN training effectiveness by employing a stochastic meta-descent (SMD) method, and (3) combination of unsupervised and supervised training for enhanced performance of CNN. CNN performance is illustrated on a two-class data set of objects in a segmented outdoor environment.

  3. Classification/Categorization Model of Instruction for Learning Disabled Students.

    Science.gov (United States)

    Freund, Lisa A.

    1987-01-01

    Learning-disabled students deficient in classification and categorization require specific instruction in these skills. Use of a classification/categorization instructional model improved the questioning strategies of 60 learning-disabled students, aged 10 to 12. The use of similar models is discussed as a basis for instruction in science, social…

  4. Unsupervised feature learning for autonomous rock image classification

    Science.gov (United States)

    Shu, Lei; McIsaac, Kenneth; Osinski, Gordon R.; Francis, Raymond

    2017-09-01

    Autonomous rock image classification can enhance the capability of robots for geological detection and enlarge the scientific returns, both in investigation on Earth and planetary surface exploration on Mars. Since rock textural images are usually inhomogeneous and manually hand-crafting features is not always reliable, we propose an unsupervised feature learning method to autonomously learn the feature representation for rock images. In our tests, rock image classification using the learned features shows that the learned features can outperform manually selected features. Self-taught learning is also proposed to learn the feature representation from a large database of unlabelled rock images of mixed class. The learned features can then be used repeatedly for classification of any subclass. This takes advantage of the large dataset of unlabelled rock images and learns a general feature representation for many kinds of rocks. We show experimental results supporting the feasibility of self-taught learning on rock images.

  5. Sleep Stage Classification Using Unsupervised Feature Learning

    Directory of Open Access Journals (Sweden)

    Martin Längkvist

    2012-01-01

    Most attempts at training computers for the difficult and time-consuming task of sleep stage classification involve a feature extraction step. Due to the complexity of multimodal sleep data, the size of the feature space can grow to the extent that it is also necessary to include a feature selection step. In this paper, we propose the use of an unsupervised feature learning architecture called deep belief nets (DBNs) and show how to apply it to sleep data in order to eliminate the use of handmade features. Using a postprocessing step of hidden Markov model (HMM) to accurately capture sleep stage switching, we compare our results to a feature-based approach. A study of anomaly detection with the application to home environment data collection is also presented. The results using raw data with a deep architecture, such as the DBN, were comparable to a feature-based approach when validated on clinical datasets.

  6. Backpropagation Learning Algorithms for Email Classification.

    Directory of Open Access Journals (Sweden)

    David Ndumiyana and Tarirayi Mukabeta

    2016-07-01

    Today email has become one of the fastest and most effective forms of communication. The popularity of this mode of transmitting goods, information and services has motivated spammers to perfect their technical skills to fool spam filters. This development has worsened the problems faced by Internet users, who have to deal with email congestion, email overload and unprioritised email messages. The result has been an exponential increase in the number of email classification management tools over the past few decades. In this paper we propose a new spam classifier that trains a multilayer neural network using the backpropagation technique. Our contribution to the body of knowledge is the use of an improved empirical analysis to choose an optimum, novel collection of attributes of a user’s email contents that allows a quick detection of the most important words in emails. We also demonstrate its effectiveness using two equal sets of email training and testing data.

  7. Detection and Evaluation of Cheating on College Exams Using Supervised Classification

    Science.gov (United States)

    Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire

    2012-01-01

    Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…

  8. A comparison of supervised, unsupervised and synthetic land use classification methods in the north of Iran

    NARCIS (Netherlands)

    Mohammady, M.; Moradi, H.R.; Zeinivand, H.; Temme, A.J.A.M.

    2015-01-01

    Land use classification is often the first step in land use studies and thus forms the basis for many earth science studies. In this paper, we focus on low-cost techniques for combining Landsat images with geographic information system approaches to create a land use map. In the Golestan region of I

  9. Determination of Land Cover/land Use Using SPOT 7 Data with Supervised Classification Methods

    Science.gov (United States)

    Bektas Balcik, F.; Karakacan Kuzucu, A.

    2016-10-01

    Land use/ land cover (LULC) classification is a key research field in remote sensing. With recent developments of high-spatial-resolution sensors, Earth-observation technology offers a viable solution for land use/land cover identification and management in the rural part of the cities. There is a strong need to produce accurate, reliable, and up-to-date land use/land cover maps for sustainable monitoring and management. In this study, SPOT 7 imagery was used to test the potential of the data for land cover/land use mapping. Catalca is selected region located in the north west of the Istanbul in Turkey, which is mostly covered with agricultural fields and forest lands. The potentials of two classification algorithms maximum likelihood, and support vector machine, were tested, and accuracy assessment of the land cover maps was performed through error matrix and Kappa statistics. The results indicated that both of the selected classifiers were highly useful (over 83% accuracy) in the mapping of land use/cover in the study region. The support vector machine classification approach slightly outperformed the maximum likelihood classification in both overall accuracy and Kappa statistics.
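
    The accuracy assessment mentioned above (error matrix and Kappa statistics) can be sketched in a few lines of Python. The function name `overall_accuracy_and_kappa` and the two-class matrix are assumptions for illustration and do not reproduce the study's data.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square error (confusion) matrix.

    matrix[i][j] is the number of samples of reference class i
    that were mapped to class j.
    """
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class error matrix (e.g. agriculture vs. forest).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

    Kappa discounts the agreement that would be expected by chance, which is why it is typically reported alongside overall accuracy when comparing classifiers such as maximum likelihood and support vector machines.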

  10. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of 0.8 cm2 and weighs only 180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. 
tap water, non-potable water, pond water) and achieved a

  11. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Ceylan Koydemir Hatice

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond

  12. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice

    2017-06-14

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. 
tap water, non-potable water, pond water) and achieved

  13. Semi-supervised classification of remote sensing image based on probabilistic topic model

    Institute of Scientific and Technical Information of China (English)

    易文斌; 冒亚明; 慎利

    2013-01-01

    Land cover is at the center of the interaction between the natural environment and human activities, and land cover information is mainly obtained through the classification of remote sensing images, so image classification is one of the most basic problems in remote sensing image analysis. Building on clustering analysis of high-resolution remote sensing images with the probabilistic topic model, this paper examines the generative model, a typical method in semi-supervised learning, and derives a semi-supervised classification algorithm based on the probabilistic topic model (SS-LDA). Drawing on the workflow of the SS-LDA model in text recognition applications, a basic classification workflow for high-resolution remote sensing images is constructed. Experiments show that, compared with traditional unsupervised and supervised classification algorithms, the SS-LDA algorithm obtains more accurate image classification results.

  14. Algorithm of Supervised Learning on Outlier Manifold

    Institute of Scientific and Technical Information of China (English)

    黄添强; 李凯; 郑之

    2011-01-01

    Manifold learning algorithms are important tools for dimensionality reduction and data visualization, and improving their efficiency and robustness matters for practical applications. Classical manifold learning algorithms are generally sensitive to noise points, and existing improved algorithms still have shortcomings. This paper presents a robust manifold learning algorithm based on supervised learning and kernel functions. It introduces kernel methods and supervised learning into the dimensionality reduction process and uses the known label information together with the properties of the kernel function to pull samples of the same class closer together and spread samples of different classes apart, which improves subsequent classification and reduces the algorithm's sensitivity to noise on the manifold. Experiments on UCI data and on Raman spectral data of leukemia show that the improved algorithm is more robust to noise.

  15. New supervised learning theory applied to cerebellar modeling for suppression of variability of saccade end points.

    Science.gov (United States)

    Fujita, Masahiko

    2013-06-01

    A new supervised learning theory is proposed for a hierarchical neural network with a single hidden layer of threshold units, which can approximate any continuous transformation, and applied to a cerebellar function to suppress the end-point variability of saccades. In motor systems, feedback control can reduce noise effects if the noise is added in a pathway from a motor center to a peripheral effector; however, it cannot reduce noise effects if the noise is generated in the motor center itself: a new control scheme is necessary for such noise. The cerebellar cortex is well known as a supervised learning system, and a novel theory of cerebellar cortical function developed in this study can explain the capability of the cerebellum to feedforwardly reduce noise effects, such as end-point variability of saccades. This theory assumes that a Golgi-granule cell system can encode the strength of a mossy fiber input as the state of neuronal activity of parallel fibers. By combining these parallel fiber signals with appropriate connection weights to produce a Purkinje cell output, an arbitrary continuous input-output relationship can be obtained. By incorporating such flexible computation and learning ability in a process of saccadic gain adaptation, a new control scheme in which the cerebellar cortex feedforwardly suppresses the end-point variability when it detects a variation in saccadic commands can be devised. Computer simulation confirmed the efficiency of such learning and showed a reduction in the variability of saccadic end points, similar to results obtained from experimental data.

  16. Supervised Transfer Sparse Coding

    KAUST Repository

    Al-Shedivat, Maruan

    2014-07-27

    A combination of the sparse coding and transfer learning techniques was shown to be accurate and robust in classification tasks where training and testing objects have a shared feature space but are sampled from different underlying distributions, i.e., belong to different domains. The key assumption in such case is that in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, and thus the training set solely comprised objects from the source domain. However, in real world applications, the target domain often has some labeled objects, or one can always manually label a small number of them. In this paper, we explore such possibility and show how a small number of labeled data in the target domain can significantly leverage classification accuracy of the state-of-the-art transfer sparse coding methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer and classification. Experimental results on three applications demonstrate that a little manual labeling and then learning the model in a supervised fashion can significantly improve classification accuracy.

  17. Semi-automatic supervised classification of minerals from x-ray mapping images

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Flesche, Harald; Larsen, Rasmus

    1998-01-01

    spectroscopy (EDS) in a scanning electron microscope (SEM). Extensions to traditional multivariate statistical methods are applied to perform the classification. Training sets are grown from one or a few seed points by a method that ensures spatial and spectral closeness of observations. Spectral closeness...... to a small area in order to allow for the estimation of a variance-covariance matrix. This expansion is controlled by upper limits for the spatial and Euclidean spectral distances from the seed point. Second, after this initial expansion the growing of the training set is controlled by an upper limit...... training, a standard quadratic classifier is applied. The performance for each parameter setting is measured by the overall misclassification rate on an independently generated validation set. The classification method is presently used as a routine petrographical analysis method at Norsk Hydro Research...

  18. Performance of machine learning methods for classification tasks

    OpenAIRE

    B. Krithika; Dr. V. Ramalingam; Rajan, K

    2013-01-01

    In this paper, the performance of various machine learning methods on pattern classification and recognition tasks are proposed. The proposed method for evaluating performance will be based on the feature representation, feature selection and setting model parameters. The nature of the data, the methods of feature extraction and feature representation are discussed. The results of the Machine Learning algorithms on the classification task are analysed. The performance of Machine Learning meth...

  19. Restructuring supervision and reconfiguration of skill mix in community pharmacy: Classification of perceived safety and risk.

    Science.gov (United States)

    Bradley, Fay; Willis, Sarah C; Noyce, Peter R; Schafheutle, Ellen I

    2016-01-01

    Broadening the range of services provided through community pharmacy increases workloads for pharmacists that could be alleviated by reconfiguring roles within the pharmacy team. To examine the perceptions of pharmacists and pharmacy technicians (PTs) of how safe it would be for support staff to undertake a range of pharmacy activities during a pharmacist's absence. Views on supervision, support staff roles, competency and responsibility were also sought. Informed by nominal group discussions, a questionnaire was developed and distributed to a random sample of 1500 pharmacists and 1500 PTs registered in England. Whilst focused on community pharmacy practice, hospital pharmacy respondents were included, as more advanced skill mix models may provide valuable insights. Respondents were asked to rank a list of 22 pharmacy activities in terms of perceived risk and safety of these activities being performed by support staff during a pharmacist's absence. Descriptive and comparative statistical analyses were conducted. Six-hundred-and-forty-two pharmacists (43.2%) and 854 PTs (57.3%) responded; the majority worked in community pharmacy. Dependent on agreement levels with perceived safety, from community pharmacists and PTs, and hospital pharmacists and PTs, the 22 activities were grouped into 'safe' (n = 7), 'borderline' (n = 9) and 'unsafe' (n = 6). Activities such as assembly and labeling were considered 'safe,' clinical activities were considered 'unsafe.' There were clear differences between pharmacists and PTs, and sectors (community pharmacy vs. hospital). Community pharmacists were most cautious (particularly mobile and portfolio pharmacists) about which activities they felt support staff could safely perform; PTs in both sectors felt significantly more confident performing particularly technical activities than pharmacists. This paper presents novel empirical evidence informing the categorization of pharmacy activities into 'safe,' 'borderline' or 'unsafe

  20. Machine learning algorithms for mode-of-action classification in toxicity assessment.

    Science.gov (United States)

    Zhang, Yile; Wong, Yau Shu; Deng, Jian; Anton, Cristina; Gabos, Stephan; Zhang, Weiping; Huang, Dorothy Yu; Jin, Can

    2016-01-01

    Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combined with different testing concentrations, the resulting profiles have potential for probing the mode of action (MOA) of the tested substances. In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural networks (ANN) and support vector machines (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning from given TCRCs with known MOA information and then making MOA classifications for substances of unknown toxicity. A novel data processing step based on the wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, the time interval leading to a higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The proposed method is validated by applying the supervised learning algorithm to exposure data of the HepG2 cell line for 63 chemicals with 11 concentrations in each test case. Classification success rates in the range of 85 to 95% are obtained using SVM for MOA classification in cases with two clusters up to four clusters. The wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme combined with the wavelet transform has great potential for large-scale MOA classification and high-throughput chemical screening.
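
    As a rough illustration of the wavelet-based feature extraction step described in this abstract, the sketch below applies a multi-level Haar transform to one response curve and keeps the coarse approximation coefficients as features. The abstract does not say which wavelet or how many decomposition levels were used, so the Haar wavelet, the function names `haar_step` and `haar_features`, and the sample curve are all assumptions made for illustration only.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    scale = math.sqrt(2.0)
    # Pairwise scaled sums give the smooth part, pairwise differences the detail.
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(curve, levels):
    """Approximation coefficients after `levels` Haar steps, used as features."""
    if len(curve) % (2 ** levels) != 0:
        raise ValueError("curve length must be divisible by 2**levels")
    approx = list(curve)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # details are discarded in this sketch
    return approx


# Hypothetical response curve sampled at 8 time points (not data from the paper).
curve = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]
features = haar_features(curve, levels=2)
```

    Two levels reduce the eight samples to two coefficients, which could then be passed to a classifier such as an SVM in place of the raw curve.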

  1. DL-ReSuMe: A Delay Learning-Based Remote Supervised Method for Spiking Neurons.

    Science.gov (United States)

    Taherkhani, Aboozar; Belatreche, Ammar; Li, Yuhua; Maguire, Liam P

    2015-12-01

    Recent research has shown the potential capability of spiking neural networks (SNNs) to model complex information processing in the brain. There is biological evidence to prove the use of the precise timing of spikes for information coding. However, the exact learning mechanism in which the neuron is trained to fire at precise times remains an open problem. The majority of the existing learning methods for SNNs are based on weight adjustment. However, there is also biological evidence that the synaptic delay is not constant. In this paper, a learning method for spiking neurons, called delay learning remote supervised method (DL-ReSuMe), is proposed to merge the delay shift approach and ReSuMe-based weight adjustment to enhance the learning performance. DL-ReSuMe uses more biologically plausible properties, such as delay learning, and needs less weight adjustment than ReSuMe. Simulation results have shown that the proposed DL-ReSuMe approach achieves learning accuracy and learning speed improvements compared with ReSuMe.

  2. Virtual Calibration of Cosmic Ray Sensor: Using Supervised Ensemble Machine Learning

    Directory of Open Access Journals (Sweden)

    Ritaban Dutta

    2013-09-01

    In this paper an ensemble of supervised machine learning methods has been investigated to virtually and dynamically calibrate cosmic ray sensors measuring area-wise bulk soil moisture. The main focus of this study was to find an alternative to the currently available field calibration method, which is based on an expensive and time-consuming soil sample collection methodology. Data from the Australian Water Availability Project (AWAP) database were used as independent soil moisture ground truth, and results were compared against the conventionally estimated soil moisture using a Hydroinnova CRS-1000 cosmic ray probe deployed in Tullochgorum, Australia. Prediction performance of a complementary ensemble of four supervised estimators, namely Sugano type Adaptive Neuro-Fuzzy Inference System (S-ANFIS), Cascade Forward Neural Network (CFNN), Elman Neural Network (ENN) and Learning Vector Quantization Neural Network (LVQN), was evaluated using training and testing paradigms. An AWAP-trained ensemble of four estimators was able to predict bulk soil moisture directly from cosmic ray neutron counts with 94.4% as best accuracy. The ensemble approach outperformed the individual performances of these networks. This result showed that an ensemble machine learning based paradigm could be a valuable alternative data-driven calibration method for cosmic ray sensors against the current expensive and hydrological-assumption-based field calibration method.

  3. Test-retest reliability of the Clinical Learning Environment, Supervision and Nurse Teacher (CLES + T) scale.

    Science.gov (United States)

    Gustafsson, Margareta; Blomberg, Karin; Holmefur, Marie

    2015-07-01

    The Clinical Learning Environment, Supervision and Nurse Teacher (CLES + T) scale evaluates the student nurses' perception of the learning environment and supervision within the clinical placement. It has never been tested in a replication study. The aim of the present study was to evaluate the test-retest reliability of the CLES + T scale. The CLES + T scale was administered twice to a group of 42 student nurses, with a one-week interval. Test-retest reliability was determined by calculations of Intraclass Correlation Coefficients (ICCs) and weighted Kappa coefficients. Standard Error of Measurements (SEM) and Smallest Detectable Difference (SDD) determined the precision of individual scores. Bland-Altman plots were created for analyses of systematic differences between the test occasions. The results of the study showed that the stability over time was good to excellent (ICC 0.88-0.96) in the sub-dimensions "Supervisory relationship", "Pedagogical atmosphere on the ward" and "Role of the nurse teacher". Measurements of "Premises of nursing on the ward" and "Leadership style of the manager" had lower but still acceptable stability (ICC 0.70-0.75). No systematic differences occurred between the test occasions. This study supports the usefulness of the CLES + T scale as a reliable measure of the student nurses' perception of the learning environment within the clinical placement at a hospital.
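
    The abstract reports ICC, SEM and SDD values without stating how they were computed. The sketch below uses definitions that are common in reliability studies, SEM = SD * sqrt(1 - ICC) and SDD = 1.96 * sqrt(2) * SEM, which may differ from the exact formulas used by the authors. The function names and the standard deviation of 0.5 are hypothetical; only the ICC of 0.88 is taken from the abstract.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a commonly used definition (assumed here)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_difference(sem, z=1.96):
    """SDD = z * sqrt(2) * SEM; z = 1.96 corresponds to a 95% level."""
    return z * math.sqrt(2.0) * sem


# sd=0.5 is a made-up score spread; icc=0.88 is the lower bound of the
# "good to excellent" range quoted in the abstract.
sem = standard_error_of_measurement(sd=0.5, icc=0.88)
sdd = smallest_detectable_difference(sem)
```

    Under these assumptions a change in an individual score smaller than the SDD cannot be distinguished from measurement error between the two test occasions.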

  4. Active Learning of Classification Models with Likert-Scale Feedback.

    Science.gov (United States)

    Xue, Yanbing; Hauskrecht, Milos

    2017-01-01

    Annotation of classification data by humans can be a time-consuming and tedious process. Finding ways of reducing the annotation effort is critical for building the classification models in practice and for applying them to a variety of classification tasks. In this paper, we develop a new active learning framework that combines two strategies to reduce the annotation effort. First, it relies on label uncertainty information obtained from the human in terms of the Likert-scale feedback. Second, it uses active learning to annotate examples with the greatest expected change. We propose a Bayesian approach to calculate the expectation and an incremental SVM solver to reduce the time complexity of the solvers. We show the combination of our active learning strategy and the Likert-scale feedback can learn classification models more rapidly and with a smaller number of labeled instances than methods that rely on either Likert-scale labels or active learning alone.

  5. The Use of Meaningful Reception Learning in Lesson on Classification

    OpenAIRE

    2013-01-01

    This paper begins with a learning theory of instruction. It describes how Meaningful Reception Learning can be used to teach in classification of items. Meaningful Reception Learning is a learning theory of instruction proposed by Ausubel who believed that learners can learn best when the new material being taught can be anchored into existing cognitive information in the learners. He also proposed the use of advance organizers as representations of the facts of the lesson. ...

  6. Towards harmonized seismic analysis across Europe using supervised machine learning approaches

    Science.gov (United States)

    Zaccarelli, Riccardo; Bindi, Dino; Cotton, Fabrice; Strollo, Angelo

    2017-04-01

    In the framework of the Thematic Core Services for Seismology of EPOS-IP (European Plate Observing System-Implementation Phase), a service for disseminating a regionalized logic-tree of ground motion models for Europe is under development. While for the Mediterranean area the large availability of strong motion data, qualified and disseminated through the Engineering Strong Motion database (ESM-EPOS), supports the development of both selection criteria and ground motion models, for the low-to-moderate seismic regions of continental Europe the development of ad hoc models using weak motion recordings of moderate earthquakes is unavoidable. The aim of this work is to present a platform for creating application-oriented earthquake databases by retrieving information from EIDA (European Integrated Data Archive) and applying supervised learning models for earthquake record selection and processing suitable for any specific application of interest. Supervised learning, i.e. the task of inferring a function from labelled training data, has been used extensively in fields such as spam detection, speech and image recognition and pattern recognition in general. Such models are therefore well suited to detecting anomalies and performing semi- to fully automated filtering of large waveform data sets, easing the effort of (or replacing) human experts. Because supervised learning algorithms can learn from a relatively small training set to predict and categorize unseen data, their advantage when processing large amounts of data is crucial. Moreover, their intrinsic ability to make data-driven predictions makes them suitable (and preferable) in cases where explicit detection algorithms might be unfeasible or too heuristic. In this study, we consider relatively simple statistical classifiers (e.g., Naive Bayes, Logistic Regression, Random Forest, SVMs) where labels are assigned to waveform data based on "recognized classes" needed for our use case

  7. Automated supervised classification of variable stars II. Application to the OGLE database

    CERN Document Server

    Sarro, L M; López, M; Aerts, C

    2008-01-01

    We aim to extend and test the classifiers presented in a previous work against an independent dataset. We complement the assessment of the validity of the classifiers by applying them to the set of OGLE light curves treated as variable objects of unknown class. The results are compared to published classification results based on the so-called extractor methods. Two complementary analyses are carried out in parallel. In both cases, the original time series of OGLE observations of the Galactic bulge and Magellanic Clouds are processed in order to identify and characterize the frequency components. In the first approach, the classifiers are applied to the data and the results analyzed in terms of systematic errors and differences between the definition samples in the training set and in the extractor rules. In the second approach, the original classifiers are extended with colour information and, again, applied to OGLE light curves. We have constructed a classification system that can process huge amounts of tim...

  8. Automated Classification and Correlation of Drill Cores using High-Resolution Hyperspectral Images and Supervised Pattern Classification Algorithms. Applications to Paleoseismology

    Science.gov (United States)

    Ragona, D. E.; Minster, B.; Rockwell, T.; Jasso, H.

    2006-12-01

    The standard methodology to describe, classify and correlate geologic materials in the field or lab relies on physical inspection of samples, sometimes with the assistance of conventional analytical techniques (e.g. XRD, microscopy, particle size analysis). This is commonly both time-consuming and inherently subjective. Many geological materials share identical visible properties (e.g. fine grained materials, alteration minerals) and therefore cannot be mapped using the human eye alone. Recent investigations have shown that ground-based hyperspectral imaging provides an effective method to study and digitally store stratigraphic and structural data from cores or field exposures. Neural networks and Naive Bayesian classifiers supply a variety of well-established techniques for pattern recognition, especially for data examples with high-dimensionality input-outputs. In this poster, we present a new methodology for automatic mapping of sedimentary stratigraphy in the lab (drill cores, samples) or the field (outcrops, exposures) using short wave infrared (SWIR) hyperspectral images and these two supervised classification algorithms. High-spatial/spectral resolution data from large sediment samples (drill cores) from a paleoseismic excavation site were collected using a portable hyperspectral scanner with 245 continuous channels measured across the 960 to 2404 nm spectral range. The data were corrected for geometric and radiometric distortions and pre-processed to obtain reflectance at each pixel of the images. We built an example set using hundreds of reflectance spectra collected from the sediment core images. The examples were grouped into eight classes corresponding to materials found in the samples. We constructed two additional example sets by computing the 2-norm normalization and the derivative of the smoothed original reflectance examples. Each example set was divided into four subsets: training, training test, verification and validation. A multi

  9. Multiple Structure-View Learning for Graph Classification.

    Science.gov (United States)

    Wu, Jia; Pan, Shirui; Zhu, Xingquan; Zhang, Chengqi; Yu, Philip S

    2017-09-20

    Many applications involve objects containing structure and rich content information, each describing different feature aspects of the object. Graph learning and classification is a common tool for handling such objects. To date, existing graph classification has been limited to the single-graph setting with each object being represented as one graph from a single structure-view. This inherently limits its use to the classification of complicated objects containing complex structures and uncertain labels. In this paper, we advance graph classification to handle multigraph learning for complicated objects from multiple structure views, where each object is represented as a bag containing several graphs and the label is only available for each graph bag but not individual graphs inside the bag. To learn such graph classification models, we propose a multistructure-view bag constrained learning (MSVBL) algorithm, which aims to explore substructure features across multiple structure views for learning. By enabling joint regularization across multiple structure views and enforcing labeling constraints at the bag and graph levels, MSVBL is able to discover the most effective substructure features across all structure views. Experiments and comparisons on real-world data sets validate and demonstrate the superior performance of MSVBL in representing complicated objects as multigraph for classification, e.g., MSVBL outperforms the state-of-the-art multiview graph classification and multiview multi-instance learning approaches.

  10. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    Science.gov (United States)

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
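
    The iterative selection of informative training compounds described in this abstract can be sketched as a pool-based active learning loop. The abstract does not specify the selection criterion or the underlying classifier, so the smallest-margin (uncertainty) query rule, the nearest-centroid classifier on a single numeric feature, and all names and data below are assumptions for illustration, not the authors' method.

```python
def centroids(labeled):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def margin(x, cents):
    """Gap between the two nearest class centroids; a small gap means uncertain."""
    distances = sorted(abs(x - c) for c in cents.values())
    return distances[1] - distances[0]


def active_learning(pool, oracle, seed, budget):
    """Query the oracle (e.g. a human expert) for the most uncertain items.

    The seed must contain at least two classes so that a margin is defined.
    """
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        cents = centroids(labeled)
        query = min(unlabeled, key=lambda x: margin(x, cents))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return labeled


# Hypothetical one-dimensional descriptors and a stand-in labelling rule.
pool = [0.0, 1.0, 2.0, 4.0, 5.5, 8.0, 9.0, 10.0]
oracle = lambda x: "A" if x < 5 else "B"
labeled = active_learning(pool, oracle, seed=[0.0, 10.0], budget=2)
```

    In this toy run the two queries go to the items closest to the class boundary, which is the intended effect: labels are requested where the current model is least certain rather than at random.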

  11. Improving supervised classification accuracy using non-rigid multimodal image registration: detecting prostate cancer

    Science.gov (United States)

    Chappelow, Jonathan; Viswanath, Satish; Monaco, James; Rosen, Mark; Tomaszewski, John; Feldman, Michael; Madabhushi, Anant

    2008-03-01

    Computer-aided diagnosis (CAD) systems for the detection of cancer in medical images require precise labeling of training data. For magnetic resonance (MR) imaging (MRI) of the prostate, training labels define the spatial extent of prostate cancer (CaP); the most common source for these labels is expert segmentations. When ancillary data such as whole mount histology (WMH) sections, which provide the gold standard for cancer ground truth, are available, the manual labeling of CaP can be improved by referencing WMH. However, manual segmentation is error prone, time consuming and not reproducible. Therefore, we present the use of multimodal image registration to automatically and accurately transcribe CaP from histology onto MRI following alignment of the two modalities, in order to improve the quality of training data and hence classifier performance. We quantitatively demonstrate the superiority of this registration-based methodology by comparing its results to the manual CaP annotation of expert radiologists. Five supervised CAD classifiers were trained using the labels for CaP extent on MRI obtained by the expert and 4 different registration techniques. Two of the registration methods were affine schemes; one based on maximization of mutual information (MI) and the other method that we previously developed, Combined Feature Ensemble Mutual Information (COFEMI), which incorporates high-order statistical features for robust multimodal registration. Two non-rigid schemes were obtained by succeeding the two affine registration methods with an elastic deformation step using thin-plate splines (TPS). In the absence of definitive ground truth for CaP extent on MRI, classifier accuracy was evaluated against 7 ground truth surrogates obtained by different combinations of the expert and registration segmentations. For 26 multimodal MRI-WMH image pairs, all four registration methods produced a higher area under the receiver operating characteristic curve compared to that

  12. On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Taimur Bakhshi

    2016-01-01

    Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing a two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases, increasing to 96.67% with adaptive boosting. The classifier specificity factor, which accounted for differentiating content-specific from supplementary flows, ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.
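
    The two-phase idea in this abstract (cluster flows without labels, then train a supervised tree on the derived flow classes) can be sketched in a few lines. This is a toy under stated assumptions: a single numeric feature per flow, plain k-means with k = 2, and a one-split decision tree (a stump) standing in for C5.0, which is a different and far more capable algorithm. The function names and the packet-size values are hypothetical and are not taken from the paper.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's algorithm on scalars; initial centers are spread quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


def fit_stump(values, labels):
    """One-split decision tree: choose the threshold with fewest training errors."""
    best = None
    ordered = sorted(set(values))  # needs at least two distinct values
    for low, high in zip(ordered, ordered[1:]):
        t = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        left_class = max(set(left), key=left.count)
        right_class = max(set(right), key=right.count)
        errors = sum(l != left_class for l in left) + sum(l != right_class for l in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_class, right_class)
    _, t, left_class, right_class = best
    return lambda v: left_class if v <= t else right_class


# Phase 1 (unsupervised): derive flow classes for one application.
mean_packet_sizes = [40, 52, 60, 48, 1400, 1500, 1350, 1460]  # made-up bytes
centers, flow_classes = kmeans_1d(mean_packet_sizes, k=2)

# Phase 2 (supervised): train a classifier on the derived classes.
classify = fit_stump(mean_packet_sizes, flow_classes)
```

    The design point is that the cluster indices from phase 1 serve as training labels in phase 2, so no manual labelling of individual flows is needed; a new flow is then assigned a class with `classify(value)`.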

  13. Gene selection and classification for cancer microarray data based on machine learning and similarity measures

    Directory of Open Access Journals (Sweden)

    Liu Qingzhong

    2011-12-01

    Background: Microarray data have a high dimension of variables and a small sample size. In microarray data analyses, two important issues are how to choose genes, which provide reliable and good prediction for disease status, and how to determine the final gene set that is best for classification. Associations among genetic markers mean one can exploit information redundancy to potentially reduce classification cost in terms of time and money. Results: To deal with redundant information and improve classification, we propose a gene selection method, Recursive Feature Addition, which combines supervised learning and statistical similarity measures. To determine the final optimal gene set for prediction and classification, we propose an algorithm, Lagging Prediction Peephole Optimization. By using six benchmark microarray gene expression data sets, we compared Recursive Feature Addition with recently developed gene selection methods: Support Vector Machine Recursive Feature Elimination, Leave-One-Out Calculation Sequential Forward Selection and several others. Conclusions: On average, with the use of popular learning machines including Nearest Mean Scaled Classifier, Support Vector Machine, Naive Bayes Classifier and Random Forest, Recursive Feature Addition outperformed other methods. Our studies also showed that Lagging Prediction Peephole Optimization is superior to random strategy; Recursive Feature Addition with Lagging Prediction Peephole Optimization obtained better testing accuracies than the gene selection method varSelRF.

  14. Manifold learning based feature extraction for classification of hyperspectral data

    CSIR Research Space (South Africa)

    Lunga, D

    2014-01-01

    Interest in manifold learning for representing the topology of large, high dimensional nonlinear data sets in lower, but still meaningful dimensions for visualization and classification has grown rapidly over the past decade, and particularly...

  15. Deep Learning in Label-free Cell Classification

    National Research Council Canada - National Science Library

    Chen, Claire Lifan; Mahjoubfar, Ata; Tai, Li-Chia; Blaby, Ian K; Huang, Allen; Niazi, Kayvan Reza; Jalali, Bahram

    2016-01-01

    .... Here, we integrate feature extraction and deep learning with high-throughput quantitative imaging enabled by photonic time stretch, achieving record high accuracy in label-free cell classification...

  16. A Protein Classification Benchmark collection for machine learning

    NARCIS (Netherlands)

    Sonego, P.; Pacurar, M.; Dhir, S.; Kertész-Farkas, A.; Kocsor, A.; Gáspári, Z.; Leunissen, J.A.M.; Pongor, S.

    2007-01-01

    Protein classification by machine learning algorithms is now widely used in structural and functional annotation of proteins. The Protein Classification Benchmark collection (http://hydra.icgeb.trieste.it/benchmark) was created in order to provide standard datasets on which the performance of machin

  17. Documentation Report, Self-Paced Physics, Classification of Learning Objectives.

    Science.gov (United States)

    Naval Academy, Annapolis, MD.

    The purpose of this study was to develop a taxonomy which would categorize high level physics problem-solving behaviors, and to examine the usefulness of such a classification system. This classification of learning objectives is based on complexity, a nonarbitrary measure which does not rely upon comparison between students but rather is based on…

  18. Strategies to Increase Accuracy in Text Classification

    NARCIS (Netherlands)

    D. Blommesteijn (Dennis)

    2014-01-01

    Text classification via supervised learning involves various steps from processing raw data, features extraction to training and validating classifiers. Within these steps implementation decisions are critical to the resulting classifier accuracy. This paper contains a report of the

  19. Strategies to Increase Accuracy in Text Classification

    NARCIS (Netherlands)

    Blommesteijn, D.

    2014-01-01

    Text classification via supervised learning involves various steps from processing raw data, features extraction to training and validating classifiers. Within these steps implementation decisions are critical to the resulting classifier accuracy. This paper contains a report of the study performed

  20. A supervised machine learning estimator for the non-linear matter power spectrum - SEMPS

    CERN Document Server

    Mohammed, Irshad

    2015-01-01

    In this article, we argue that models based on machine learning (ML) can be very effective in estimating the non-linear matter power spectrum ($P(k)$). We employ the prediction ability of the supervised ML algorithms to build an estimator for the $P(k)$. The estimator is trained on a set of cosmological models, and redshifts for which the $P(k)$ is known, and it learns to predict $P(k)$ for any other set. We review three ML algorithms -- Random Forest, Gradient Boosting Machines, and K-Nearest Neighbours -- and investigate their prime parameters to optimize the prediction accuracy of the estimator. We also compute an optimal size of the training set, which is realistic enough, and still yields high accuracy. We find that, employing the optimal values of the internal parameters, a set of $50-100$ cosmological models is enough to train the estimator that can predict the $P(k)$ for a wide range of cosmological models, and redshifts. Using this configuration, we build a blackbox -- Supervised Estimator for Matter...

  1. AcceleRater: a web application for supervised learning of behavioral modes from acceleration measurements.

    Science.gov (United States)

    Resheff, Yehezkel S; Rotics, Shay; Harel, Roi; Spiegel, Orr; Nathan, Ran

    2014-01-01

    The study of animal movement has experienced rapid progress in recent years, forcefully driven by technological advancement. Biologgers with Acceleration (ACC) recordings are becoming increasingly popular in the fields of animal behavior and movement ecology, for estimating energy expenditure and identifying behavior, with prospects for other potential uses as well. Supervised learning of behavioral modes from acceleration data has shown promising results in many species, and for a diverse range of behaviors. However, broad implementation of this technique in movement ecology research has been limited due to technical difficulties and complicated analysis, deterring many practitioners from applying this approach. This highlights the need to develop a broadly applicable tool for classifying behavior from acceleration data. Here we present a free-access python-based web application called AcceleRater, for rapidly training, visualizing and using models for supervised learning of behavioral modes from ACC measurements. We introduce AcceleRater, and illustrate its successful application for classifying vulture behavioral modes from acceleration data obtained from free-ranging vultures. The seven models offered in the AcceleRater application achieved overall accuracy of between 77.68% (Decision Tree) and 84.84% (Artificial Neural Network), with a mean overall accuracy of 81.51% and standard deviation of 3.95%. Notably, variation in performance was larger between behavioral modes than between models. AcceleRater provides the means to identify animal behavior, offering a user-friendly tool for ACC-based behavioral annotation, which will be dynamically upgraded and maintained.

  2. Real-Time Classification of Complex Patterns Using Spike-Based Learning in Neuromorphic VLSI.

    Science.gov (United States)

    Mitra, S; Fusi, S; Indiveri, G

    2009-02-01

    Real-time classification of patterns of spike trains is a difficult computational problem that both natural and artificial networks of spiking neurons are confronted with. The solution to this problem not only could contribute to understanding the fundamental mechanisms of computation used in the biological brain, but could also lead to efficient hardware implementations of a wide range of applications ranging from autonomous sensory-motor systems to brain-machine interfaces. Here we demonstrate real-time classification of complex patterns of mean firing rates, using a VLSI network of spiking neurons and dynamic synapses which implement a robust spike-driven plasticity mechanism. The learning rule implemented is a supervised one: a teacher signal provides the output neuron with an extra input spike-train during training, in parallel to the spike-trains that represent the input pattern. The teacher signal simply indicates if the neuron should respond to the input pattern with a high rate or with a low one. The learning mechanism modifies the synaptic weights only as long as the current generated by all the stimulated plastic synapses does not match the output desired by the teacher, as in the perceptron learning rule. We describe the implementation of this learning mechanism and present experimental data that demonstrate how the VLSI neural network can learn to classify patterns of neural activities, also in the case in which they are highly correlated.
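
    The abstract above compares the hardware learning mechanism to the perceptron learning rule, in which weights change only while the output disagrees with the teacher. The following is a minimal software sketch of the classic perceptron rule only, not of the VLSI spike-driven plasticity circuit; the function names, learning rate and toy data are assumptions made for illustration.

```python
def predict(weights, bias, pattern):
    """Return 1 (high response) if the weighted input exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, pattern)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(patterns, targets, epochs=100, learning_rate=1.0):
    """Classic perceptron rule: update only when output and teacher disagree."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for pattern, target in zip(patterns, targets):
            output = predict(weights, bias, pattern)
            if output != target:
                # Teacher and output differ, so move the weights toward the target.
                delta = learning_rate * (target - output)
                weights = [w + delta * x for w, x in zip(weights, pattern)]
                bias += delta
                errors += 1
        if errors == 0:
            # Every pattern already matches the teacher: no further updates.
            break
    return weights, bias


# Hypothetical mean-firing-rate patterns (two inputs) and teacher labels.
patterns = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.7)]
targets = [1, 1, 0, 0]
weights, bias = train_perceptron(patterns, targets)
predictions = [predict(weights, bias, p) for p in patterns]
```

    The stopping condition mirrors the description in the abstract: once the output matches the teacher for every pattern, the weights are left untouched.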

  3. Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification.

    Science.gov (United States)

    Soares, João V B; Leandro, Jorge J G; Cesar Júnior, Roberto M; Jelinek, Herbert F; Cree, Michael J

    2006-09-01

    We present a method for automated segmentation of the vasculature in retinal images. The method produces segmentations by classifying each image pixel as vessel or nonvessel, based on the pixel's feature vector. Feature vectors are composed of the pixel's intensity and two-dimensional Gabor wavelet transform responses taken at multiple scales. The Gabor wavelet is capable of tuning to specific frequencies, thus allowing noise filtering and vessel enhancement in a single step. We use a Bayesian classifier with class-conditional probability density functions (likelihoods) described as Gaussian mixtures, yielding a fast classification, while being able to model complex decision surfaces. The probability distributions are estimated based on a training set of labeled pixels obtained from manual segmentations. The method's performance is evaluated on publicly available DRIVE (Staal et al., 2004) and STARE (Hoover et al., 2000) databases of manually labeled images. On the DRIVE database, it achieves an area under the receiver operating characteristic curve of 0.9614, slightly superior to that reported for state-of-the-art approaches. We are making our implementation available as open source MATLAB scripts for researchers interested in implementation details, evaluation, or development of methods.
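
    The decision rule described above, a Bayesian classifier whose class-conditional likelihoods are Gaussian mixtures, can be sketched as follows. This is an illustrative simplification, not the authors' MATLAB implementation: it uses a single scalar feature instead of a multi-scale Gabor feature vector, and the mixture parameters and priors are hypothetical values rather than estimates from labeled pixels.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_likelihood(x, components):
    """Likelihood under a Gaussian mixture given as (weight, mean, std) triples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def classify(x, class_models, priors):
    """Bayes rule: pick the class maximising prior times mixture likelihood."""
    scores = {
        label: priors[label] * mixture_likelihood(x, components)
        for label, components in class_models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical mixture parameters for a single scalar pixel feature.
class_models = {
    "vessel": [(0.6, 2.0, 0.5), (0.4, 3.5, 0.8)],
    "nonvessel": [(0.7, -1.0, 0.7), (0.3, 0.5, 0.4)],
}
priors = {"vessel": 0.15, "nonvessel": 0.85}

labels = [classify(x, class_models, priors) for x in (-1.0, 3.0)]
```

    Once the mixtures are fitted, classifying a pixel only requires evaluating a few densities, which is consistent with the fast classification the abstract mentions.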

  4. Semi-supervised eigenvectors for large-scale locally-biased learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2014-01-01

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks nearby that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified seed set of nodes, or one might be interested in finding partitions in an image that are near a prespecified ground truth set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing...

  5. Semi-supervised eigenvectors for large-scale locally-biased learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2014-01-01

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks nearby that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified seed set of nodes, or one might be interested in finding partitions in an image that are near a prespecified ground truth set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing...

  6. A Weighted Block Dictionary Learning Algorithm for Classification

    OpenAIRE

    Zhongrong Shi

    2016-01-01

    Discriminative dictionary learning, playing a critical role in sparse representation based classification, has led to state-of-the-art classification results. Among the existing discriminative dictionary learning methods, two different approaches, shared dictionary and class-specific dictionary, which associate each dictionary atom to all classes or a single class, have been studied. The shared dictionary is a compact method but with lack of discriminative information; the class-specific dict...

  7. Application of graph-based semi-supervised learning for development of cyber COP and network intrusion detection

    Science.gov (United States)

    Levchuk, Georgiy; Colonna-Romano, John; Eslami, Mohammed

    2017-05-01

    The United States increasingly relies on cyber-physical systems to conduct military and commercial operations. Attacks on these systems have increased dramatically around the globe. The attackers constantly change their methods, making state-of-the-art commercial and military intrusion detection systems ineffective. In this paper, we present a model to identify functional behavior of network devices from netflow traces. Our model includes two innovations. First, we define novel features for a host IP using detection of application graph patterns in IP's host graph constructed from 5-min aggregated packet flows. Second, we present the first application, to the best of our knowledge, of Graph Semi-Supervised Learning (GSSL) to the space of IP behavior classification. Using a cyber-attack dataset collected from NetFlow packet traces, we show that GSSL trained with only 20% of the data achieves higher attack detection rates than Support Vector Machines (SVM) and Naïve Bayes (NB) classifiers trained with 80% of data points. We also show how to improve detection quality by filtering out web browsing data, and conclude with discussion of future research directions.

  8. IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST SENSITIVE CRITERION FOR C-SVM

    Directory of Open Access Journals (Sweden)

    M’hamed Bilal Abidine

    2013-11-01

    Full Text Available The growing population of elders in society calls for a new approach to caregiving. By inferring what activities elderly people are performing in their houses, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely the Soft-Support Vector Machines (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in the activity recognition field, which is known to hinder the learning performance of classifiers. Cost-sensitive learning is attractive under most imbalanced circumstances, but it is difficult to determine the precise misclassification costs in practice. We introduce a new criterion for selecting a suitable cost parameter C of the C-SVM method. Through our evaluation on four real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms the state-of-the-art discriminative methods in activity recognition.

  9. Supervised novelty detection in brain tissue classification with an application to white matter hyperintensities

    Science.gov (United States)

    Kuijf, Hugo J.; Moeskops, Pim; de Vos, Bob D.; Bouvy, Willem H.; de Bresser, Jeroen; Biessels, Geert Jan; Viergever, Max A.; Vincken, Koen L.

    2016-03-01

    Novelty detection is concerned with identifying test data that differs from the training data of a classifier. In the case of brain MR images, pathology or imaging artefacts are examples of untrained data. In this proof-of-principle study, we measure the behaviour of a classifier during the classification of trained labels (i.e. normal brain tissue). Next, we devise a measure that distinguishes normal classifier behaviour from abnormal behaviour that occurs in the case of a novelty. This is evaluated by training a kNN classifier on normal brain tissue, applying it to images with an untrained pathology (white matter hyperintensities (WMH)), and determining whether our measure is able to identify abnormal classifier behaviour at WMH locations. For our kNN classifier, behaviour is modelled as the mean, median, or q1 distance to the k nearest points. Healthy tissue was trained on 15 images; classifier behaviour was trained/tested on 5 images with leave-one-out cross-validation. For each trained class, we measure the distribution of mean/median/q1 distances to the k nearest points. Next, for each test voxel, we compute its Z-score with respect to the measured distribution of its predicted label. We consider a Z-score >=4 to indicate abnormal behaviour of the classifier, having a probability due to chance of 0.000032. Our measure identified >90% of WMH volume and also highlighted other non-trained findings, predominantly vessels, cerebral falx, brain mask errors, and choroid plexus. This measure is generalizable to other classifiers and might help in detecting unexpected findings or novelties by measuring classifier behaviour.
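
    The Z-score measure described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the study's pipeline: it uses a single class, two-dimensional points in place of voxel features, the mean distance to the k nearest training points as the behaviour measure, and hypothetical reference points to estimate the normal distribution of that measure. The helper `tail_probability` assumes a one-sided standard normal tail, which reproduces the chance probability quoted for a Z-score of 4.

```python
import math
import statistics


def mean_knn_distance(point, training, k):
    """Mean Euclidean distance from point to its k nearest training points."""
    distances = sorted(math.dist(point, t) for t in training)
    return sum(distances[:k]) / k


def tail_probability(z):
    """One-sided upper-tail probability of a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def is_novel(point, training, ref_mean, ref_std, k, threshold=4.0):
    """Flag a point whose kNN-distance Z-score reaches the threshold."""
    z = (mean_knn_distance(point, training, k) - ref_mean) / ref_std
    return z >= threshold


# Hypothetical training set: a 5 x 5 grid standing in for normal tissue samples.
training = [(float(i), float(j)) for i in range(5) for j in range(5)]
k = 2

# Hypothetical held-out normal points used to measure typical classifier behaviour.
reference = [(0.5, 0.5), (1.0, 1.0), (1.5, 1.0), (2.25, 2.25), (3.5, 2.5), (2.0, 3.5)]
ref_distances = [mean_knn_distance(p, training, k) for p in reference]
ref_mean = statistics.mean(ref_distances)
ref_std = statistics.stdev(ref_distances)

inlier_flag = is_novel((1.5, 2.5), training, ref_mean, ref_std, k)
outlier_flag = is_novel((10.0, 10.0), training, ref_mean, ref_std, k)
chance = tail_probability(4.0)
```

    Swapping the mean for the median or first quartile of the k distances only changes `mean_knn_distance`; the Z-score step stays the same.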

  10. Improvements on coronal hole detection in SDO/AIA images using supervised classification

    CERN Document Server

    Reiss, Martin A; De Visscher, Ruben; Temmer, Manuela; Veronig, Astrid M; Delouille, Véronique; Mampaey, Benjamin; Ahammer, Helmut

    2015-01-01

    We demonstrate the use of machine learning algorithms in combination with segmentation techniques in order to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding, SPoCA), we prepared data sets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011 - 2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first order and second order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used for data mining investigations to identify the most performant rule to differentiate between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine, Linear Support Vector Machine, Decision Tree, and Random Forest and found tha...

  11. Hierarchical Maximum Margin Learning for Multi-Class Classification

    CERN Document Server

    Yang, Jian-Bo

    2012-01-01

    Due to myriads of classes, designing accurate and efficient classifiers becomes very challenging for multi-class classification. Recent research has shown that class structure learning can greatly facilitate multi-class learning. In this paper, we propose a novel method to learn the class structure for multi-class classification problems. The class structure is assumed to be a binary hierarchical tree. To learn such a tree, we propose a maximum separating margin method to determine the child nodes of any internal node. The proposed method ensures that the two class groups represented by any two sibling nodes are most separable. In the experiments, we evaluate the accuracy and efficiency of the proposed method over other multi-class classification methods on real-world large-scale problems. The results show that the proposed method outperforms benchmark methods in terms of accuracy for most datasets and performs comparably with other class structure learning methods in terms of efficiency for all datasets.

  12. Hearing in a shoe-box : binaural source position and wall absorption estimation using virtually supervised learning

    OpenAIRE

    Kataria, Saurabh; Gaultier, Clément; Deleforge, Antoine

    2016-01-01

    This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients. A probabilistic high- to low-dimensional regression framework is used to learn a mapp...

  13. Hierarchical discriminant manifold learning for dimensionality reduction and image classification

    Science.gov (United States)

    Chen, Weihai; Zhao, Changchen; Ding, Kai; Wu, Xingming; Chen, Peter C. Y.

    2015-09-01

    In the field of image classification, it has been a trend that in order to deliver a reliable classification performance, the feature extraction model becomes increasingly more complicated, leading to a high dimensionality of image representations. This, in turn, demands greater computation resources for image classification. Thus, it is desirable to apply dimensionality reduction (DR) methods for image classification. It is necessary to apply DR methods to relieve the computational burden as well as to improve the classification accuracy. However, traditional DR methods are not compatible with modern feature extraction methods. A framework that combines manifold learning based DR and feature extraction in a deeper way for image classification is proposed. A multiscale cell representation is extracted from the spatial pyramid to satisfy the locality constraints for a manifold learning method. A spectral weighted mean filtering is proposed to eliminate noise in the feature space. A hierarchical discriminant manifold learning is proposed which incorporates both category label and image scale information to guide the DR process. Finally, the image representation is generated by concatenating dimensionality reduced cell representations from the same image. Extensive experiments are conducted to test the proposed algorithm on both scene and object recognition datasets in comparison with several well-established and state-of-the-art methods with respect to classification precision and computational time. The results verify the effectiveness of incorporating manifold learning in the feature extraction procedure and imply that the multiscale cell representations may be distributed on a manifold.

  14. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Full Text Available Background: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. Results: The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  15. Human-interpretable feature pattern classification system using learning classifier systems.

    Science.gov (United States)

    Ebadi, Toktam; Kukenys, Ignas; Browne, Will N; Zhang, Mengjie

    2014-01-01

    Image pattern classification is a challenging task due to the large search space of pixel data. Supervised and subsymbolic approaches have proven accurate in learning a problem's classes. However, in the complex image recognition domain, there is a need for investigation of learning techniques that allow humans to interpret the learned rules in order to gain an insight about the problem. Learning classifier systems (LCSs) are a machine learning technique that has been minimally explored for image classification. This work has developed the feature pattern classification system (FPCS) framework by adopting Haar-like features from the image recognition domain for feature extraction. The FPCS integrates Haar-like features with XCS, which is an accuracy-based LCS. A major contribution of this work is that the developed framework is capable of producing human-interpretable rules. The FPCS system achieved 91 [Formula: see text] 1% accuracy on the unseen test set of the MNIST dataset. In addition, the FPCS is capable of autonomously adjusting the rotation angle in unaligned images. This rotation adjustment raised the accuracy of FPCS to 95%. Although the performance is competitive with equivalent approaches, this was not as accurate as subsymbolic approaches on this dataset. However, the interpretability of the rules produced by FPCS enabled us to identify the distribution of the learned angles (a normal distribution around [Formula: see text]), which would have been very difficult in subsymbolic approaches. The analyzable nature of FPCS is anticipated to be beneficial in domains such as speed sign recognition, where the underlying reasoning and confidence of recognition need to be human interpretable.

  16. Preliminary hard and soft bottom seafloor substrate map derived from a supervised classification of bathymetry derived from multispectral World View-2 satellite imagery of Ni'ihau Island, Territory of Main Hawaiian Islands, USA

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Preliminary hard and soft seafloor substrate map derived from a supervised classification from multispectral World View-2 satellite imagery of Ni'ihau Island,...

  17. Supervised neural network modeling: an empirical investigation into learning from imbalanced data with labeling errors.

    Science.gov (United States)

    Khoshgoftaar, Taghi M; Van Hulse, Jason; Napolitano, Amri

    2010-05-01

    Neural network algorithms such as multilayer perceptrons (MLPs) and radial basis function networks (RBFNets) have been used to construct learners which exhibit strong predictive performance. Two data related issues that can have a detrimental impact on supervised learning initiatives are class imbalance and labeling errors (or class noise). Imbalanced data can make it more difficult for the neural network learning algorithms to distinguish between examples of the various classes, and class noise can lead to the formulation of incorrect hypotheses. Both class imbalance and labeling errors are pervasive problems encountered in a wide variety of application domains. Many studies have been performed to investigate these problems in isolation, but few have focused on their combined effects. This study presents a comprehensive empirical investigation using neural network algorithms to learn from imbalanced data with labeling errors. In particular, the first component of our study investigates the impact of class noise and class imbalance on two common neural network learning algorithms, while the second component considers the ability of data sampling (which is commonly used to address the issue of class imbalance) to improve their performances. Our results, for which over two million models were trained and evaluated, show that conclusions drawn using the more commonly studied C4.5 classifier may not apply when using neural networks.
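
    The abstract above mentions data sampling as a common remedy for class imbalance without naming a specific technique. As one hedged illustration, the sketch below shows random oversampling, which duplicates minority-class examples until every class matches the largest one; the function name, seed and toy data are assumptions, and the study itself may have used different sampling methods.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority examples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target_size = max(counts.values())
    balanced_samples, balanced_labels = list(samples), list(labels)
    for label, count in counts.items():
        # Pool of original examples belonging to this class.
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target_size - count):
            balanced_samples.append(rng.choice(pool))
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Hypothetical imbalanced data set: eight negative and two positive examples.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.0], [2.1]]
labels = ["neg"] * 8 + ["pos"] * 2

balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

    Oversampling copies labels along with features, so any labeling errors in the minority class are replicated as well, which is one reason the combined effect of noise and imbalance is worth studying.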

  18. SimNest: Social Media Nested Epidemic Simulation via Online Semi-supervised Deep Learning.

    Science.gov (United States)

    Zhao, Liang; Chen, Jiangzhuo; Chen, Feng; Wang, Wei; Lu, Chang-Tien; Ramakrishnan, Naren

    2015-11-01

    Infectious disease epidemics such as influenza and Ebola pose a serious threat to global public health. It is crucial to characterize the disease and the evolution of the ongoing epidemic efficiently and accurately. Computational epidemiology can model the disease progress and underlying contact network, but suffers from the lack of real-time and fine-grained surveillance data. Social media, on the other hand, provides timely and detailed disease surveillance, but is insensible to the underlying contact network and disease model. This paper proposes a novel semi-supervised deep learning framework that integrates the strengths of computational epidemiology and social media mining techniques. Specifically, this framework learns the social media users' health states and intervention actions in real time, which are regularized by the underlying disease model and contact network. Conversely, the learned knowledge from social media can be fed into computational epidemic model to improve the efficiency and accuracy of disease diffusion modeling. We propose an online optimization algorithm to substantialize the above interactive learning process iteratively to achieve a consistent stage of the integration. The extensive experimental results demonstrated that our approach can effectively characterize the spatio-temporal disease diffusion, outperforming competing methods by a substantial margin on multiple metrics.

  19. Large-Scale Machine Learning for Classification and Search

    Science.gov (United States)

    Liu, Wei

    2012-01-01

    With the rapid development of the Internet, nowadays tremendous amounts of data including images and videos, up to millions or billions, can be collected for training machine learning models. Inspired by this trend, this thesis is dedicated to developing large-scale machine learning techniques for the purpose of making classification and nearest…

  1. Semi-supervised learning of causal relations in biomedical scientific discourse

    Science.gov (United States)

    2014-01-01

    Background: The increasing number of daily published articles in the biomedical domain has become too large for humans to handle on their own. As a result, bio-text mining technologies have been developed to reduce their workload by automatically analysing the text and extracting important knowledge. Specific bio-entities, bio-events between these and facts can now be recognised with sufficient accuracy and are widely used by biomedical researchers. However, understanding how the extracted facts are connected in text is an extremely difficult task, which cannot be easily tackled by machinery. Results: In this article, we describe our method to recognise causal triggers and their arguments in biomedical scientific discourse. We introduce new features and show that a self-learning approach improves the performance obtained by supervised machine learners to 83.47% for causal triggers. Furthermore, the spans of causal arguments can be recognised to a slightly higher level than by using supervised or rule-based methods that have been employed before. Conclusion: Exploiting the large amount of unlabelled data that is already available can help improve the performance of recognising causal discourse relations in the biomedical domain. This improvement will further benefit the development of multiple tasks, such as hypothesis generation for experimental laboratories, contradiction detection, and the creation of causal networks. PMID:25559746

  2. Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles.

    Science.gov (United States)

    Zheng, Wu; Blake, Catherine

    2015-10-01

    Databases of curated biomedical knowledge, such as the protein-locations reflected in the UniProtKB database, provide an accurate and useful resource to researchers and decision makers. Our goal is to augment the manual efforts currently used to curate knowledge bases with automated approaches that leverage the increased availability of full-text scientific articles. This paper describes experiments that use distant supervised learning to identify protein subcellular localizations, which are important to understand protein function and to identify candidate drug targets. Experiments consider Swiss-Prot, the manually annotated subset of the UniProtKB protein knowledge base, and 43,000 full-text articles from the Journal of Biological Chemistry that contain just under 11.5 million sentences. The system achieves 0.81 precision and 0.49 recall at sentence level and an accuracy of 57% on held-out instances in a test set. Moreover, the approach identifies 8210 instances that are not in the UniProtKB knowledge base. Manual inspection of the 50 most likely relations showed that 41 (82%) were valid. These results have immediate benefit to researchers interested in protein function, and suggest that distant supervision should be explored to complement other manual data curation efforts.
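
    The figures quoted above can be related with a short worked calculation. The sketch below recomputes the 82% validity rate from the 41 of 50 manually inspected relations and, as an added illustration, derives the harmonic mean (F1) of the reported sentence-level precision and recall. The F1 value is not stated in the abstract; it is computed here only to show how the two reported numbers combine.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Sentence-level precision and recall as reported in the abstract.
precision, recall = 0.81, 0.49
f1 = f1_score(precision, recall)

# Manual inspection reported in the abstract: 41 valid out of the 50 most likely relations.
valid, inspected = 41, 50
manual_validity = valid / inspected

print(f"F1 = {f1:.3f}, manual validity = {manual_validity:.0%}")
```

    The gap between precision and recall pulls the harmonic mean well below the precision figure, which matches the abstract's framing of distant supervision as a complement to manual curation rather than a replacement.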

  3. Performance of machine learning methods for classification tasks

    Directory of Open Access Journals (Sweden)

    B. Krithika

    2013-06-01

    In this paper, the performance of various machine learning methods on pattern classification and recognition tasks is evaluated. The proposed method for evaluating performance is based on the feature representation, feature selection and setting of model parameters. The nature of the data and the methods of feature extraction and feature representation are discussed. The results of the machine learning algorithms on the classification task are analysed. The performance of machine learning methods on classifying Tamil word patterns, i.e., classification of nouns and verbs, is analysed. The software WEKA (a data mining tool) is used for evaluating the performance. WEKA has several machine learning algorithms, such as Bayes, Trees, Lazy, and Rule-based classifiers.

  4. INTERACTIVE DOMAIN ADAPTION FOR THE CLASSIFICATION OF REMOTE SENSING IMAGES USING ACTIVE LEARNING

    Directory of Open Access Journals (Sweden)

    U.Pushpa Lingam

    2015-11-01

    An interactive domain adaptation (IDA) technique based on active learning is presented for the classification of remote sensing images. The IDA method adapts a supervised classifier trained on a given remote sensing source image to make it suitable for classifying a different but related target image. The two images can be acquired in different locations and at different times. The method iteratively selects the most informative samples of the target image to be labeled and included in the training set, while the source image samples are reweighted or removed from the training set on the basis of their disagreement with the target image classification problem. The consistent information available from the source image can be effectively exploited for the classification of the target image and for guiding the selection of new samples to be labeled, whereas the inconsistent information is automatically detected and removed. This approach significantly reduces the number of new labeled samples to be collected from the target image. Experimental results on both a multispectral very high resolution data set and a hyperspectral data set confirm the effectiveness of interactive domain adaptation for the classification of remote sensing images using active learning.

  5. Non-Supervised Learning for Spread Spectrum Signal Pseudo-Noise Sequence Acquisition

    Institute of Scientific and Technical Information of China (English)

    Hao Cheng; Na Yu; Tai-Jun Wang

    2015-01-01

    An approach to estimating the pseudo-noise (PN) sequence of a direct sequence spread spectrum (DSSS) signal is presented. Without a priori knowledge about the DSSS signal in the non-cooperative condition, we propose a self-organizing feature map (SOFM) neural network algorithm to detect and identify the PN sequence. A non-supervised learning algorithm is proposed according to the Kohonen rule in SOFM. The blind algorithm can also estimate the PN sequence at a low signal-to-noise ratio (SNR), and computer simulation demonstrates that the algorithm is effective. Compared with the traditional correlation algorithm based on slip-correlation, the proposed algorithm's bit error rate (BER) and complexity are lower.
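
    The abstract does not spell out the network, so the following is only a minimal sketch of the generic Kohonen update on a one-dimensional map, not the authors' PN-sequence estimator. The function name `train_sofm`, the linear learning-rate decay, the fixed neighbourhood radius and the use of ±1 "chip" vectors as inputs are all assumptions made for illustration.

```python
import random


def train_sofm(samples, n_units, epochs=50, lr=0.5, radius=1, seed=0):
    """Train a 1-D self-organizing feature map with the Kohonen rule."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Random initial weight vector for every map unit (assumed range [-1, 1]).
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x in samples:
            # Best-matching unit: the unit whose weights are closest to x.
            bmu = min(
                range(n_units),
                key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
            )
            # Kohonen rule: pull the winner and its map neighbours toward x.
            for j in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
                weights[j] = [w + rate * (v - w) for w, v in zip(weights[j], x)]
    return weights


# Hypothetical input: noisy-free +/-1 chip windows of length 4.
chips = [(1, -1, 1, 1), (-1, 1, -1, -1), (1, -1, 1, 1), (-1, 1, -1, -1)]
units = train_sofm(chips, n_units=3)
```

    Because each update is a convex combination of the old weight and the input, the trained weights stay within the range spanned by the initial weights and the samples.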

  6. Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications

    CERN Document Server

    Lu, Zhiwu; Peng, Yuxin

    2011-01-01

    This paper presents a novel pairwise constraint propagation approach by decomposing the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs. Considering that this time cost is proportional to the number of all possible pairwise constraints, our approach actually provides an efficient solution for exhaustively propagating pairwise constraints throughout the entire dataset. The resulting exhaustive set of propagated pairwise constraints are further used to adjust the similarity matrix for constrained spectral clustering. Other than the traditional constraint propagation on single-source data, our approach is also extended to more challenging constraint propagation on multi-source data where each pairwise constraint is defined over a pair of data points from different sources. This multi-source constraint propagation has an important application to cross-modal mul...
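
    The building block named above, label propagation on a k-nearest-neighbour graph, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' constraint propagation algorithm: the helper names `knn_graph` and `propagate_labels`, the unweighted symmetric graph, and the simple averaging update with clamped seeds are all choices made here.

```python
def knn_graph(points, k):
    """Return a symmetric k-nearest-neighbour adjacency list."""
    n = len(points)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # symmetrise the graph
    return neighbours


def propagate_labels(points, seeds, k=2, iterations=100):
    """Spread seed scores (+1 / -1) over the graph; seeds stay clamped."""
    graph = knn_graph(points, k)
    scores = [float(seeds.get(i, 0.0)) for i in range(len(points))]
    for _ in range(iterations):
        updated = []
        for i, nbrs in enumerate(graph):
            if i in seeds:
                updated.append(float(seeds[i]))  # clamp labelled points
            else:
                # Unlabelled points take the mean score of their neighbours.
                updated.append(sum(scores[j] for j in nbrs) / len(nbrs))
        scores = updated
    return scores


# Hypothetical 1-D data: two well-separated groups, one seed in each.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
result = propagate_labels(data, seeds={0: 1, 5: -1}, k=2)
```

    In the paper's setting each subproblem would propagate one column of the pairwise-constraint matrix rather than class labels; the sign of each resulting score plays the role of the propagated constraint.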

  7. Anxiety, supervision and a space for thinking: some narcissistic perils for clinical psychologists in learning psychotherapy.

    Science.gov (United States)

    Mollon, P

    1989-06-01

    The process of learning psychotherapy involves narcissistic dangers--there may be injuries to self-esteem and self-image, especially when working with certain kinds of disturbed and hostile patients. Some patients will unconsciously recreate, in the transference, representations of early damaging experiences with parents, but now reversed with the therapist as the victim. It is vital for the trainee to be helped to understand these powerful interactional pressures. There are aspects of the professional culture and ideals of clinical psychologists (and possibly of some psychiatrists and social workers as well) which may make them particularly vulnerable in work with the hostile patient. It is argued that the function of supervision is not to teach a technique directly, but to create a 'space for thinking'--a kind of thinking which is more akin to maternal reverie, as described by Bion, than problem solving.

  8. Semi-supervised binary classification algorithm based on global and local regularization

    Institute of Scientific and Technical Information of China (English)

    吕佳

    2012-01-01

    For semi-supervised classification, it is difficult to obtain a good classification function over the entire input space if global learning is used alone, whereas local learning used alone can yield a good classification function on some specific regions of the input space. Accordingly, this paper presents a new semi-supervised binary classification algorithm based on mixed local and global regularization. The algorithm integrates the benefits of a global regularizer and a local regularizer. The global regularizer, built from prior knowledge, smooths the class labels of the data so as to lessen insufficient training of the local regularizer, while the local regularizer, constructed from the samples in each neighboring region, gives the class label of each data point the desired property; together they define the objective function of the semi-supervised binary classification problem. Comparative experiments on benchmark binary datasets show that the average classification accuracy and the standard error of the proposed algorithm are better than those of methods based on the Laplacian regularizer, the regularized Laplacian regularizer, and the local learning regularizer.

  9. Multi-robot system learning based on evolutionary classification

    Directory of Open Access Journals (Sweden)

    Manko Sergey

    2016-01-01

    This paper presents a novel machine learning method for agents of a multi-robot system. The learning process is based on knowledge discovery through continual analysis of robot sensory information. We demonstrate that classification trees and evolutionary forests may be a basis for the creation of autonomous robots capable both of learning and of knowledge exchange with other agents in a multi-robot system. The results of experimental studies confirm the effectiveness of the proposed approach.

  10. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    Science.gov (United States)

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of effort to understanding the interactions between proteins or RNA production. This information might enrich current knowledge of drug reactions or the development of certain diseases. Nevertheless, because it lacks explicit structure, the life science literature, one of the most important sources of this information, is difficult for computer-based systems to access. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has recently attracted community-wide efforts. Most approaches are based on statistical models, which require large-scale annotated corpora to precisely estimate the models' parameters. However, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution and is attracting growing interest. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that an improvement of more than 2.2% in F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and the similarity between sentences might be precisely

  11. Entry-Level Technical Skills That Teachers Expected Students to Learn through Supervised Agricultural Experiences (SAEs): A Modified Delphi Study

    Science.gov (United States)

    Ramsey, Jon W.; Edwards, M. Craig

    2012-01-01

    Supervised experiences are designed to provide opportunities for the hands-on learning of skills and practices that lead to successful personal growth and future employment in an agricultural career (Talbert, Vaughn, Croom, & Lee, 2007). In the Annual Report for Agricultural Education (2005-2006), it was stated that 91% of the respondents…

  12. Just How Much Can School Pupils Learn from School Gardening? A Study of Two Supervised Agricultural Experience Approaches in Uganda

    Science.gov (United States)

    Okiror, John James; Matsiko, Biryabaho Frank; Oonyu, Joseph

    2011-01-01

    School systems in Africa are short of skills that link well with rural communities, yet arguments to vocationalize curricula remain mixed and school agriculture lacks the supervised practical component. This study, conducted in eight primary (elementary) schools in Uganda, sought to compare the learning achievement of pupils taught using…

  13. Teaching the computer to code frames in news: comparing two supervised machine learning approaches to frame analysis

    NARCIS (Netherlands)

    Burscher, B.; Odijk, D.; Vliegenthart, R.; de Rijke, M.; de Vreese, C.H.

    2014-01-01

    We explore the application of supervised machine learning (SML) to frame coding. By automating the coding of frames in news, SML facilitates the incorporation of large-scale content analysis into framing research, even if financial resources are scarce. This furthers a more integrated investigation

  15. Entry-Level Technical Skills That Teachers Expected Students to Learn through Supervised Agricultural Experiences (SAEs): A Modified Delphi Study

    Science.gov (United States)

    Ramsey, Jon W.; Edwards, M. Craig

    2012-01-01

    Supervised experiences are designed to provide opportunities for the hands-on learning of skills and practices that lead to successful personal growth and future employment in an agricultural career (Talbert, Vaughn, Croom, & Lee, 2007). In the Annual Report for Agricultural Education (2005-2006), it was stated that 91% of the respondents (i.e.,…

  16. Collective Academic Supervision: A Model for Participation and Learning in Higher Education

    Science.gov (United States)

    Nordentoft, Helle Merete; Thomsen, Rie; Wichmann-Hansen, Gitte

    2013-01-01

    Supervision of graduate students is a core activity in higher education. Previous research on graduate supervision focuses on individual and relational aspects of the supervisory relationship rather than collective, pedagogical and methodological aspects of the supervision process. In presenting a collective model we have developed for academic…

  17. An Ontology to Support the Classification of Learning Material in an Organizational Learning Environment: An Evaluation

    Science.gov (United States)

    Valaski, Joselaine; Reinehr, Sheila; Malucelli, Andreia

    2017-01-01

    Purpose: The purpose of this research was to evaluate whether ontology integrated in an organizational learning environment may support the automatic learning material classification in a specific knowledge area. Design/methodology/approach: An ontology for recommending learning material was integrated in the organizational learning environment…

  18. Visual Recognition by Learning From Web Data via Weakly Supervised Domain Generalization.

    Science.gov (United States)

    Niu, Li; Li, Wen; Xu, Dong; Cai, Jianfei

    2016-06-01

    In this paper, a weakly supervised domain generalization (WSDG) method is proposed for real-world visual recognition tasks, in which we train classifiers by using Web data (e.g., Web images and Web videos) with noisy labels. In particular, two challenging problems need to be solved when learning robust classifiers, in which the first issue is to cope with the label noise of training Web data from the source domain, while the second issue is to enhance the generalization capability of learned classifiers to an arbitrary target domain. In order to handle the first problem, the training samples within each category are partitioned into clusters, where we use one bag to denote each cluster and instances to denote the samples in each cluster. Then, we identify a proportion of good training samples in each bag and train robust classifiers by using the good training samples, which leads to a multi-instance learning (MIL) problem. In order to handle the second problem, we assume that the training samples possibly form a set of hidden domains, with each hidden domain associated with a distinctive data distribution. Then, for each category and each hidden latent domain, we propose to learn one classifier by extending our MIL formulation, which leads to our WSDG approach. In the testing stage, our approach can obtain better generalization capability by effectively integrating multiple classifiers from different latent domains in each category. Moreover, our WSDG approach is further extended to utilize additional textual descriptions associated with Web data as privileged information (PI), although testing data do not have such PI. Extensive experiments on three benchmark data sets indicate that our newly proposed methods are effective for real-world visual recognition tasks by learning from Web data.

  19. Whither Supervision?

    Directory of Open Access Journals (Sweden)

    Duncan Waite

    2006-11-01

    This paper asks whether school supervision is in decline. Dr. Waite responds that the answer depends on the perspective from which one looks at it. He suggests considering three related elements: the field itself; the expert in the field (the professor, the theorist, the student and the administrator); and the context. Reviewing these three elements shows that there is no consensus about the field of supervision, although there is agreement about its importance and that it is related to improving the practice of students in school for their benefit. Dr. Waite suggests that practice in this field is not always in harmony with what the theorists affirm. Regarding the supervisor or expert, the author indicates that his or her perspective depends on his or her epistemological beliefs or on the way he or she conceives of learning; that is why supervision can be understood in different ways. Regarding the context, Waite suggests that the social or external forces that influence people and society must be taken into consideration, because education is affected through them. Dr. Waite concludes that the way supervision is understood depends on the practitioner's perspective. He answers the initial question by saying that the authorities on supervision, the knowledge in this field, its practitioners and its practice may be dispersed but are not extinct, because supervision will always be part of the great enterprise that we call education.

  20. Photometric classification of emission line galaxies with Machine Learning methods

    CERN Document Server

    Cavuoti, Stefano; D'Abrusco, Raffaele; Longo, Giuseppe; Paolillo, Maurizio

    2013-01-01

    In this paper we discuss an application of machine learning based methods to the identification of candidate AGN from optical survey data and to the automatic classification of AGNs in broad classes. We applied four different machine learning algorithms, namely the Multi Layer Perceptron (MLP), trained respectively with the Conjugate Gradient, Scaled Conjugate Gradient and Quasi Newton learning rules, and the Support Vector Machines (SVM), to tackle the problem of the classification of emission line galaxies in different classes, mainly AGNs vs non-AGNs, obtained using optical photometry in place of the diagnostics based on line intensity ratios which are classically used in the literature. Using the same photometric features we discuss also the behavior of the classifiers on finer AGN classification tasks, namely Seyfert I vs Seyfert II and Seyfert vs LINER. Furthermore we describe the algorithms employed, the samples of spectroscopically classified galaxies used to train the algorithms, the procedure follow...

  1. Significance of Classification Techniques in Prediction of Learning Disabilities

    CERN Document Server

    David, Julie M.; Balakrishnan, Kannan

    2010-01-01

    The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
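
    The clustering step described above can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. It is not the authors' implementation, which is not shown in the abstract: the function name `kmeans`, the explicit starting centroids and the two-feature "symptom score" data are hypothetical. Note also that k-means groups unlabeled observations, so it complements rather than replaces a classifier such as a decision tree.

```python
def kmeans(points, centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) from explicit starting centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


# Hypothetical symptom scores for six children (two features each).
scores = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, labels = kmeans(scores, centroids=[(0, 0), (10, 10)])
```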

  2. Automatic stellar spectral classification via sparse representations and dictionary learning

    Science.gov (United States)

    Díaz-Hernández, R.; Peregrina-Barreto, H.; Altamirano-Robles, L.; González-Bernal, J. A.; Ortiz-Esquivel, A. E.

    2014-11-01

    Stellar classification is an important topic in astronomical tasks such as the study of stellar populations. However, stellar classification of a region of the sky is a time-consuming process due to the large number of objects present in an image. Therefore, automatic techniques to speed up the process are required. In this work, we study the application of sparse representation and dictionary learning for automatic spectral stellar classification. Our dataset consists of 529 calibrated stellar spectra of classes B to K, belonging to the Pulkovo Spectrophotometric catalog, in the 3400-5500Å range. These stellar spectra are used for both training and testing of the proposed methodology. The sparse technique is applied by using the greedy algorithm OMP (Orthogonal Matching Pursuit) for finding an approximated solution, and the K-SVD (K-Singular Value Decomposition) for the dictionary learning step. Thus, sparse classification is based on the recognition of the common characteristics of a particular stellar type through the construction of a trained basis. In this work, we propose a classification criterion that evaluates the results of the sparse representation techniques and determines the final classification of the spectra. This methodology demonstrates its ability to achieve levels of classification comparable with automatic methodologies previously reported such as the Maximum Correlation Coefficient (MCC) and Artificial Neural Networks (ANN).

  3. Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.

    Directory of Open Access Journals (Sweden)

    Chao Wei

    Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.

  4. Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.

    Science.gov (United States)

    Wei, Chao; Luo, Senlin; Ma, Xincheng; Ren, Hao; Zhang, Ji; Pan, Limin

    2016-01-01

    Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.

  5. A Kernel Optimization Design for Semi-supervised Learning

    Institute of Scientific and Technical Information of China (English)

    崔鹏

    2013-01-01

    Semi-supervised learning aims to obtain good performance and generalization ability when some information about the training examples is missing. We propose a semi-supervised learning framework based on kernel optimization, which embeds the data into a high-dimensional feature space so that the problem becomes equivalent to linear classification. For kernel design, we apply spectral decomposition to unsupervised kernel design, derive a learning bound, and obtain the optimal kernel representation by minimizing this bound. Experiments comparing different kernel approaches support our conclusions.

  6. Visual Feature Learning in Artificial Grammar Classification

    Science.gov (United States)

    Chang, Grace Y.; Knowlton, Barbara J.

    2004-01-01

    The Artificial Grammar Learning task has been used extensively to assess individuals' implicit learning capabilities. Previous work suggests that participants implicitly acquire rule-based knowledge as well as exemplar-specific knowledge in this task. This study investigated whether exemplar-specific knowledge acquired in this task is based on the…

  7. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI, yielding irreproducible results from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
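
    The evaluation described above, comparing predicted quality labels with investigator-determined labels, reduces to a simple accuracy computation. The sketch below assumes one label per volume; the function name `accuracy` and the "pass"/"fail" label values are hypothetical, since the abstract does not state how the quality labels were encoded.

```python
def accuracy(predicted, reference):
    """Fraction of items whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical labels for five volumes: classifier output vs. investigator.
svm_labels = ["pass", "fail", "pass", "pass", "fail"]
rater_labels = ["pass", "fail", "fail", "pass", "fail"]
score = accuracy(svm_labels, rater_labels)
```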

  8. Automated detection of microaneurysms using scale-adapted blob analysis and semi-supervised learning.

    Science.gov (United States)

    Adal, Kedir M; Sidibé, Désiré; Ali, Sharib; Chaum, Edward; Karnowski, Thomas P; Mériaudeau, Fabrice

    2014-04-01

    Despite several attempts, automated detection of microaneurysms (MAs) from digital fundus images remains an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image, and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are introduced to characterize these blob regions. A semi-supervised learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier which can detect true MAs. The developed system is built using only a few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on the Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the-art techniques as well as the applicability of the proposed features to analyze fundus images.

  9. The effects of supervised learning on event-related potential correlates of music-syntactic processing.

    Science.gov (United States)

    Guo, Shuang; Koelsch, Stefan

    2015-11-11

    Humans process music even without conscious effort according to implicit knowledge about syntactic regularities. Whether such automatic and implicit processing is modulated by veridical knowledge has remained unknown in previous neurophysiological studies. This study investigates this issue by testing whether the acquisition of veridical knowledge of a music-syntactic irregularity (acquired through supervised learning) modulates early, partly automatic, music-syntactic processes (as reflected in the early right anterior negativity, ERAN), and/or late controlled processes (as reflected in the late positive component, LPC). Excerpts of piano sonatas with syntactically regular and less regular chords were presented repeatedly (10 times) to non-musicians and amateur musicians. Participants were informed by a cue as to whether the following excerpt contained a regular or less regular chord. Results showed that the repeated exposure to several presentations of regular and less regular excerpts did not influence the ERAN elicited by less regular chords. By contrast, amplitudes of the LPC (as well as of the P3a evoked by less regular chords) decreased systematically across learning trials. These results reveal that late controlled, but not early (partly automatic), neural mechanisms of music-syntactic processing are modulated by repeated exposure to a musical piece. This article is part of a Special Issue entitled SI: Prediction and Attention. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Learning features for tissue classification with the classification restricted Boltzmann machine

    NARCIS (Netherlands)

    G. van Tulder (Gijs); M. de Bruijne (Marleen)

    2014-01-01

    Performance of automated tissue classification in medical imaging depends on the choice of descriptive features. In this paper, we show how restricted Boltzmann machines (RBMs) can be used to learn features that are especially suited for texture-based tissue

  11. Language Learning Strategies: Classification and Pedagogical Implication

    OpenAIRE

    Ag. Bambang Setiyadi

    2001-01-01

    Many studies have been conducted to explore language learning strategies (Rubin, 1975; Naiman et al., 1978; Fillmore, 1979; O'Malley et al., 1985 and 1990; Politzer and Groarty, 1985; Prokop, 1989; Oxford, 1990; and Wenden, 1991). A total of 79 university students enrolled in a 3-month English course participated in the current study. This study attempted to explore what language learning strategies successful learners used and to what extent the strategies contributed to success i...

  12. Language Learning Strategies: Classification and Pedagogical Implication

    Directory of Open Access Journals (Sweden)

    Ag. Bambang Setiyadi

    2001-01-01

    Many studies have been conducted to explore language learning strategies (Rubin, 1975; Naiman et al., 1978; Fillmore, 1979; O'Malley et al., 1985 and 1990; Politzer and Groarty, 1985; Prokop, 1989; Oxford, 1990; and Wenden, 1991). A total of 79 university students enrolled in a 3-month English course participated in the current study. This study attempted to explore what language learning strategies successful learners used and to what extent the strategies contributed to success in learning English in Indonesia. Factor analyses, accounting for 62.1%, 56.0%, 41.1%, and 43.5% of the variance of speaking, listening, reading and writing measures in the language learning strategy questionnaire, suggested that the questionnaire constituted three constructs. The three constructs were named metacognitive strategies, deep level cognitive strategies and surface level cognitive strategies. Regression analyses, performed using scales based on these factors, revealed significant main effects for the use of the language learning strategies in learning English, constituting 43% of the variance in the posttest English achievement scores. An analysis of variance of the gain scores of the highest, middle, and lowest groups of performers suggested a greater use of metacognitive strategies among successful learners and a greater use of surface level cognitive strategies among unsuccessful learners. Implications for the classroom and future research are also discussed.

  13. Learning and retention through predictive inference and classification.

    Science.gov (United States)

    Sakamoto, Yasuaki; Love, Bradley C

    2010-12-01

    Work in category learning addresses how humans acquire knowledge and, thus, should inform classroom practices. In two experiments, we apply and evaluate intuitions garnered from laboratory-based research in category learning to learning tasks situated in an educational context. In Experiment 1, learning through predictive inference and classification were compared for fifth-grade students using class-related materials. Making inferences about properties of category members and receiving feedback led to the acquisition of both queried (i.e., tested) properties and nonqueried properties that were correlated with a queried property (e.g., even if not queried, students learned about a species' habitat because it correlated with a queried property, like the species' size). In contrast, classifying items according to their species and receiving feedback led to knowledge of only the property most diagnostic of category membership. After multiple-day delay, the fifth-graders who learned through inference selectively retained information about the queried properties, and the fifth-graders who learned through classification retained information about the diagnostic property, indicating a role for explicit evaluation in establishing memories. Overall, inference learning resulted in fewer errors, better retention, and more liking of the categories than did classification learning. Experiment 2 revealed that querying a property only a few times was enough to manifest the full benefits of inference learning in undergraduate students. These results suggest that classroom teaching should emphasize reasoning from the category to multiple properties rather than from a set of properties to the category. (PsycINFO Database Record (c) 2010 APA, all rights reserved).

  14. Novel Approaches for Diagnosing Melanoma Skin Lesions Through Supervised and Deep Learning Algorithms.

    Science.gov (United States)

    Premaladha, J; Ravichandran, K S

    2016-04-01

    Dermoscopy is a technique used to capture images of the skin, and these images are useful for analyzing different types of skin disease. Malignant melanoma is a kind of skin cancer that can be fatal. Early detection of melanoma allows clinicians to treat patients sooner and increases the chances of survival. Only a few machine learning algorithms have been developed to detect melanoma from its features. This paper proposes a Computer Aided Diagnosis (CAD) system equipped with efficient algorithms to classify and predict melanoma. The images are enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique and a median filter. A new segmentation algorithm called Normalized Otsu's Segmentation (NOS) is implemented to segment the affected skin lesion from the normal skin, which overcomes the problem of variable illumination. Fifteen features are derived and extracted from the segmented images and fed into the proposed classification techniques, such as Deep Learning based Neural Networks and Hybrid Adaboost-Support Vector Machine (SVM) algorithms. The proposed system is tested and validated with nearly 992 images (malignant and benign lesions) and provides a high classification accuracy of 93%. The proposed CAD system can assist dermatologists in confirming the diagnosis and avoiding excisional biopsies.
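The segmentation step in the record above builds on Otsu thresholding. The sketch below shows only the standard Otsu threshold (maximizing between-class variance over a grey-level histogram); it is not the Normalized Otsu's Segmentation (NOS) proposed by the authors, whose details are not given in the abstract. The function name and the toy pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes between-class variance.

    Standard Otsu thresholding, NOT the paper's NOS variant.
    Pixels with value <= t form one class, the rest the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0            # number of pixels in the lower class
    sum0 = 0          # sum of grey levels in the lower class
    best_t, best_between = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Hypothetical toy "image": dark lesion pixels and bright skin pixels.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
lesion_mask = [p <= threshold for p in toy_pixels]
```

A global threshold like this is sensitive to uneven lighting, which is the problem the abstract says NOS is designed to overcome.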

  15. Nonlinear programming for classification problems in machine learning

    Science.gov (United States)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In recent years this field has become increasingly relevant owing to many practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling etc. Classification deals with separation of sets by means of appropriate separation surfaces, which are generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), in recent years the use of nonlinear separating surfaces has received some attention. The objective of this work is to recall some of these proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  16. Classification of Boar Sperm Head Images using Learning Vector Quantization

    NARCIS (Netherlands)

    Biehl, Michael; Pasma, Piter; Pijl, Marten; Sánchez, Lidia; Petkov, Nicolai; Verleysen, Michel

    2006-01-01

    We apply Learning Vector Quantization (LVQ) in automated boar semen quality assessment. The classification of single boar sperm heads into healthy (normal) and non-normal ones is based on grey-scale microscopic images only. Sample data was classified by veterinary experts and is used for training a

  17. Teaching/Learning Methods and Students' Classification of Food Items

    Science.gov (United States)

    Hamilton-Ekeke, Joy-Telu; Thomas, Malcolm

    2011-01-01

    Purpose: This study aims to investigate the effectiveness of a teaching method (TLS (Teaching/Learning Sequence)) based on a social constructivist paradigm on students' conceptualisation of classification of food. Design/methodology/approach: The study compared the TLS model developed by the researcher based on the social constructivist paradigm…

  18. A Cognitive Skill Classification Based On Multi Objective Optimization Using Learning Vector Quantization for Serious Games

    Directory of Open Access Journals (Sweden)

    Moh. Aries Syufagi

    2011-12-01

    Nowadays, serious games and game technology are poised to transform the way of educating and training students at all levels. However, the pedagogical value of games does not help novice students learn; there is too much memorizing, and the learning process is reduced because no information about the player’s ability is available. To assess the cognitive level of player ability, we propose a Cognitive Skill Game (CSG). CSG improves this cognitive concept to monitor how players interact with the game. This game employs Learning Vector Quantization (LVQ) for optimizing the cognitive skill input classification of the player. CSG uses teacher’s data to obtain, in a supervised manner, the neuron vector of the cognitive skill pattern. The multi-objective target is classified into three clusters: trial and error, carefully, and expert cognitive skill. Game play experiments with 33 respondent players demonstrate that 61% of players have high trial and error cognitive skill, 21% have high carefully cognitive skill, and 18% have high expert cognitive skill. CSG may provide information to the game engine when a player needs help or wants a formidable challenge. The game engine will provide the appropriate tasks according to players’ ability. CSG will help balance the emotions of players, so that players do not get bored or frustrated. Players have a high interest in finishing the game if they are emotionally stable. Player interest strongly supports procedural learning in a serious game.
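Learning Vector Quantization, used in this record and in record 16, trains labelled prototype vectors from labelled examples. The following is a minimal sketch of the basic LVQ1 update rule, assuming squared Euclidean distance and a fixed learning rate; it does not reproduce the CSG system, and the function names, parameters, and toy data are hypothetical.

```python
import random


def nearest_prototype(x, protos):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, protos[j])))


def train_lvq1(samples, labels, prototypes, proto_labels,
               lr=0.1, epochs=20, seed=0):
    """Return prototypes updated with the LVQ1 rule.

    The winning prototype moves toward a sample with the same label
    and away from a sample with a different label.
    """
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            k = nearest_prototype(x, protos)
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] = [w + sign * lr * (a - w) for a, w in zip(x, protos[k])]
    return protos


def predict_lvq(x, protos, proto_labels):
    """Label of the prototype nearest to x."""
    return proto_labels[nearest_prototype(x, protos)]


# Hypothetical two-class toy data with one prototype per class.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = [0, 0, 0, 1, 1, 1]
trained = train_lvq1(X, y, [(1.0, 1.0), (4.0, 4.0)], [0, 1])
```

A three-cluster setting like the one in the abstract would simply use at least one prototype per cluster label.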

  19. Supervised Mineral Classification with Semi-automatic Training and Validation Set Generation in Scanning Electron Microscope Energy Dispersive Spectroscopy Images of Thin Sections

    DEFF Research Database (Denmark)

    Flesche, Harald; Nielsen, Allan Aasbjerg; Larsen, Rasmus

    2000-01-01

    This paper addresses the problem of classifying minerals common in siliciclastic and carbonate rocks. Twelve chemical elements are mapped from thin sections by energy dispersive spectroscopy in a scanning electron microscope (SEM). Extensions to traditional multivariate statistical methods...... are applied to perform the classification. First, training and validation sets are grown from one or a few seed points by a method that ensures spatial and spectral closeness of observations. Spectral closeness is obtained by excluding observations that have high Mahalanobis distances to the training class......–Matusita distance and the posterior probability of a class mean being classified as another class. Fourth, the actual classification is carried out based on four supervised classifiers all assuming multinormal distributions: simple quadratic, a contextual quadratic, and two hierarchical quadratic classifiers...

  20. A framework to facilitate self-directed learning, assessment and supervision in midwifery practice: a qualitative study of supervisors' perceptions.

    Science.gov (United States)

    Embo, M; Driessen, E; Valcke, M; van der Vleuten, C P M

    2014-08-01

    Self-directed learning is an educational concept that has received increasing attention. The recent workplace literature, however, reports problems with the facilitation of self-directed learning in clinical practice. We developed the Midwifery Assessment and Feedback Instrument (MAFI) as a framework to facilitate self-directed learning. In the present study, we sought clinical supervisors' perceptions of the usefulness of MAFI. Interviews with fifteen clinical supervisors were audio taped, transcribed verbatim and analysed thematically using Atlas-Ti software for qualitative data analysis. Four themes emerged from the analysis. (1) The competency-based educational structure promotes the setting of realistic learning outcomes and a focus on competency development, (2) instructing students to write reflections facilitates student-centred supervision, (3) creating a feedback culture is necessary to achieve continuity in supervision and (4) integrating feedback and assessment might facilitate competency development under the condition that evidence is discussed during assessment meetings. Supervisors stressed the need for direct observation, and instruction how to facilitate a self-directed learning process. The MAFI appears to be a useful framework to promote self-directed learning in clinical practice. The effect can be advanced by creating a feedback and assessment culture where learners and supervisors share the responsibility for developing self-directed learning. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. A Model for Detecting Tor Encrypted Traffic using Supervised Machine Learning

    Directory of Open Access Journals (Sweden)

    Alaeddin Almubayed

    2015-06-01

    Tor is a low-latency anonymity tool and one of the most widely used open source tools for anonymizing TCP traffic on the Internet, used by around 500,000 people every day. Tor protects users' privacy against surveillance and censorship by making it extremely difficult for an observer to correlate visited websites on the Internet with the user's real-world identity. Tor accomplishes this by protecting its traffic against traffic analysis and feature extraction techniques. Further, Tor resists website fingerprinting by implementing defences such as TLS encryption, padding, and packet relaying. However, in this paper, an analysis has been performed against Tor from the position of a local observer in order to bypass Tor protections; the method consists of feature extraction from a local network dataset. The analysis shows that it is still possible for a local observer to fingerprint the top monitored sites on Alexa, and that Tor traffic can be distinguished from other HTTPS traffic in the network despite the use of Tor's protections. In the experiment, several supervised machine-learning algorithms have been employed. The attack assumes a local observer sitting on a local network fingerprinting the top 100 sites on Alexa; the results improve on previous results, achieving an accuracy of 99.64% and a false positive rate of 0.01%.

  2. An Adaptive Privacy Protection Method for Smart Home Environments Using Supervised Learning

    Directory of Open Access Journals (Sweden)

    Jingsha He

    2017-03-01

    In recent years, smart home technologies have started to be widely used, bringing a great deal of convenience to people’s daily lives. At the same time, privacy issues have become particularly prominent. Traditional encryption methods can no longer meet the needs of privacy protection in smart home applications, since attacks can be launched even without the need for access to the cipher. Rather, attacks can be successfully realized through analyzing the frequency of radio signals, as well as the timestamp series, so that the daily activities of the residents in the smart home can be learnt. Such types of attacks can achieve a very high success rate, making them a great threat to users’ privacy. In this paper, we propose an adaptive method based on sample data analysis and supervised learning (SDASL) that hides the patterns of residents' daily routines and adapts to dynamically changing network loads. Compared to some existing solutions, our proposed method exhibits advantages such as low energy consumption, low latency, strong adaptability, and effective privacy protection.

  3. How to measure metallicity from five-band photometry with supervised machine learning algorithms

    CERN Document Server

    Acquaviva, Viviana

    2015-01-01

    We demonstrate that it is possible to measure metallicity from the SDSS five-band photometry to better than 0.1 dex using supervised machine learning algorithms. Using spectroscopic estimates of metallicity as ground truth, we build, optimize and train several estimators to predict metallicity. We use the observed photometry, as well as derived quantities such as stellar mass and photometric redshift, as features, and we build two sample data sets at median redshifts of 0.103 and 0.218 and median r-band magnitude of 17.5 and 18.3 respectively. We find that ensemble methods, such as Random Forests of Trees and Extremely Randomized Trees, and Support Vector Machines all perform comparably well and can measure metallicity with a Root Mean Square Error (RMSE) of 0.081 and 0.090 for the two data sets when all objects are included. The fraction of outliers (objects for which the difference between true and predicted metallicity is larger than 0.2 dex) is only 2.2 and 3.9% respectively, and the RMSE decreases to 0.0...
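The two evaluation measures named in this record, root mean square error and the fraction of outliers (objects whose true and predicted metallicity differ by more than 0.2 dex), can be computed as in the sketch below. The function names and the sample values are invented for illustration and are not taken from the paper.

```python
import math


def rmse(true, pred):
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def outlier_fraction(true, pred, threshold=0.2):
    """Fraction of objects whose absolute error exceeds the threshold (in dex)."""
    return sum(abs(t - p) > threshold for t, p in zip(true, pred)) / len(true)


# Made-up spectroscopic ("true") and predicted metallicities.
true_z = [8.5, 8.7, 9.0, 8.9]
pred_z = [8.5, 8.7, 9.0, 9.4]
print(rmse(true_z, pred_z), outlier_fraction(true_z, pred_z))
```

Reporting both numbers is useful because a few large errors can dominate the RMSE, while the outlier fraction shows how many objects are affected.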

  4. Supervised Learning Detection of Sixty Non-transiting Hot Jupiter Candidates

    Science.gov (United States)

    Millholland, Sarah; Laughlin, Gregory

    2017-09-01

    The optical full-phase photometric variations of a short-period planet provide a unique view of the planet’s atmospheric composition and dynamics. The number of planets with optical phase curve detections, however, is currently too small to study them as an aggregate population, motivating an extension of the search to non-transiting planets. Here we present an algorithm for the detection of non-transiting short-period giant planets in the Kepler field. The procedure uses the phase curves themselves as evidence for the planets’ existence. We employ a supervised learning algorithm to recognize the salient time-dependent properties of synthetic phase curves; we then search for detections of signals that match these properties. After demonstrating the algorithm’s capabilities, we classify 142,630 FGK Kepler stars without confirmed planets or Kepler Objects of Interest, and for each one, we assign a probability of a phase curve of a non-transiting planet being present. We identify 60 high-probability non-transiting hot Jupiter candidates. We also derive constraints on the candidates’ albedos and offsets of the phase curve maxima. These targets are strong candidates for follow-up radial velocity confirmation and characterization. Once confirmed, the atmospheric information content in the phase curves may be studied in yet greater detail.

  5. Distributed multisensory integration in a recurrent network model through supervised learning

    Science.gov (United States)

    Wang, He; Wong, K. Y. Michael

    Sensory integration between different modalities has been extensively studied. It is suggested that the brain integrates signals from different modalities in a Bayesian optimal way. However, how the Bayesian rule is implemented in a neural network remains under debate. In this work we propose a biologically plausible recurrent network model, which can perform Bayesian multisensory integration after being trained by supervised learning. Our model is composed of two modules, each for one modality. We assume that each module is a recurrent network, whose activity represents the posterior distribution of each stimulus. The feedforward input on each module is the likelihood of each modality. The two modules are integrated through cross-links, which are feedforward connections from the other modality, and reciprocal connections, which are recurrent connections between different modules. By stochastic gradient descent, we successfully trained the feedforward and recurrent coupling matrices simultaneously, both of which resemble the Mexican hat. We also find that there is more than one set of coupling matrices that can approximate the Bayesian theorem well. Specifically, reciprocal connections and cross-links will compensate for each other if one of them is removed. Even though it was trained with two inputs, the network's performance with only one input is in good accordance with what is predicted by the Bayesian theorem.
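For reference, the Bayesian optimal combination that such a network is trained to approximate has a closed form in the simplest textbook case. The sketch below assumes two independent Gaussian likelihoods and a flat prior, which the abstract does not state; the function name is hypothetical and no part of the recurrent network model is reproduced.

```python
def combine_gaussian_cues(mu1, var1, mu2, var2):
    """Posterior mean and variance for two independent Gaussian cues.

    Assumes a flat prior, so the posterior is the normalized product of
    the two likelihoods: precisions add, and the mean is the
    precision-weighted average of the cue means.
    """
    w1 = 1.0 / var1   # precision of cue 1
    w2 = 1.0 / var2   # precision of cue 2
    post_var = 1.0 / (w1 + w2)
    post_mean = post_var * (w1 * mu1 + w2 * mu2)
    return post_mean, post_var


# Two equally reliable cues: the estimate lies midway and the variance halves.
print(combine_gaussian_cues(0.0, 1.0, 2.0, 1.0))  # (1.0, 0.5)
```

With a single cue the posterior reduces to that cue's likelihood, which corresponds to the one-input case mentioned at the end of the abstract.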

  6. Semi-supervised learning for detecting text-lines in noisy document images

    Science.gov (United States)

    Liu, Zongyi; Zhou, Hanning

    2010-01-01

    Document layout analysis is a key step in document image understanding with wide applications in document digitization and reformatting. Identifying correct layout from noisy scanned images is especially challenging. In this paper, we introduce a semi-supervised learning framework to detect text-lines from noisy document images. Our framework consists of three steps. The first step is the initial segmentation that extracts text-lines and images using simple morphological operations. The second step is a grouping-based layout analysis that identifies text-lines, image zones, column separators and vertical border noise. It is able to efficiently remove the vertical border noise from multi-column pages. The third step is an online classifier that is trained with the high confidence line detection results from Step Two, and filters out noise from low confidence lines. The classifier effectively removes speckle noise embedded inside the content zones. We compare the performance of our algorithm to the state-of-the-art work in the field on the UW-III database. We choose the results reported by the Image Understanding Pattern Recognition Research (IUPR) and Scansoft Omnipage SDK 15.5. We evaluate the performance at both the page frame level and the text-line level. The results show that our system has a much lower false-alarm rate, while maintaining a similar content detection rate. In addition, we also show that our online training model generalizes better than algorithms depending on offline training.

  7. Supporting and Supervising Teachers Working With Adults Learning English. CAELA Network Brief

    Science.gov (United States)

    Young, Sarah

    2009-01-01

    This brief provides an overview of the knowledge and skills that administrators need in order to support and supervise teachers of adult English language learners. It begins with a review of resources and literature related to teacher supervision in general and to adult ESL education. It continues with information on the background and…

  8. Understanding Trust as an Essential Element of Trainee Supervision and Learning in the Workplace

    Science.gov (United States)

    Hauer, Karen E.; ten Cate, Olle; Boscardin, Christy; Irby, David M.; Iobst, William; O'Sullivan, Patricia S.

    2014-01-01

    Clinical supervision requires that supervisors make decisions about how much independence to allow their trainees for patient care tasks. The simultaneous goals of ensuring quality patient care and affording trainees appropriate and progressively greater responsibility require that the supervising physician trusts the trainee. Trust allows the…

  9. Enhancing the Doctoral Journey: The Role of Group Supervision in Supporting Collaborative Learning and Creativity

    Science.gov (United States)

    Fenge, Lee-Ann

    2012-01-01

    This article explores the role of group supervision within doctoral education, offering an exploration of the experience of group supervision processes through a small-scale study evaluating both student and staff experience across three cohorts of one professional doctorate programme. There has been very little research to date exploring…

  10. Is Direct Supervision in Clinical Education for Athletic Training Students Always Necessary to Enhance Student Learning?

    Science.gov (United States)

    Scriber, Kent; Trowbridge, Cindy

    2009-01-01

    Objective: To present an alternative model of supervision within clinical education experiences. Background: Several years ago direct supervision was defined more clearly in the accreditation standards for athletic training education programs (ATEPs). Currently, athletic training students may not gain any clinical experience without their clinical…

  11. Clinical group supervision in yoga therapy: model effects, and lessons learned.

    Science.gov (United States)

    Forbes, Bo; Volpe Horii, Cassandra; Earls, Bethany; Mashek, Stephanie; Akhtar, Fiona

    2012-01-01

    Clinical supervision is an integral component of therapist training and professional development because of its capacity for fostering knowledge, self-awareness, and clinical acumen. Individual supervision is part of many yoga therapy training programs and is referenced in the IAYT Standards as "mentoring." Group supervision is not typically used in the training of yoga therapists. We propose that group supervision effectively supports the growth and development of yoga therapists-in-training. We present a model of group supervision for yoga therapist trainees developed by the New England School of Integrative Yoga Therapeutics™ (The NESIYT Model) that includes the background, structure, format, and development of our inaugural 18-month supervision group. Pre-and post-supervision surveys and analyzed case notes, which captured key didactic and process themes, are discussed. Clinical issues, such as boundaries, performance anxiety, sense of self efficacy, the therapeutic alliance, transference and counter transference, pacing of yoga therapy sessions, evaluation of client progress, and adjunct therapist interaction are reviewed. The timing and sequence of didactic and process themes and benefits for yoga therapist trainees' professional development, are discussed. The NESIYT group supervision model is offered as an effective blueprint for yoga therapy training programs.

  12. Knowledge Work Supervision: Transforming School Systems into High Performing Learning Organizations.

    Science.gov (United States)

    Duffy, Francis M.

    1997-01-01

    This article describes a new supervision model conceived to help a school system redesign its anatomy (structures), physiology (flow of information and webs of relationships), and psychology (beliefs and values). The new paradigm (Knowledge Work Supervision) was constructed by reviewing the practices of several interrelated areas: sociotechnical…

  13. Efficient HIK SVM learning for image classification.

    Science.gov (United States)

    Wu, Jianxin

    2012-10-01

    Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.
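The histogram intersection kernel itself is simple to state: for two histograms it sums the bin-wise minima. The sketch below computes the kernel and a Gram matrix that a kernel SVM solver could consume; it does not implement the intersection coordinate descent (ICD) solver from the paper, and the function names are assumptions.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def hik_gram_matrix(histograms):
    """Pairwise kernel values for a list of histograms."""
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]


print(histogram_intersection([1, 2, 3], [2, 1, 3]))  # 5
```

Building the full Gram matrix costs time quadratic in the number of images, which is one reason specialized HIK solvers are of interest for large problems.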

  14. An active learning based classification strategy for the minority class problem: application to histopathology annotation

    Directory of Open Access Journals (Sweden)

    Doyle Scott

    2011-10-01

    Background: Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", an issue where the number of exemplars from the non-target class outnumber target class exemplars, which can bias the classifier and reduce accuracy. In this paper, we develop a training strategy combining active learning (AL) with class-balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set size compared with random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images. Pre-specifying a target class ratio mitigates the problem of training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem. Results: Using this class-balanced AL training strategy (CBAL), we build a classifier to distinguish cancer from non-cancer regions on digitized prostate histopathology. Our dataset consists of 12,000 image regions sampled from 100 biopsies (58 prostate cancer patients). We compare CBAL against: (1) unbalanced AL (UBAL), which uses AL but ignores class ratio; (2) class-balanced RL (CBRL), which uses RL with a specific class ratio; and (3) unbalanced RL (UBRL). The CBAL-trained classifier yields 2% greater accuracy and 3% higher area under the receiver operating characteristic curve (AUC) than alternatively-trained classifiers. Our cost model accurately predicts

  15. An active learning based classification strategy for the minority class problem: application to histopathology annotation.

    Science.gov (United States)

    Doyle, Scott; Monaco, James; Feldman, Michael; Tomaszewski, John; Madabhushi, Anant

    2011-10-28

    Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", an issue where the number of exemplars from the non-target class outnumber target class exemplars which can bias the classifier and reduce accuracy. In this paper, we develop a training strategy combining active learning (AL) with class-balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set size compared with random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images. Pre-specifying a target class ratio mitigates the problem of training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem. Using this class-balanced AL training strategy (CBAL), we build a classifier to distinguish cancer from non-cancer regions on digitized prostate histopathology. Our dataset consists of 12,000 image regions sampled from 100 biopsies (58 prostate cancer patients). We compare CBAL against: (1) unbalanced AL (UBAL), which uses AL but ignores class ratio; (2) class-balanced RL (CBRL), which uses RL with a specific class ratio; and (3) unbalanced RL (UBRL). The CBAL-trained classifier yields 2% greater accuracy and 3% higher area under the receiver operating characteristic curve (AUC) than alternatively-trained classifiers. Our cost model accurately predicts the number of annotations necessary

  16. Attend in groups: a weakly-supervised deep learning framework for learning from web data

    OpenAIRE

    Zhuang, Bohan; Liu, Lingqiao; Li, Yao; Shen, Chunhua; Reid, Ian

    2016-01-01

    Large-scale datasets have driven the rapid development of deep neural networks for visual recognition. However, annotating a massive dataset is expensive and time-consuming. Web images and their labels are, in comparison, much easier to obtain, but direct training on such automatically harvested images can lead to unsatisfactory performance, because the noisy labels of Web images adversely affect the learned recognition models. To address this drawback we propose an end-to-end weakly-supervis...

  17. Unsupervised single-particle deep classification via statistical manifold learning

    CERN Document Server

    Wu, Jiayi; Condgon, Charles; Brett, Bevin; Chen, Shuobing; Ouyang, Qi; Mao, Youdong

    2016-01-01

    Structural heterogeneity in single-particle images presents a major challenge for high-resolution cryo-electron microscopy (cryo-EM) structure determination. Here we introduce a statistical manifold learning approach for unsupervised single-particle deep classification. When optimized for Intel high-performance computing (HPC) processors, our approach can generate thousands of reference-free class averages within several hours from hundreds of thousands of single-particle cryo-EM images. Deep classification thus assists in computational purification of single-particle datasets for high-resolution reconstruction.

  18. Information-Theoretic Dictionary Learning for Image Classification.

    Science.gov (United States)

    Qiu, Qiang; Patel, Vishal M; Chellappa, Rama

    2014-11-01

    We present a two-stage approach for learning dictionaries for object classification tasks based on the principle of information maximization. The proposed method seeks a dictionary that is compact, discriminative, and generative. In the first stage, dictionary atoms are selected from an initial dictionary by maximizing the mutual information measure on dictionary compactness, discrimination and reconstruction. In the second stage, the selected dictionary atoms are updated for improved reconstructive and discriminative power using a simple gradient ascent algorithm on mutual information. Experiments using real data sets demonstrate the effectiveness of our approach for image classification tasks.

  19. Supervised Classification in the Presence of Misclassified Training Data: A Monte Carlo Simulation Study in the Three Group Case

    Directory of Open Access Journals (Sweden)

    Jocelyn E Bolin

    2014-02-01

    Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three-group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases, with random forests demonstrating the highest accuracy across conditions.
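
    A simulation condition of this kind can be sketched in a few lines. The following is only an illustration under stated assumptions, not the study's design: the group centres, sample sizes and misclassification rates are made up, and a nearest-centroid rule stands in for the classifiers the study actually compared.

```python
import random

def simulate_groups(n_per_group, centers, rng):
    """Draw 2-D Gaussian points (unit variance) around each group centre."""
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_group):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data

def corrupt_labels(data, rate, n_groups, rng):
    """Reassign a fraction `rate` of labels to a different, randomly chosen group."""
    noisy = []
    for point, label in data:
        if rng.random() < rate:
            label = rng.choice([g for g in range(n_groups) if g != label])
        noisy.append((point, label))
    return noisy

def fit_centroids(data, n_groups):
    """Mean position of the training points carrying each (possibly wrong) label."""
    sums = [[0.0, 0.0, 0] for _ in range(n_groups)]
    for (x, y), label in data:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return [(sx / max(n, 1), sy / max(n, 1)) for sx, sy, n in sums]

def predict(centroids, point):
    """Assign the point to the group with the nearest centroid."""
    x, y = point
    return min(range(len(centroids)),
               key=lambda g: (x - centroids[g][0]) ** 2 + (y - centroids[g][1]) ** 2)

def accuracy_under_noise(rate, n_reps=20, seed=0):
    """Average accuracy on clean test data after training on corrupted labels."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # hypothetical group separation
    total = 0.0
    for _ in range(n_reps):
        train = corrupt_labels(simulate_groups(200, centers, rng), rate, 3, rng)
        test = simulate_groups(200, centers, rng)
        centroids = fit_centroids(train, 3)
        hits = sum(predict(centroids, p) == y for p, y in test)
        total += hits / len(test)
    return total / n_reps

for rate in (0.0, 0.2, 0.4):
    print(f"misclassification rate {rate:.1f}: accuracy {accuracy_under_noise(rate):.3f}")
```

    Varying `centers`, the per-group sample size and `rate` reproduces the kinds of conditions the abstract lists (group separation, sample size, misclassification percentage); unequal group sizes would need a small extension of `simulate_groups`.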

  20. Morphological Analysis as Classification: an Inductive-Learning Approach

    CERN Document Server

    Van den Bosch, Antal; Daelemans, Walter; Weijters, Ton

    1996-01-01

    Morphological analysis is an important subtask in text-to-speech conversion, hyphenation, and other language engineering tasks. The traditional approach to performing morphological analysis is to combine a morpheme lexicon, sets of (linguistic) rules, and heuristics to find a most probable analysis. In contrast we present an inductive learning approach in which morphological analysis is reformulated as a segmentation task. We report on a number of experiments in which five inductive learning algorithms are applied to three variations of the task of morphological analysis. Results show (i) that the generalisation performance of the algorithms is good, and (ii) that the lazy learning algorithm IB1-IG performs best on all three tasks. We conclude that lazy learning of morphological analysis as a classification task is indeed a viable approach; moreover, it has the strong advantages over the traditional approach of avoiding the knowledge-acquisition bottleneck, being fast and deterministic in learning and process...

  1. Case-based statistical learning applied to SPECT image classification

    Science.gov (United States)

    Górriz, Juan M.; Ramírez, Javier; Illán, I. A.; Martínez-Murcia, Francisco J.; Segovia, Fermín; Salas-Gonzalez, Diego; Ortiz, A.

    2017-03-01

    Statistical learning and decision theory play a key role in many areas of science and engineering. Some examples include time series regression and prediction, optical character recognition, signal detection in communications or biomedical applications for diagnosis and prognosis. This paper deals with the topic of learning from biomedical image data in the classification problem. In a typical scenario we have a training set that is employed to fit a prediction model or learner and a testing set to which the learner is applied in order to predict the outcome for new unseen patterns. Both processes are usually completely separated to avoid over-fitting and because, in practice, the unseen new objects (testing set) have unknown outcomes. However, the outcome takes one of a discrete set of values, as in the binary diagnosis problem. Thus, assumptions on these outcome values could be established to obtain the most likely prediction model at the training stage, which could improve the overall classification accuracy on the testing set, or at least keep its performance at the level of the selected statistical classifier. In this sense, a novel case-based learning (c-learning) procedure is proposed which combines hypothesis testing from a discrete set of expected outcomes and a cross-validated classification stage.

  2. Histopathological Image Classification Using Discriminative Feature-Oriented Dictionary Learning.

    Science.gov (United States)

    Vu, Tiep Huu; Mousavi, Hojjat Seyed; Monga, Vishal; Rao, Ganesh; Rao, U K Arvind

    2016-03-01

    In histopathological image analysis, feature extraction for classification is a challenging task due to the diversity of histology features suitable for each problem as well as presence of rich geometrical structures. In this paper, we propose an automatic feature discovery framework via learning class-specific dictionaries and present a low-complexity method for classification and disease grading in histopathology. Essentially, our Discriminative Feature-oriented Dictionary Learning (DFDL) method learns class-specific dictionaries such that under a sparsity constraint, the learned dictionaries allow representing a new image sample parsimoniously via the dictionary corresponding to the class identity of the sample. At the same time, the dictionary is designed to be poorly capable of representing samples from other classes. Experiments on three challenging real-world image databases: 1) histopathological images of intraductal breast lesions, 2) mammalian kidney, lung and spleen images provided by the Animal Diagnostics Lab (ADL) at Pennsylvania State University, and 3) brain tumor images from The Cancer Genome Atlas (TCGA) database, reveal the merits of our proposal over state-of-the-art alternatives. Moreover, we demonstrate that DFDL exhibits a more graceful decay in classification accuracy against the number of training images which is highly desirable in practice where generous training is often not available.

  3. Improved semi-supervised online boosting for object tracking

    Science.gov (United States)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    An online semi-supervised boosting method that treats object tracking as a classification problem has the advantage of training a binary classifier from labeled and unlabeled examples. Appropriate object features are selected based on real-time changes in the object. However, the online semi-supervised boosting method faces one key problem: traditional self-training, which uses the classification results to update the classifier itself, often leads to drifting or tracking failure because of the error accumulated during each update of the tracker. To overcome this disadvantage, this paper contributes an improved online semi-supervised boosting method in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, the classification is analyzed by the P-N constraints, which are used to verify whether the labels assigned to unlabeled data by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking on several challenging test sequences, where our tracker outperforms other related online tracking methods and achieves promising tracking performance.

  4. A Cognitive Skill Classification Based on Multi Objective Optimization Using Learning Vector Quantization for Serious Games

    Directory of Open Access Journals (Sweden)

    Moh. Aries Syufagi

    2013-09-01

    Nowadays, serious games and game technology are poised to transform the way students are educated and trained at all levels. However, the pedagogical value of games does not help novice students learn: with no information about the player's ability, games demand too much memorizing and hamper the learning process. To assess the cognitive level of player ability, we propose a Cognitive Skill Game (CSG). CSG uses this cognitive concept to monitor how players interact with the game. The game employs Learning Vector Quantization (LVQ) to optimize the classification of the player's cognitive skill input. CSG uses teachers' data to obtain the neuron vectors of the cognitive skill patterns under supervision. The multi-objective target is classified into three clusters: trial and error, carefully, and expert cognitive skill. Gameplay experiments with 33 respondent players show that 61% of players have high trial and error, 21% have high carefully, and 18% have high expert cognitive skill. CSG may provide information to the game engine when a player needs help or wants a formidable challenge. The game engine will then provide appropriate tasks according to the player's ability. CSG will help balance the emotions of players, so that players do not get bored or frustrated.
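
    For readers unfamiliar with LVQ, the basic LVQ1 update rule can be sketched as follows. This is a generic illustration, not the CSG implementation: the two input features, the sample values and the learning-rate schedule are hypothetical, and only the three class names are taken from the abstract.

```python
import random

def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(prototypes[i][0], x)))

def train_lvq1(samples, prototypes, lr=0.1, epochs=30, seed=0):
    """LVQ1: pull the winning prototype towards a sample of its own class,
    push it away from a sample of a different class."""
    rng = random.Random(seed)
    protos = [(list(w), c) for w, c in prototypes]  # copy so the input is untouched
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        order = samples[:]
        rng.shuffle(order)
        for x, label in order:
            w, c = protos[nearest(protos, x)]
            sign = 1.0 if c == label else -1.0
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return protos

def classify(protos, x):
    """Label of the nearest prototype."""
    return protos[nearest(protos, x)][1]

# Hypothetical teacher-labeled data: [error rate, deliberation time], both scaled to 0..1.
samples = [
    ([0.90, 0.20], "trial_and_error"), ([0.80, 0.30], "trial_and_error"), ([0.85, 0.10], "trial_and_error"),
    ([0.30, 0.90], "careful"), ([0.20, 0.80], "careful"), ([0.35, 0.85], "careful"),
    ([0.10, 0.20], "expert"), ([0.15, 0.10], "expert"), ([0.05, 0.25], "expert"),
]
# One prototype per class, initialised from a labeled sample of that class.
initial = [([0.90, 0.20], "trial_and_error"), ([0.30, 0.90], "careful"), ([0.10, 0.20], "expert")]

protos = train_lvq1(samples, initial)
print(classify(protos, [0.82, 0.25]))
```

    A new player's feature vector is then assigned the label of its nearest prototype, which is the kind of information a game engine could use to adjust task difficulty.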

  5. Kollegial supervision (Peer supervision)

    DEFF Research Database (Denmark)

    Andersen, Ole Dibbern; Petersson, Erling

    The publication examines how peer supervision can be organised in an educational institution...

  6. Semi-supervised Machine Learning for Analysis of Hydrogeochemical Data and Models

    Science.gov (United States)

    Vesselinov, Velimir; O'Malley, Daniel; Alexandrov, Boian; Moore, Bryan

    2017-04-01

    Data- and model-based analyses such as uncertainty quantification, sensitivity analysis, and decision support using complex physics models with numerous model parameters typically require a huge number of model evaluations (on the order of 10^6). Furthermore, model simulations of complex physics may require substantial computational time. For example, accounting for simultaneously occurring physical processes such as fluid flow and biogeochemical reactions in a heterogeneous porous medium may require several hours of wall-clock computational time. To address these issues, we have developed a novel methodology for semi-supervised machine learning based on Non-negative Matrix Factorization (NMF) coupled with customized k-means clustering. The algorithm allows for automated, robust Blind Source Separation (BSS) of groundwater types (contamination sources) based on model-free analyses of observed hydrogeochemical data. We have also developed reduced order modeling tools, which couple support vector regression (SVR), genetic algorithms (GA) and artificial and convolutional neural networks (ANN/CNN). SVR is applied to predict the model behavior within prior uncertainty ranges associated with the model parameters. ANN and CNN procedures are applied to upscale heterogeneity of the porous medium. In the upscaling process, fine-scale high-resolution models of heterogeneity are applied to inform coarse-resolution models which have improved computational efficiency while capturing the impact of fine-scale effects at the coarse scale of interest. These techniques are tested independently on a series of synthetic problems. We also present a decision analysis related to contaminant remediation where the developed reduced order models are applied to reproduce groundwater flow and contaminant transport in a synthetic heterogeneous aquifer. The tools are coded in Julia and are a part of the MADS high-performance computational framework (https://github.com/madsjulia/Mads.jl).
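
    The factorization step behind such a blind source separation can be illustrated with plain NMF. The sketch below uses the standard multiplicative updates for the squared-error objective; it is written in Python for illustration, is not the MADS code, and omits the customized k-means step. The mixing matrix and source signatures are invented.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix V into W (n x k) and H (k x m), both non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

# Hypothetical example: 5 wells, 4 chemical species, 2 hidden groundwater sources.
sources = [[1.0, 0.0, 2.0, 0.5],
           [0.0, 3.0, 1.0, 0.5]]
mixing = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8], [0.7, 0.3]]
v = matmul(mixing, sources)  # observed concentrations

w, h = nmf(v, k=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

    Here the rows of `h` play the role of source signatures and the rows of `w` the mixing ratios at each well. The number of sources `k` is fixed by hand in this sketch, whereas the abstract pairs NMF with a clustering step.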

  7. Manifold regularized multitask feature learning for multimodality disease classification.

    Science.gov (United States)

    Jie, Biao; Zhang, Daoqiang; Cheng, Bo; Shen, Dinggang

    2015-02-01

    Multimodality based methods have shown great advantages in classification of Alzheimer's disease (AD) and its prodromal stage, that is, mild cognitive impairment (MCI). Recently, multitask feature selection methods have typically been used for joint selection of common features across multiple modalities. However, one disadvantage of existing multimodality based methods is that they ignore the useful data distribution information in each modality, which is essential for subsequent classification. Accordingly, in this paper we propose a manifold regularized multitask feature learning method to preserve both the intrinsic relatedness among multiple modalities of data and the data distribution information in each modality. Specifically, we denote the feature learning on each modality as a single task, and use a group-sparsity regularizer to capture the intrinsic relatedness among multiple tasks (i.e., modalities) and jointly select the common features from multiple tasks. Furthermore, we introduce a new manifold-based Laplacian regularizer to preserve the data distribution information from each task. Finally, we use the multikernel support vector machine method to fuse multimodality data for eventual classification. In addition, we extend our method to the semisupervised setting, where only partial data are labeled. We evaluate our method using the baseline magnetic resonance imaging (MRI), fluorodeoxyglucose positron emission tomography (FDG-PET), and cerebrospinal fluid (CSF) data of subjects from the AD neuroimaging initiative database. The experimental results demonstrate that our proposed method can not only achieve improved classification performance, but also help to discover the disease-related brain regions useful for disease diagnosis.

  8. Studies of Machine Learning Photometric Classification of Supernovae

    Science.gov (United States)

    Macaluso, Joseph Nicholas; Cunningham, John; Kuhlmann, Stephen; Gupta, Ravi; Kovacs, Eve

    2017-01-01

    We studied the use of machine learning for the photometric classification of Type Ia (SNIa) and core collapse (SNcc) supernovae. We used a combination of simulated data for the Dark Energy Survey (DES) and real data from SDSS and chose our metrics to be the sample purity and the efficiency of identifying SNIa supernovae. Our focus was to quantify the effects of varying the training and parameters for random-forest decision-tree algorithms.
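
    The two metrics can be made concrete with a short sketch. It assumes the usual definitions (purity as the fraction of objects classified as SNIa that truly are SNIa, efficiency as the fraction of true SNIa that are recovered); the abstract does not spell out its exact definitions, and the labels below are invented.

```python
def purity_and_efficiency(true_labels, predicted_labels, target="SNIa"):
    """Purity = TP / (TP + FP); efficiency = TP / (TP + FN) for the target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    purity = tp / (tp + fp) if tp + fp else 0.0
    efficiency = tp / (tp + fn) if tp + fn else 0.0
    return purity, efficiency

# Hypothetical true classes and classifier output for six supernovae.
truth = ["SNIa", "SNIa", "SNIa", "SNcc", "SNcc", "SNIa"]
pred = ["SNIa", "SNIa", "SNcc", "SNIa", "SNcc", "SNIa"]

purity, efficiency = purity_and_efficiency(truth, pred)
print(f"purity={purity:.2f} efficiency={efficiency:.2f}")
```

    These correspond to precision and recall for the SNIa class; raising a classifier's decision threshold typically trades efficiency for purity.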

  9. HIERARCHICAL DEEP LEARNING ARCHITECTURE FOR 10K OBJECTS CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    Atul Laxman Katole

    2015-08-01

    Evolution of visual object recognition architectures based on Convolutional Neural Networks & Convolutional Deep Belief Networks paradigms has revolutionized artificial Vision Science. These architectures extract & learn the real world hierarchical visual features utilizing supervised & unsupervised learning approaches respectively. Neither approach can yet scale up realistically to provide recognition for a very large number of objects, as high as 10K. We propose a two level hierarchical deep learning architecture inspired by the divide & conquer principle that decomposes the large scale recognition architecture into root & leaf level model architectures. Each of the root & leaf level models is trained exclusively to provide results superior to those possible with any 1-level deep learning architecture prevalent today. The proposed architecture classifies objects in two steps. In the first step the root level model classifies the object into a high level category. In the second step, the leaf level recognition model for the recognized high level category is selected among all the leaf models. This leaf level model is presented with the same input object image, which it classifies into a specific category. We also propose a blend of leaf level models trained with either supervised or unsupervised learning approaches. Unsupervised learning is suitable whenever labelled data is scarce for the specific leaf level models. Currently the training of leaf level models is in progress; we have trained 25 out of the total 47 leaf level models as of now. We have trained the leaf models with the best case top-5 error rate of 3.2% on the validation data set for the particular leaf models. We also demonstrate that the validation error of the leaf level models saturates towards the above mentioned accuracy as the number of epochs is increased to more than sixty. The top-5 error rate for the entire two-level architecture needs to be computed in conjunction with

  10. Learning a Markov Logic network for supervised gene regulatory network inference.

    Science.gov (United States)

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a

  11. Investigating the control of climatic oscillations over global terrestrial evaporation using a simple supervised learning method

    Science.gov (United States)

    Martens, Brecht; Miralles, Diego; Waegeman, Willem; Dorigo, Wouter; Verhoest, Niko

    2017-04-01

    Intra-annual and multi-decadal variations in the Earth's climate are to a large extent driven by periodic oscillations in the coupled state of atmosphere and ocean. These oscillations alter not only the climate in nearby regions, but also have an important impact on the local climate in remote areas, a phenomenon that is often referred to as 'teleconnection'. Because changes in local climate immediately impact terrestrial ecosystems through a series of complex processes and feedbacks, ocean-atmospheric teleconnections are expected to influence land evaporation - i.e. the return flux of water from land to atmosphere. In this presentation, the effects of these intra-annual and multi-decadal climate oscillations on global terrestrial evaporation are analysed. To this end, we use satellite observations of different essential climate variables in combination with a simple supervised learning method, the lasso regression. A total of sixteen Climate Oscillation Indices (COIs) - which are routinely used to diagnose the major ocean-atmospheric oscillations - are selected. Multi-decadal data of terrestrial evaporation are retrieved from the Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu). Using the lasso regression, it is shown that more than 30% of the inter-annual variations in terrestrial evaporation can be explained by ocean-atmospheric oscillations. In addition, the impact in different regions across the globe can typically be attributed to a small subset of the sixteen COIs. For instance, the dynamics in terrestrial evaporation over Australia are substantially impacted by both the El Niño Southern Oscillation (here diagnosed using the Southern Oscillation Index, SOI) and the Indian Ocean Dipole Oscillation (here diagnosed using the Indian Dipole Mode Index, DMI). Subsequently, using the same learning method but regressing terrestrial evaporation to its local climatic drivers (air temperature, precipitation, radiation), allows us to discern through which

  12. PCANet: A Simple Deep Learning Baseline for Image Classification?

    Science.gov (United States)

    Chan, Tsung-Han; Jia, Kui; Gao, Shenghua; Lu, Jiwen; Zeng, Zinan; Ma, Yi

    2015-12-01

    In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, Facial Recognition Technology (FERET) data sets for face recognition; and MNIST for hand-written digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned [by deep neural networks (DNNs)]. Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET data sets and on MNIST variations. Additional experiments on other public data sets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.
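
    The hashing and pooling stages described above can be sketched compactly. The sketch assumes the filter responses have already been computed (the PCA filter-learning stage is omitted), uses non-overlapping blocks for simplicity, and works on made-up 2x2 response maps.

```python
def binary_hash(responses):
    """Binarise L filter-response maps (positive -> 1, otherwise 0) and pack the
    L bits at each pixel into one integer in [0, 2**L - 1]."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[sum((1 << l) for l, fmap in enumerate(responses) if fmap[r][c] > 0)
             for c in range(cols)]
            for r in range(rows)]

def block_histograms(hashed, n_bins, block):
    """Histogram the hashed values inside each non-overlapping block x block
    window and concatenate the histograms into one feature vector."""
    features = []
    for r0 in range(0, len(hashed) - block + 1, block):
        for c0 in range(0, len(hashed[0]) - block + 1, block):
            hist = [0] * n_bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[hashed[r][c]] += 1
            features.extend(hist)
    return features

# Hypothetical responses of L = 2 filters on a 2 x 2 image.
map0 = [[0.5, -0.2], [-1.0, 0.3]]
map1 = [[-0.1, 0.4], [-0.7, 0.9]]

hashed = binary_hash([map0, map1])
features = block_histograms(hashed, n_bins=4, block=2)
print(hashed, features)
```

    With L filters the hashed map takes 2**L distinct values, so each block contributes a histogram of 2**L bins to the final feature vector.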

  13. Supervised Discrete Hashing With Relaxation.

    Science.gov (United States)

    Gui, Jie; Liu, Tongliang; Sun, Zhenan; Tao, Dacheng; Tan, Tieniu

    2016-12-29

    Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data, such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called "supervised discrete hashing with relaxation" (SDHR) based on "supervised discrete hashing" (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image data sets (CIFAR-10 and MNIST) and a large-scale and challenging face data set (FRGC) demonstrate the effectiveness and efficiency of SDHR.

  14. Fast Low-Rank Shared Dictionary Learning for Image Classification.

    Science.gov (United States)

    Vu, Tiep Huu; Monga, Vishal

    2017-11-01

    Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., claim that its spanning subspace should have low dimension and the coefficients corresponding to this dictionary should be similar. For the particular dictionaries, we impose on them the well-known constraints stated in the Fisher discrimination dictionary learning (FDDL). Furthermore, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. The said algorithms could also be applied to FDDL and its extensions. The efficiencies of these algorithms are theoretically and experimentally verified by comparing their complexities and running time with those of other well-known dictionary learning methods. Experimental results on widely used image data sets establish the advantages of our method over the state-of-the-art dictionary learning methods.

  15. The Comparative Study of Remote Sensing Image Supervised Classification Methods Based on ENVI

    Institute of Scientific and Technical Information of China (English)

    闫琰; 董秀兰; 李燕

    2011-01-01

    Given the widespread use of supervised classification in remote sensing image classification, this paper describes four commonly used supervised classification methods provided by ENVI. The same TM image is classified with each of the four methods and the results are compared, in order to analyze the differences in classification accuracy among the four methods.

  16. Multimodal Task-Driven Dictionary Learning for Image Classification.

    Science.gov (United States)

    Bahrampour, Soheil; Nasrabadi, Nasser M; Ray, Asok; Jenkins, William Kenneth

    2016-01-01

    Dictionary learning algorithms have been successfully used for both reconstructive and discriminative tasks, where an input signal is represented with a sparse linear combination of dictionary atoms. While these methods are mostly developed for single-modality scenarios, recent studies have demonstrated the advantages of feature-level fusion based on the joint sparse representation of the multimodal inputs. In this paper, we propose a multimodal task-driven dictionary learning algorithm under the joint sparsity constraint (prior) to enforce collaborations among multiple homogeneous/heterogeneous sources of information. In this task-driven formulation, the multimodal dictionaries are learned simultaneously with their corresponding classifiers. The resulting multimodal dictionaries can generate discriminative latent features (sparse codes) from the data that are optimized for a given task such as binary or multiclass classification. Moreover, we present an extension of the proposed formulation using a mixed joint and independent sparsity prior, which facilitates more flexible fusion of the modalities at feature level. The efficacy of the proposed algorithms for multimodal classification is illustrated on four different applications--multimodal face recognition, multi-view face recognition, multi-view action recognition, and multimodal biometric recognition. It is also shown that, compared with the counterpart reconstructive-based dictionary learning algorithms, the task-driven formulations are more computationally efficient in the sense that they can be equipped with more compact dictionaries and still achieve superior performance.

  17. LEARNING STYLES CLASSIFICATION: LEARNER CONTROL IMPLICATIONS IN INSTRUCTION AND EDUCATION

    Directory of Open Access Journals (Sweden)

    ZEYNAB ABBASI KHALIFELU

    2011-12-01

    The world of information is becoming increasingly complicated and is taking on a more dynamic face these days. In this study, not only are learning styles in web-based instruction taken into account, but studies that report on the effects of learner control are also investigated. The styles covered include the Kolb, Cognitive and McCarthy learning styles, among others; they are discussed according to a set of criteria of work relevancy and in terms of the techniques of their internal architecture. Our work addresses the use of software engineering with these methods in efficient e-learning systems. A classification based on the main goal of the methods has been made. For each category, a discussion of the suitability of learning techniques is offered with regard to requirements recognition in the inception phase of software engineering. The results of the research conducted indicate that, in most cases, these methods are very efficacious in web-based instruction and education.

  18. Super pixel-level dictionary learning for hyperspectral image classification

    Science.gov (United States)

    Zhao, Wei; Zhu, Wen; Liao, Bo; Fu, Xiangzheng

    2017-08-01

    This paper presents a superpixel-level dictionary learning model for hyperspectral data. The idea is to divide the hyperspectral image into a number of superpixels by means of a superpixel segmentation method. Each superpixel is a spatial neighborhood called a contextual group. Each pixel is represented using a linear combination of a few dictionary items learned from the training data, but since pixels inside a superpixel often consist of the same materials, their linear combinations are constrained to use common items from the dictionary. To this end, the sparse coefficients of the contextual group are given a common sparse pattern by using a joint sparse regularizer for dictionary learning. The sparse coefficients are then used for classification with linear support vector machines. The validity of the proposed method is experimentally verified on real hyperspectral images.

  19. Nonparametric, Coupled, Bayesian, Dictionary, and Classifier Learning for Hyperspectral Classification.

    Science.gov (United States)

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  20. Multiple instance learning for classification of dementia in brain MRI.

    Science.gov (United States)

    Tong, Tong; Wolz, Robin; Gao, Qinquan; Hajnal, Joseph V; Rueckert, Daniel

    2013-01-01

    Machine learning techniques have been widely used to support the diagnosis of neurological diseases such as dementia. Recent approaches utilize local intensity patterns within patches to derive voxelwise grading measures of disease. However, the relationships among these patches are usually ignored. In addition, there is some ambiguity in assigning disease labels to the extracted patches. Not all of the patches extracted from patients with dementia are characteristic of morphology associated with disease. In this paper, we propose to use a multiple instance learning method to address the problem of assigning training labels to the patches. In addition, a graph is built for each image to exploit the relationships among these patches, which aids the classification work. We illustrate the proposed approach in an application for the detection of Alzheimer's disease (AD): Using the baseline MR images of 834 subjects from the ADNI study, the proposed method can achieve a classification accuracy of 88.8% between AD patients and healthy controls, and 69.6% between patients with stable Mild Cognitive Impairment (MCI) and progressive MCI. These results compare favourably with state-of-the-art classification methods.
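
    A minimal sketch of the multiple instance learning idea described above, not the authors' implementation: each image is treated as a bag of patches, and under the standard MIL assumption (assumed here) a bag is positive if at least one patch is positive. The function names, patch scores, and the 0.5 threshold are hypothetical; the graph-based modelling of patch relationships is not shown.

```python
from typing import List


def bag_score(instance_scores: List[float]) -> float:
    """Score a bag by its most positive instance (standard MIL assumption)."""
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def classify_bag(instance_scores: List[float], threshold: float = 0.5) -> int:
    """Return 1 (positive bag) if any instance reaches the threshold, else 0."""
    return 1 if bag_score(instance_scores) >= threshold else 0


# Hypothetical per-patch disease scores for two images (bags).
patient_patches = [0.10, 0.20, 0.90, 0.30]  # one strongly abnormal patch
control_patches = [0.10, 0.05, 0.20]        # no abnormal patch

patient_label = classify_bag(patient_patches)
control_label = classify_bag(control_patches)
```

    The point of the bag-level rule is that a patient image is not penalised for containing many normal-looking patches: only the most characteristic patch decides the label.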

  1. Decomposition-based transfer distance metric learning for image classification.

    Science.gov (United States)

    Luo, Yong; Liu, Tongliang; Tao, Dacheng; Xu, Chao

    2014-09-01

    Distance metric learning (DML) is a critical factor for image analysis and pattern recognition. To learn a robust distance metric for a target task, we need abundant side information (i.e., the similarity/dissimilarity pairwise constraints over the labeled data), which is usually unavailable in practice due to the high labeling cost. This paper considers the transfer learning setting by exploiting the large quantity of side information from certain related, but different source tasks to help with target metric learning (with only a little side information). The state-of-the-art metric learning algorithms usually fail in this setting because the data distributions of the source task and target task are often quite different. We address this problem by assuming that the target distance metric lies in the space spanned by the eigenvectors of the source metrics (or other randomly generated bases). The target metric is represented as a combination of the base metrics, which are computed using the decomposed components of the source metrics (or simply a set of random bases); we call the proposed method decomposition-based transfer DML (DTDML). In particular, DTDML learns a sparse combination of the base metrics to construct the target metric by forcing the target metric to be close to an integration of the source metrics. The main advantage of the proposed method compared with existing transfer metric learning approaches is that we directly learn the base metric coefficients instead of the target metric. To this end, far fewer variables need to be learned. We therefore obtain more reliable solutions given the limited side information and the optimization tends to be faster. Experiments on the popular handwritten image (digit, letter) classification and challenging natural image annotation tasks demonstrate the effectiveness of the proposed method.
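
    The construction described above can be written out roughly as follows. This is a sketch in assumed notation, not the paper's own formulation: the symbols $u_i$, $B_i$, $\beta_i$, $\alpha_s$, $\gamma_1$, $\gamma_2$ and the loss $\ell$ are introduced here for illustration, and the exact regularisers used by the authors may differ.

```latex
\begin{align*}
% Base metrics built from eigenvectors u_i of the source metrics
B_i &= u_i u_i^{\top}, \qquad i = 1, \dots, n \\
% Target metric as a combination of the base metrics
M_T(\beta) &= \sum_{i=1}^{n} \beta_i B_i \\
% Distance induced by the target metric
d_{M_T}(x, y) &= \sqrt{(x - y)^{\top} M_T(\beta)\,(x - y)} \\
% Generic objective: fit the target side information, stay close to a
% weighted integration of the m source metrics M_s, keep \beta sparse
\min_{\beta} \;\; & \ell(\beta)
  + \gamma_1 \Bigl\| M_T(\beta) - \sum_{s=1}^{m} \alpha_s M_s \Bigr\|_F^2
  + \gamma_2 \|\beta\|_1
\end{align*}
```

    Written this way, the unknowns are the $n$ coefficients $\beta_i$ rather than the $d(d+1)/2$ free entries of a symmetric $d \times d$ metric, which is the reduction in variables the abstract refers to.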

  2. Representation learning with deep extreme learning machines for efficient image set classification

    KAUST Repository

    Uzair, Muhammad

    2016-12-09

    Efficient and accurate representation of a collection of images, that belong to the same class, is a major research challenge for practical image set classification. Existing methods either make prior assumptions about the data structure, or perform heavy computations to learn structure from the data itself. In this paper, we propose an efficient image set representation that does not make any prior assumptions about the structure of the underlying data. We learn the nonlinear structure of image sets with deep extreme learning machines that are very efficient and generalize well even on a limited number of training samples. Extensive experiments on a broad range of public datasets for image set classification show that the proposed algorithm consistently outperforms state-of-the-art image set classification methods both in terms of speed and accuracy.

  3. Intra-individual gait patterns across different time-scales as revealed by means of a supervised learning model using kernel-based discriminant regression.

    Science.gov (United States)

    Horst, Fabian; Eekhoff, Alexander; Newell, Karl M; Schöllhorn, Wolfgang I

    2017-01-01

    Traditionally, gait analysis has been centered on the idea of average behavior and normality. On one hand, clinical diagnoses and therapeutic interventions typically assume that average gait patterns remain constant over time. On the other hand, it is well known that all our movements are accompanied by a certain amount of variability, which does not allow us to make two identical steps. The purpose of this study was to examine changes in the intra-individual gait patterns across different time-scales (i.e., tens-of-mins, tens-of-hours). Nine healthy subjects performed 15 gait trials at a self-selected speed on 6 sessions within one day (duration between two subsequent sessions from 10 to 90 mins). For each trial, time-continuous ground reaction forces and lower body joint angles were measured. A supervised learning model using a kernel-based discriminant regression was applied for classifying sessions within individual gait patterns. Discernable characteristics of intra-individual gait patterns could be distinguished between repeated sessions by classification rates of 67.8 ± 8.8% and 86.3 ± 7.9% for the six-session-classification of ground reaction forces and lower body joint angles, respectively. Furthermore, the one-on-one-classification showed that increasing classification rates go along with increasing time durations between two sessions and indicate that changes of gait patterns appear at different time-scales. Discernable characteristics between repeated sessions indicate continuous intrinsic changes in intra-individual gait patterns and suggest a predominant role of deterministic processes in human motor control and learning. Natural changes of gait patterns without any externally induced injury or intervention may reflect continuous adaptations of the motor system over several time-scales. Accordingly, the modelling of walking by means of average gait patterns that are assumed to be near constant over time needs to be reconsidered in the context of

  4. Conducting Supervised Experiential Learning/Field Experiences for Students' Development and Career Reinforcement.

    Science.gov (United States)

    Leventhal, Jerome I.

    A major problem in the educational system of the United States is that a great number of students and graduates lack a career objective, and, therefore, many workers are unhappy. Offering a variety of supervised field experiences, paid or unpaid, in which students see workers in their occupations will help students identify career choices.…

  5. Don't Leave Teaching to Chance: Learning Objectives for Psychodynamic Psychotherapy Supervision

    Science.gov (United States)

    Rojas, Alicia; Arbuckle, Melissa; Cabaniss, Deborah

    2010-01-01

    Objective: The way in which the competencies for psychodynamic psychotherapy specified by the Psychiatry Residency Review Committee of the Accreditation Council for Graduate Medical Education translate into the day-to-day work of individual supervision remains unstudied and unspecified. The authors hypothesized that despite the existence of…

  6. Fieldwork online: a GIS-based electronic learning environment for supervising fieldwork

    NARCIS (Netherlands)

    Alberti, K.; Marra, W.A.; Baarsma, R.J.; Karssenberg, D.J.

    2016-01-01

    Fieldwork comes in many forms: individual research projects in unique places, large groups of students on organized fieldtrips, and everything in between those extremes. Supervising students in often distant places can be a logistical challenge and requires a significant time investment of their

  7. Enabling Connections in Postgraduate Supervision for an Applied eLearning Professional Development Programme

    Science.gov (United States)

    Donnelly, Roisin

    2013-01-01

    This article describes the practice of postgraduate supervision on a blended professional development programme for academics, and discusses how connectivism has been a useful lens to explore a complex form of instruction. By examining the processes by which supervisors and their students on a two-year part-time masters in Applied eLearning…

  8. An Early Historical Examination of the Educational Intent of Supervised Agricultural Experiences (SAEs) and Project-Based Learning in Agricultural Education

    Science.gov (United States)

    Smith, Kasee L.; Rayfield, John

    2016-01-01

    Project-based learning has been a component of agricultural education since its inception. In light of the current call for additional emphasis of the Supervised Agricultural Experience (SAE) component of agricultural education, there is a need to revisit the roots of project-based learning. This early historical research study was conducted to…

  9. Sow-activity classification from acceleration patterns

    DEFF Research Database (Denmark)

    Escalante, Hugo Jair; Rodriguez, Sara V.; Cordero, Jorge

    2013-01-01

    This paper describes a supervised learning approach to sow-activity classification from accelerometer measurements. In the proposed methodology, pairs of accelerometer measurements and activity types are considered as labeled instances of a usual supervised classification task. Under this scenario...... sow-activity classification can be approached with standard machine learning methods for pattern classification. Individual predictions for elements of times series of arbitrary length are combined to classify it as a whole. An extensive comparison of representative learning algorithms, including...... neural networks, support vector machines, and ensemble methods, is presented. Experimental results are reported using a data set for sow-activity classification collected in a real production herd. The data set, which has been widely used in related works, includes measurements from active (Feeding...

  10. Ensemble Deep Learning for Biomedical Time Series Classification

    Directory of Open Access Journals (Sweden)

    Lin-peng Jin

    2016-01-01

    Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.

  11. Ensemble Deep Learning for Biomedical Time Series Classification.

    Science.gov (United States)

    Jin, Lin-Peng; Dong, Jun

    2016-01-01

    Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.
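
    The "Simple Average" combination step named above can be sketched as follows. This is an illustrative assumption about how the member outputs are fused (averaging class-probability vectors and taking the arg max); the function names and the example probabilities are hypothetical, and the views and training schemes of the actual method are not shown.

```python
from typing import List, Sequence


def simple_average(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by the ensemble members."""
    if not prob_vectors:
        raise ValueError("need at least one member output")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all members must output the same number of classes")
    n_members = len(prob_vectors)
    return [sum(p[c] for p in prob_vectors) / n_members for c in range(n_classes)]


def predict(prob_vectors: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    avg = simple_average(prob_vectors)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of three ensemble members for one ECG recording
# (class 0 = normal, class 1 = abnormal).
member_outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
fused = simple_average(member_outputs)
label = predict(member_outputs)
```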

  12. Optimization of deep learning algorithms for object classification

    Science.gov (United States)

    Horváth, András.

    2017-02-01

    Deep learning is currently the state-of-the-art approach for image classification. The complexity of these feedforward neural networks has passed a critical point, resulting in algorithmic breakthroughs in various fields. On the other hand, their complexity makes them executable only in tasks where high-throughput computing power is available. The optimization of these networks, considering computational complexity and applicability on embedded systems, has not yet been studied and investigated in detail. In this paper I show some examples of how these algorithms can be optimized and accelerated on embedded systems.

  13. Ensemble Deep Learning for Biomedical Time Series Classification

    Science.gov (United States)

    2016-01-01

    Ensemble learning has been proved to improve the generalization ability effectively in both theory and practice. In this paper, we briefly outline the current status of research on it first. Then, a new deep neural network-based ensemble method that integrates filtering views, local views, distorted views, explicit training, implicit training, subview prediction, and Simple Average is proposed for biomedical time series classification. Finally, we validate its effectiveness on the Chinese Cardiovascular Disease Database containing a large number of electrocardiogram recordings. The experimental results show that the proposed method has certain advantages compared to some well-known ensemble methods, such as Bagging and AdaBoost.

  14. Clinical supervision in a community setting.

    Science.gov (United States)

    Evans, Carol; Marcroft, Emma

    Clinical supervision is a formal process of professional support, reflection and learning that contributes to individual development. First Community Health and Care is committed to providing clinical supervision to nurses and allied healthcare professionals to support the provision and maintenance of high-quality care. In 2012, we developed new guidelines for nurses and AHPs on supervision, incorporating a clinical supervision framework. This offers a range of options to staff so supervision accommodates variations in work settings and individual learning needs and styles.

  15. Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study

    Science.gov (United States)

    Pekkala, Timo; Hall, Anette; Lötjönen, Jyrki; Mattila, Jussi; Soininen, Hilkka; Ngandu, Tiia; Laatikainen, Tiina; Kivipelto, Miia; Solomon, Alina

    2016-01-01

    Background and objective: This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study. Methods: The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC). Results: AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype. Conclusion: The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions. PMID:27802228
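
    For readers unfamiliar with the AUC figure used above, here is a minimal sketch of how an area under the ROC curve can be computed from risk scores: it equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. The function name and the labels and scores are hypothetical and unrelated to the CAIDE data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical index values: 1 = developed dementia, 0 = did not.
labels = [1, 0, 1, 0, 0]
scores = [0.90, 0.40, 0.35, 0.20, 0.10]
auc = roc_auc(labels, scores)
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.79 and 0.75.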

  16. Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study.

    Science.gov (United States)

    Pekkala, Timo; Hall, Anette; Lötjönen, Jyrki; Mattila, Jussi; Soininen, Hilkka; Ngandu, Tiia; Laatikainen, Tiina; Kivipelto, Miia; Solomon, Alina

    2017-01-01

    This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study. The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC). AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype. The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions.

  17. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    OpenAIRE

    C. V. Subbulakshmi; Deepa, S. N.

    2015-01-01

    Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself using a training process, because complete expert knowledge to determine classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. This paradigm integrates the successful exploration mechanism called self-regulated learni...

  18. Poster abstract: Water level estimation in urban ultrasonic/passive infrared flash flood sensor networks using supervised learning

    KAUST Repository

    Mousa, Mustafa

    2014-04-01

    This article describes a machine learning approach to water level estimation in a dual ultrasonic/passive infrared urban flood sensor system. We first show that an ultrasonic rangefinder alone is unable to accurately measure the level of water on a road due to thermal effects. Using additional passive infrared sensors, we show that ground temperature and local sensor temperature measurements are sufficient to correct the rangefinder readings and improve the flood detection performance. Since floods occur very rarely, we use a supervised learning approach to estimate the correction to the ultrasonic rangefinder caused by temperature fluctuations. Preliminary data shows that water level can be estimated with an absolute error of less than 2 cm. © 2014 IEEE.

  19. Protein sequence classification with improved extreme learning machine algorithms.

    Science.gov (United States)

    Cao, Jiuwen; Xiong, Lianglin

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all the identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms.
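
    The majority voting step mentioned above can be sketched as follows. The function name and the example votes are hypothetical; tie-breaking is not described in the abstract and is simply left to `Counter` here.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the category index predicted by the largest number of ensemble members."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) returns a list with the single (label, count) pair
    # that has the highest count.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical category indices predicted by five networks for one sequence.
votes = [3, 1, 3, 2, 3]
final_category = majority_vote(votes)
```

    Using an odd number of members, as in this example, reduces the chance of ties in two-class problems but does not eliminate them when there are more than two categories.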

  20. Internet Traffic Classification for Educational Institutions Using Machine Learning

    Directory of Open Access Journals (Sweden)

    Jaspreet Kaur

    2012-07-01

    In recent times, machine learning algorithms have been used for internet traffic classification. The vast number of websites in the internet world can be classified into different categories in different ways. In educational institutions, these websites can be classified into two categories: educational websites and non-educational websites. Educational websites are used to acquire knowledge and to explore educational topics, while non-educational websites are used for entertainment and to keep in touch with people. When these non-educational websites are blocked, students use proxy websites to unblock them. Therefore, in educational institutes, the use of non-educational and proxy websites should be banned to make optimum use of network resources. In this paper, we use five ML classifiers (Naïve Bayes, RBF, C4.5, MLP and Bayes Net) to classify the educational and non-educational websites. Results show that Bayes Net gives the best performance on both the full-feature and reduced-feature data sets for the intended classification of internet traffic in terms of classification accuracy, recall and precision values as compared to the other classifiers.
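
    The three evaluation measures named above (accuracy, recall and precision) follow directly from the confusion counts of a two-class problem. The sketch below uses hypothetical labels (1 = non-educational, 0 = educational) and a hypothetical function name; it is not tied to the data sets of the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision and recall for one class treated as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision: fraction of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: fraction of true positives that were found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```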

  1. Cavity contour segmentation in chest radiographs using supervised learning and dynamic programming

    Energy Technology Data Exchange (ETDEWEB)

    Maduskar, Pragnya, E-mail: pragnya.maduskar@radboudumc.nl; Hogeweg, Laurens; Sánchez, Clara I.; Ginneken, Bram van [Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, 6525 GA (Netherlands); Jong, Pim A. de [Department of Radiology, University Medical Center Utrecht, 3584 CX (Netherlands); Peters-Bax, Liesbeth [Department of Radiology, Radboud University Medical Center, Nijmegen, 6525 GA (Netherlands); Dawson, Rodney [University of Cape Town Lung Institute, Cape Town 7700 (South Africa); Ayles, Helen [Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT (United Kingdom)

    2014-07-15

    Purpose: Efficacy of tuberculosis (TB) treatment is often monitored using chest radiography. Monitoring size of cavities in pulmonary tuberculosis is important as the size predicts severity of the disease and its persistence under therapy predicts relapse. The authors present a method for automatic cavity segmentation in chest radiographs. Methods: A two-stage method is proposed to segment the cavity borders, given a user-defined seed point close to the center of the cavity. First, a supervised learning approach is employed to train a pixel classifier using texture and radial features to identify the border pixels of the cavity. A likelihood value of belonging to the cavity border is assigned to each pixel by the classifier. The authors experimented with four different classifiers: k-nearest neighbor (kNN), linear discriminant analysis (LDA), GentleBoost (GB), and random forest (RF). Next, the constructed likelihood map was used as an input cost image in the polar transformed image space for dynamic programming to trace the optimal maximum cost path. This constructed path corresponds to the segmented cavity contour in image space. Results: The method was evaluated on 100 chest radiographs (CXRs) containing 126 cavities. The reference segmentation was manually delineated by an experienced chest radiologist. An independent observer (a chest radiologist) also delineated all cavities to estimate interobserver variability. Jaccard overlap measure Ω was computed between the reference segmentation and the automatic segmentation; and between the reference segmentation and the independent observer's segmentation for all cavities. A median overlap Ω of 0.81 (0.76 ± 0.16), and 0.85 (0.82 ± 0.11) was achieved between the reference segmentation and the automatic segmentation, and between the segmentations by the two radiologists, respectively. The best reported mean contour distance and Hausdorff distance between the reference and the automatic segmentation were
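
    The Jaccard overlap measure Ω used in the evaluation above is the size of the intersection of two segmentations divided by the size of their union. A minimal sketch on sets of pixel coordinates follows; the function name and the two toy masks are hypothetical, and returning 1.0 for two empty masks is a convention assumed here.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int]


def jaccard(a: Iterable[Pixel], b: Iterable[Pixel]) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two sets of pixel coordinates."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty segmentations agree fully
    return len(set_a & set_b) / len(union)


# Two hypothetical 4x4-pixel regions, the second shifted down by one row.
reference = {(r, c) for r in range(0, 4) for c in range(4)}
automatic = {(r, c) for r in range(1, 5) for c in range(4)}
omega = jaccard(reference, automatic)
```

    In this toy case the regions share 12 of 20 pixels, so Ω is 0.6; a one-row shift already costs 40% of the overlap, which helps put the reported medians of 0.81 and 0.85 into perspective.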

  2. Using Machine Learning for Advanced Anomaly Detection and Classification

    Science.gov (United States)

    Lane, B.; Poole, M.; Camp, M.; Murray-Krezan, J.

    2016-09-01

    Machine Learning (ML) techniques have successfully been used in a wide variety of applications to automatically detect and potentially classify changes in activity, or a series of activities, by utilizing large amounts of data, sometimes even seemingly unrelated data. The amount of data being collected, processed, and stored in the Space Situational Awareness (SSA) domain has grown at an exponential rate and is now better suited for ML. This paper describes development of advanced algorithms to deliver significant improvements in characterization of deep space objects and indication and warning (I&W) using a global network of telescopes that are collecting photometric data on a multitude of space-based objects. The Phase II Air Force Research Laboratory (AFRL) Small Business Innovative Research (SBIR) project Autonomous Characterization Algorithms for Change Detection and Characterization (ACDC), contracted to ExoAnalytic Solutions Inc., is providing the ability to detect and identify photometric signature changes due to potential space object changes (e.g. stability, tumble rate, aspect ratio), and correlate observed changes to potential behavioral changes using a variety of techniques, including supervised learning. Furthermore, these algorithms run in real-time on data being collected and processed by the ExoAnalytic Space Operations Center (EspOC), providing timely alerts and warnings while dynamically creating collection requirements to the EspOC for the algorithms that generate higher fidelity I&W. This paper will discuss the recently implemented ACDC algorithms, including the general design approach and results to date. The usage of supervised algorithms, such as Support Vector Machines, Neural Networks, k-Nearest Neighbors, etc., and unsupervised algorithms, for example k-means, Principal Component Analysis, Hierarchical Clustering, etc., and the implementations of these algorithms is explored. Results of applying these algorithms to EspOC data both in an off

  3. Supervised Learning-Based tagSNP Selection for Genome-Wide Disease Classifications

    OpenAIRE

    Yang Mary Qu; Chen Zhongxue; Yang Jack; Liu Qingzhong; Sung Andrew H; Huang Xudong

    2008-01-01

    Background: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information...

  4. Grapevine Yield and Leaf Area Estimation Using Supervised Classification Methodology on RGB Images Taken under Field Conditions

    Science.gov (United States)

    Diago, Maria-Paz; Correa, Christian; Millán, Borja; Barreiro, Pilar; Valero, Constantino; Tardaguila, Javier

    2012-01-01

    The aim of this research was to implement a methodology through the generation of a supervised classifier based on the Mahalanobis distance to characterize the grapevine canopy and assess leaf area and yield using RGB images. The method automatically processes sets of images, and calculates the areas (number of pixels) corresponding to seven different classes (Grapes, Wood, Background, and four classes of Leaf, of increasing leaf age). Each one is initialized by the user, who selects a set of representative pixels for every class in order to induce the clustering around them. The proposed methodology was evaluated with 70 grapevine (V. vinifera L. cv. Tempranillo) images, acquired in a commercial vineyard located in La Rioja (Spain), after several defoliation and de-fruiting events on 10 vines, with a conventional RGB camera and no artificial illumination. The segmentation results showed a performance of 92% for leaves and 98% for clusters, and allowed assessment of the grapevine’s leaf area and yield with R² values of 0.81 (p < 0.001) and 0.73 (p = 0.002), respectively. This methodology, which operates with a simple image acquisition setup and guarantees the right number and kind of pixel classes, has been shown to be suitable and robust enough to provide valuable information for vineyard management. PMID:23235443
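
    A classifier based on the Mahalanobis distance, as described above, can be written as follows. The notation is assumed for illustration: $x$ is the RGB vector of a pixel, and $\mu_k$ and $\Sigma_k$ are the mean and covariance of the representative pixels the user selected for class $k$. Whether the authors use one covariance per class or a pooled one is not stated in the abstract.

```latex
\begin{align*}
% Mahalanobis distance from pixel x to class k (k = 1, ..., 7)
d_k(x) &= \sqrt{(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)},
  \qquad x \in \mathbb{R}^{3} \\
% Each pixel is assigned to the nearest class
\hat{k}(x) &= \arg\min_{k} \, d_k(x) \\
% Area of class k, counted in pixels over the image I
A_k &= \bigl|\{\, x \in I : \hat{k}(x) = k \,\}\bigr|
\end{align*}
```

    Unlike the Euclidean distance, $d_k$ scales each colour direction by the spread of the class, so a class with widely varying colour (for example leaves of different ages) does not lose pixels to a tighter neighbouring class.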

  5. Two Linear Unmixing Algorithms to Recognize Targets Using Supervised Classification and Orthogonal Rotation in Airborne Hyperspectral Images

    Directory of Open Access Journals (Sweden)

    Michael Zheludev

    2012-02-01

    Full Text Available The goal of the paper is to detect pixels that contain targets of known spectra. The target can be present in a sub- or above pixel. Pixels without targets are classified as background pixels. Each pixel is treated via the content of its neighborhood. A pixel whose spectrum is different from its neighborhood is classified as a “suspicious point”. In each suspicious point there is a mix of target(s) and background. The main objective in a supervised detection (also called “target detection”) is to search for a specific given spectral material (target) in hyperspectral imaging (HSI), where the spectral signature of the target is known a priori from laboratory measurements. In addition, the fractional abundance of the target is computed. To achieve this we present two linear unmixing algorithms that recognize targets with known (given) spectral signatures. The CLUN is based on automatic feature extraction from the target’s spectrum. These features separate the target from the background. The ROTU algorithm is based on embedding the spectra space into a special space by random orthogonal transformation and on the statistical properties of the embedded result. Experimental results demonstrate that the targets’ locations were extracted correctly and these algorithms are robust and efficient.

  6. Classification of Phishing Email Using Random Forest Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    Andronicus A. Akinyelu

    2014-01-01

    Full Text Available Phishing is one of the major challenges faced by the world of e-commerce today. Because of phishing attacks, many companies and individuals have lost billions of dollars. In 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing attacks will continue to increase, and more efficient phishing detection techniques are therefore required to curb the menace. This paper investigates and reports the use of the random forest machine learning algorithm in the classification of phishing attacks, with the major objective of developing an improved phishing email classifier with better prediction accuracy and fewer features. From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features (identified from the literature) were extracted and used by the machine learning algorithm, with a resulting classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates.
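
    The accuracy, FP rate and FN rate quoted above follow from the four cells of a binary confusion matrix. The sketch below shows that arithmetic only; it is not the paper's classifier, and the label strings, function names and toy predictions are assumptions made for illustration.

```python
# Illustrative sketch: evaluation arithmetic for a binary email classifier.
# Labels ("phishing"/"ham") and the toy data are hypothetical.

def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def summary_rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fp_rate = fp / (fp + tn)   # ham wrongly flagged as phishing
    fn_rate = fn / (fn + tp)   # phishing that slipped through
    return accuracy, fp_rate, fn_rate

y_true = ["phishing"] * 4 + ["ham"] * 4
y_pred = ["phishing", "phishing", "phishing", "ham",
          "ham", "ham", "ham", "phishing"]
counts = confusion_counts(y_true, y_pred)
accuracy, fp_rate, fn_rate = summary_rates(*counts)
```

    On this toy data one phishing email is missed and one ham email is flagged, so all three figures are far worse than those reported in the abstract; the point is only how the quantities are defined.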

  7. Application of machine learning on brain cancer multiclass classification

    Science.gov (United States)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning to microarray gene expression datasets mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, extended to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features of each subset are passed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
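
    The step of turning one multiclass problem into several binary ones can be illustrated with a one-vs-rest decomposition. This is a sketch under assumptions: the abstract does not say which decomposition the authors used, the class name "other" and the scores are invented, and the function names are hypothetical.

```python
# Illustrative sketch: one-vs-rest decomposition of multiclass labels.
# Which decomposition the paper actually uses is not stated in the abstract.

def one_vs_rest_targets(labels):
    """For each class, build a +1/-1 target vector (that class vs the rest)."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else -1 for label in labels] for c in classes}

def predict_from_scores(scores):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scores, key=scores.get)

labels = ["normal", "MD", "other", "MD"]          # "other" is a made-up class
targets = one_vs_rest_targets(labels)

# Hypothetical decision values from three binary classifiers for one sample.
sample_scores = {"normal": -0.3, "MD": 1.2, "other": 0.1}
predicted = predict_from_scores(sample_scores)
```

    Each target vector would be used to train one binary classifier (an SVM or Twin SVM in the paper's setting), and the final label comes from comparing their outputs.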

  8. Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data

    Directory of Open Access Journals (Sweden)

    Sarah J. Graves

    2016-02-01

    Full Text Available Mapping species through classification of imaging spectroscopy data is facilitating research to understand tree species distributions at increasingly greater spatial scales. Classification requires a dataset of field observations matched to the image, which will often reflect natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, the effect on species prediction accuracy and landscape species abundance has not yet been quantified. First, we trained and assessed the accuracy of a support vector machine (SVM) model with a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species identified in a hyperspectral image mosaic (350–2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha) at a 2-m resolution to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than what has been presented in field-based studies. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend where species with more samples were consistently over predicted while species with fewer samples were under predicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and providing estimates of tree species abundance in an agricultural landscape. Species maps using data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, in addition to focusing conservation
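
    One common way to standardize sample size across classes is random undersampling to the size of the smallest class. The sketch below assumes that approach; the abstract does not specify how the authors standardized their samples, and the function name, class names and data are hypothetical.

```python
# Illustrative sketch: balance a labelled dataset by random undersampling.
# The paper's own standardization procedure is not described in the abstract.
import random
from collections import Counter, defaultdict

def undersample_to_smallest_class(samples, labels, seed=0):
    """Keep an equal number of randomly chosen samples from every class."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)          # fixed seed for reproducibility
    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    return balanced

# Toy imbalanced dataset: 8 samples of a common species, 2 of a rare one.
samples = list(range(10))
labels = ["common_species"] * 8 + ["rare_species"] * 2
balanced = undersample_to_smallest_class(samples, labels)
class_sizes = Counter(label for _, label in balanced)
```

    Discarding samples of common classes in this way is consistent with the trade-off the abstract reports: less training data overall, but less systematic over- and under-prediction.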

  9. Classification of JERS-1 Image Mosaic of Central Africa Using A Supervised Multiscale Classifier of Texture Features

    Science.gov (United States)

    Saatchi, Sassan; DeGrandi, Franco; Simard, Marc; Podest, Erika

    1999-01-01

    In this paper, a multiscale approach is introduced to classify the Japanese Earth Resources Satellite-1 (JERS-1) mosaic image over the Central African rainforest. A series of texture maps is generated from the 100 m mosaic image at various scales. Using a quadtree model and relating classes at each scale by a Markovian relationship, the multiscale images are classified from coarse to finer scale. The results are verified at various scales and the evolution of classification is monitored by calculating the error at each stage.

  10. A Machine-learning approach to classification of X-ray sources

    Science.gov (United States)

    Hare, Jeremy; Kargaltsev, Oleg; Rangelov, Blagoy; Pavlov, George; Posselt, Bettina; Volkov, Igor

    2017-08-01

    Chandra and XMM-Newton X-ray observatories have serendipitously detected a large number of Galactic sources. Although their properties are automatically extracted and stored in catalogs, most of these sources remain unexplored. Classifying these sources can enable population studies on much larger scales and may also reveal new types of X-ray sources. For most of these sources the X-ray data alone are not enough to identify their nature, and multiwavelength data must be used. We developed a multiwavelength classification pipeline (MUWCLASS), which relies on supervised machine learning and a rich training dataset. We describe the training dataset, the pipeline and its testing, and will show/discuss how the code performs in different example environments, such as unidentified gamma-ray sources, supernova remnants, dwarf galaxies, stellar clusters, and the inner Galactic plane. We also discuss the application of this approach to the data from upcoming new X-ray observatories (e.g., eROSITA, Athena).

  11. Comparison of Classification Algorithms and Training Sample Sizes in Urban Land Classification with Landsat Thematic Mapper Imagery

    OpenAIRE

    Congcong Li; Jie Wang; Lei Wang; Luanyun Hu; Peng Gong

    2014-01-01

    Although a large number of new image classification algorithms have been developed, they are rarely tested with the same classification task. In this research, with the same Landsat Thematic Mapper (TM) data set and the same classification scheme over Guangzhou City, China, we tested two unsupervised and 13 supervised classification algorithms, including a number of machine learning algorithms that became popular in remote sensing during the past 20 years. Our analysis focused primarily on ...

  12. Supervised Classification of Agricultural Land Cover Using a Modified k-NN Technique (MNN) and Landsat Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Karsten Schulz

    2009-11-01

    Full Text Available Nearest neighbor techniques are commonly used in remote sensing, pattern recognition and statistics to classify objects into a predefined number of categories based on a given set of predictors. These techniques are especially useful for highly nonlinear relationships between the variables. In most studies the distance measure is adopted a priori. In contrast, we propose a general procedure to find an adaptive metric that combines a local variance reducing technique and a linear embedding of the observation space into an appropriate Euclidean space. To illustrate the application of this technique, two agricultural land cover classifications using mono-temporal and multi-temporal Landsat scenes are presented. The results of the study, compared with standard approaches used in remote sensing such as maximum likelihood (ML) or k-Nearest Neighbor (k-NN), indicate substantial improvement with regard to the overall accuracy and the cardinality of the calibration data set. Also, using MNN in a soft/fuzzy classification framework proved to be a very useful tool for identifying critical areas that need further attention and investment in additional calibration data.
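
    To make concrete why the choice of distance measure matters for nearest neighbor classification, here is a small k-NN sketch with a pluggable metric. It is not the MNN procedure: the embedding matrix is fixed by hand rather than learned, and the crop names, feature values and function names are all hypothetical.

```python
# Illustrative sketch: k-NN with a swappable distance function. The linear
# embedding below is hand-picked, whereas the paper derives an adaptive one.
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def embedded_distance(matrix):
    """Distance after mapping points through a linear embedding x -> Bx."""
    def embed(x):
        return [sum(b * a for b, a in zip(row, x)) for row in matrix]
    return lambda x, y: euclidean(embed(x), embed(y))

def knn_predict(train, query, k=3, distance=euclidean):
    """Majority label among the k training points nearest to the query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy data: the first feature separates the classes, the second is noise.
train = [((1.0, 10.0), "maize"), ((1.2, -8.0), "maize"), ((0.9, 3.0), "maize"),
         ((3.0, 9.0), "wheat"), ((3.1, -7.0), "wheat"), ((2.9, 2.0), "wheat")]
query = (2.2, 9.2)

plain = knn_predict(train, query)
adapted = knn_predict(train, query,
                      distance=embedded_distance([[1.0, 0.0], [0.0, 0.01]]))
```

    With the plain Euclidean metric the noisy second feature dominates and the query is labelled "maize"; after down-weighting that feature through the embedding, the three nearest neighbors are all "wheat".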

  13. Manifold learning based feature extraction for classification of hyper-spectral data

    CSIR Research Space (South Africa)

    Lunga, D

    2013-08-01

    Full Text Available often lie on sparse, nonlinear manifolds whose geometric and topological structures can be exploited via manifold learning techniques. In this article, we focused on demonstrating the opportunities provided by manifold learning for classification...

  14. Adaptation and validation of the instrument Clinical Learning Environment and Supervision for medical students in primary health care.

    Science.gov (United States)

    Öhman, Eva; Alinaghizadeh, Hassan; Kaila, Päivi; Hult, Håkan; Nilsson, Gunnar H; Salminen, Helena

    2016-12-01

    Clinical learning takes place in complex socio-cultural environments that are workplaces for the staff and learning places for the students. In the clinical context, the students learn by active participation and in interaction with the rest of the community at the workplace. Clinical learning occurs outside the university; therefore, it is important for both the university and the student that the student is given opportunities to evaluate the clinical placements with an instrument that allows evaluation from many perspectives. The instrument Clinical Learning Environment and Supervision (CLES) was originally developed for evaluation of nursing students' clinical learning environment. The aim of this study was to adapt and validate the CLES instrument to measure medical students' perceptions of their learning environment in primary health care. In the adaptation process the face validity was tested by an expert panel of primary care physicians, who were also active clinical supervisors. The adapted CLES instrument with 25 items and six background questions was sent electronically to 1,256 medical students from one university. Answers from 394 students were eligible for inclusion. Exploratory factor analysis based on principal component methods followed by oblique rotation was used to confirm the adequate number of factors in the data. Construct validity was assessed by factor analysis. Confirmatory factor analysis was used to confirm the dimensions of the CLES instrument. The construct validity showed a clearly indicated four-factor model. The cumulative variance explanation was 0.65, and the overall Cronbach's alpha was 0.95. All items loaded similarly with the dimensions in the non-adapted CLES except for one item that loaded to another dimension. The CLES instrument in its adapted form had high construct validity and high reliability and internal consistency. CLES, in its adapted form, appears to be a valid instrument to evaluate medical students' perceptions of
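
    Cronbach's alpha, the reliability figure reported above, is computed from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of totals), for k items. The sketch below applies that standard formula to invented questionnaire answers; the data and function name are assumptions and have no connection to the study's responses.

```python
# Illustrative sketch: Cronbach's alpha for a small, made-up answer matrix.
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per questionnaire item."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Four hypothetical respondents answering three items on a 1-5 scale.
answers = [[4, 5, 4],
           [2, 3, 2],
           [5, 5, 4],
           [3, 3, 3]]
alpha = cronbach_alpha(answers)
```

    Values close to 1 indicate that the items vary together across respondents, which is what a reported alpha of 0.95 expresses for the adapted instrument.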

  15. Adaptation and validation of the instrument Clinical Learning Environment and Supervision for medical students in primary health care

    Directory of Open Access Journals (Sweden)

    Eva Öhman

    2016-12-01

    Full Text Available Abstract Background Clinical learning takes place in complex socio-cultural environments that are workplaces for the staff and learning places for the students. In the clinical context, the students learn by active participation and in interaction with the rest of the community at the workplace. Clinical learning occurs outside the university; therefore, it is important for both the university and the student that the student is given opportunities to evaluate the clinical placements with an instrument that allows evaluation from many perspectives. The instrument Clinical Learning Environment and Supervision (CLES) was originally developed for evaluation of nursing students’ clinical learning environment. The aim of this study was to adapt and validate the CLES instrument to measure medical students’ perceptions of their learning environment in primary health care. Methods In the adaptation process the face validity was tested by an expert panel of primary care physicians, who were also active clinical supervisors. The adapted CLES instrument with 25 items and six background questions was sent electronically to 1,256 medical students from one university. Answers from 394 students were eligible for inclusion. Exploratory factor analysis based on principal component methods followed by oblique rotation was used to confirm the adequate number of factors in the data. Construct validity was assessed by factor analysis. Confirmatory factor analysis was used to confirm the dimensions of the CLES instrument. Results The construct validity showed a clearly indicated four-factor model. The cumulative variance explanation was 0.65, and the overall Cronbach’s alpha was 0.95. All items loaded similarly with the dimensions in the non-adapted CLES except for one item that loaded to another dimension. The CLES instrument in its adapted form had high construct validity and high reliability and internal consistency. Conclusion CLES, in its adapted form, appears

  16. Sparse Representation for Time-Series Classification

    Science.gov (United States)

    2015-02-08

    Comput. Vision and Pattern Recognition (CVPR), pp. 4114–4121 (2014). 18. J. Mairal, F. Bach, A. Zisserman, and G. Sapiro. Supervised dictionary learn...ing. In Advances Neural Inform. Process. Syst. (NIPS), pp. 1033–1040 (2008). 19. J. Mairal, F. Bach, and J. Ponce, Task-driven dictionary learning...Series Classification 17 compressive sensing, SISC. 33(1), 250–278 (2011). 41. J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for

  17. Deep Transfer Learning for Modality Classification of Medical Images

    Directory of Open Access Journals (Sweden)

    Yuhai Yu

    2017-07-01

    Full Text Available Medical images are valuable for clinical diagnosis and decision making. Image modality classification is an important primary step, as it helps clinicians access the required medical images in retrieval systems. Traditional methods of modality classification depend on the choice of hand-crafted features and demand a clear awareness of prior domain knowledge. The feature learning approach may efficiently detect visual characteristics of different modalities, but it is limited by the number of training datasets. To overcome the absence of labeled data, on the one hand, we take deep convolutional neural networks (VGGNet, ResNet) with different depths pre-trained on ImageNet, fix most of the earlier layers to preserve generic features of natural images, and only train their higher-level portion on ImageCLEF to learn domain-specific features of medical figures. Then, we train from scratch deep CNNs with only six weight layers to capture more domain-specific features. On the other hand, we employ two data augmentation methods to help CNNs realize their full potential in characterizing image modality features. The final prediction is given by our voting system based on the outputs of three CNNs. After evaluating our proposed model on the subfigure classification task in ImageCLEF2015 and ImageCLEF2016, we obtain new, state-of-the-art results (76.87% in ImageCLEF2015 and 87.37% in ImageCLEF2016), which imply that CNNs, based on our proposed transfer learning methods and data augmentation skills, can identify modalities of medical images more efficiently.
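
    The final voting step can be pictured as a simple majority vote over the labels predicted by the three networks. This is a sketch under assumptions: the abstract does not describe the exact voting rule, so the tie-breaking choice (fall back to the first model's label), the modality names and the function name are all hypothetical.

```python
# Illustrative sketch: majority vote over per-model class predictions.
# The tie-breaking rule is an assumption; the paper's rule is not stated here.
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; on a tie, prefer the earliest model."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of three CNNs for two images.
agreed = majority_vote(["CT", "MRI", "CT"])
tied = majority_vote(["CT", "MRI", "X-ray"])
```

    A weighted vote over class probabilities would be an equally plausible reading of "voting system"; the hard vote above is just the simplest instance.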

  18. Fieldwork online: a GIS-based electronic learning environment for supervising fieldwork

    Science.gov (United States)

    Alberti, Koko; Marra, Wouter; Baarsma, Rein; Karssenberg, Derek

    2016-04-01

    Fieldwork comes in many forms: individual research projects in unique places, large groups of students on organized fieldtrips, and everything in between those extremes. Supervising students in often distant places can be a logistical challenge and requires a significant time investment from their supervisors. We developed an online application for remote supervision of students on fieldwork. In our fieldworkonline webapp, which is accessible through a web browser, students can upload their field data in the form of a spreadsheet with coordinates (in a system of choice) and data-fields. Field data can be any combination of quantitative or qualitative data, and can contain references to photos or other documents uploaded to the app. The student's data is converted to a map with data-points that contain all the data-fields and links to photos and documents associated with that location. Supervisors can review the data of their students and provide feedback on observations, or geo-referenced feedback on the map. Similarly, students can ask geo-referenced questions to their supervisors. Furthermore, supervisors can choose different basemaps or upload their own. Fieldwork online is a useful tool for supervising students at a distant location in the field. It is most suitable for first-order feedback on students' observations, can be used to guide students to interesting locations, and allows for short discussions on phenomena observed in the field. We are looking for users who would like to use this system; we are able to provide support and add new features if needed. The website is built and controlled using Flask, an open-source Python framework. The maps are generated and controlled using MapServer and OpenLayers, and the database is built in PostgreSQL with PostGIS support. Fieldworkonline and all tools used to create it are open-source. Experience fieldworkonline at our demo during this session, or online at fieldworkonline.geo.uu.nl (username: EGU2016, password: Vienna).

  19. Leveraging Sequence Classification by Taxonomy-Based Multitask Learning

    Science.gov (United States)

    Widmer, Christian; Leiva, Jose; Altun, Yasemin; Rätsch, Gunnar

    In this work we consider an inference task that biologists are very good at: deciphering biological processes by bringing together knowledge that has been obtained by experiments using various organisms, while respecting the differences and commonalities of these organisms. We look at this problem from a sequence analysis point of view, where we aim at solving the same classification task in different organisms. We investigate the challenge of combining information from several organisms, where we consider the relation between the organisms to be defined by a tree structure derived from their phylogeny. Multitask learning, a machine learning technique that has recently received considerable attention, considers the problem of learning across tasks that are related to each other. We treat each organism as one task and present three novel multitask learning methods to handle situations in which the relationships among tasks can be described by a hierarchy. These algorithms are designed for large-scale applications and are therefore applicable to problems with a large number of training examples, which are frequently encountered in sequence analysis. We perform experimental analyses on synthetic data sets in order to illustrate the properties of our algorithms. Moreover, we consider a problem from genomic sequence analysis, namely splice site recognition, to illustrate the usefulness of our approach. We show that intelligently combining data from 15 eukaryotic organisms can indeed significantly improve the prediction performance compared to traditional learning approaches. From a broader perspective, we expect that algorithms like the ones presented in this work have the potential to complement and enrich the strategies of homology-based sequence analysis that are currently the quasi-standard in biological sequence analysis.

  20. Development and psychometric testing of the Clinical Learning Environment, Supervision and Nurse Teacher evaluation scale (CLES+T): the Spanish version.

    Science.gov (United States)

    Vizcaya-Moreno, M Flores; Pérez-Cañaveras, Rosa M; De Juan, Joaquín; Saarikoski, Mikko

    2015-01-01

    The Clinical Learning Environment, Supervision and Nurse Teacher scale is a reliable and valid instrument to evaluate the quality of the clinical learning process in international nursing education contexts. This paper reports the development and psychometric testing of the Spanish version of the Clinical Learning Environment, Supervision and Nurse Teacher scale. Cross-sectional validation study of the scale. 10 public and private hospitals in the Alicante area, and the Faculty of Health Sciences (University of Alicante, Spain). 370 student nurses on clinical placement (January 2011-March 2012). The Clinical Learning Environment, Supervision and Nurse Teacher scale was translated using the modified direct translation method. Statistical analyses were performed using PASW Statistics 18 and AMOS 18.0.0 software. A multivariate analysis was conducted in order to assess construct validity. Cronbach's alpha coefficient was used to evaluate instrument reliability. An exploratory factor analysis identified the five dimensions from the original version, and explained 66.4% of the variance. Confirmatory factor analysis supported the factor structure of the Spanish version of the instrument. Cronbach's alpha coefficient for the scale was .95, ranging from .80 to .97 for the subscales. This version of the Clinical Learning Environment, Supervision and Nurse Teacher scale instrument showed acceptable psychometric properties for use as an assessment scale in Spanish-speaking countries.