WorldWideScience

Sample records for supervised learning unsupervised

  1. Integrating the Supervised Information into Unsupervised Learning

    Directory of Open Access Journals (Sweden)

    Ping Ling

    2013-01-01

    Full Text Available This paper presents an assembling unsupervised learning framework that adopts the information coming from the supervised learning process and gives the corresponding implementation algorithm. The algorithm consists of two phases: extracting and clustering data representatives (DRs firstly to obtain labeled training data and then classifying non-DRs based on labeled DRs. The implementation algorithm is called SDSN since it employs the tuning-scaled Support vector domain description to collect DRs, uses spectrum-based method to cluster DRs, and adopts the nearest neighbor classifier to label non-DRs. The validation of the clustering procedure of the first-phase is analyzed theoretically. A new metric is defined data dependently in the second phase to allow the nearest neighbor classifier to work with the informed information. A fast training approach for DRs’ extraction is provided to bring more efficiency. Experimental results on synthetic and real datasets verify that the proposed idea is of correctness and performance and SDSN exhibits higher popularity in practice over the traditional pure clustering procedure.

  2. Teacher and learner: Supervised and unsupervised learning in communities.

    Science.gov (United States)

    Shafto, Michael G; Seifert, Colleen M

    2015-01-01

    How far can teaching methods go to enhance learning? Optimal methods of teaching have been considered in research on supervised and unsupervised learning. Locally optimal methods are usually hybrids of teaching and self-directed approaches. The costs and benefits of specific methods have been shown to depend on the structure of the learning task, the learners, the teachers, and the environment.

  3. Semi-supervised and unsupervised extreme learning machines.

    Science.gov (United States)

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

  4. Function approximation using combined unsupervised and supervised learning.

    Science.gov (United States)

    Andras, Peter

    2014-03-01

    Function approximation is one of the core tasks that are solved using neural networks in the context of many engineering problems. However, good approximation results need good sampling of the data space, which usually requires exponentially increasing volume of data as the dimensionality of the data increases. At the same time, often the high-dimensional data is arranged around a much lower dimensional manifold. Here we propose the breaking of the function approximation task for high-dimensional data into two steps: (1) the mapping of the high-dimensional data onto a lower dimensional space corresponding to the manifold on which the data resides and (2) the approximation of the function using the mapped lower dimensional data. We use over-complete self-organizing maps (SOMs) for the mapping through unsupervised learning, and single hidden layer neural networks for the function approximation through supervised learning. We also extend the two-step procedure by considering support vector machines and Bayesian SOMs for the determination of the best parameters for the nonlinear neurons in the hidden layer of the neural networks used for the function approximation. We compare the approximation performance of the proposed neural networks using a set of functions and show that indeed the neural networks using combined unsupervised and supervised learning outperform in most cases the neural networks that learn the function approximation using the original high-dimensional data.

  5. Unsupervised/supervised learning concept for 24-hour load forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Djukanovic, M [Electrical Engineering Inst. ' Nikola Tesla' , Belgrade (Yugoslavia); Babic, B [Electrical Power Industry of Serbia, Belgrade (Yugoslavia); Sobajic, D J; Pao, Y -H [Case Western Reserve Univ., Cleveland, OH (United States). Dept. of Electrical Engineering and Computer Science

    1993-07-01

    An application of artificial neural networks in short-term load forecasting is described. An algorithm using an unsupervised/supervised learning concept and historical relationship between the load and temperature for a given season, day type and hour of the day to forecast hourly electric load with a lead time of 24 hours is proposed. An additional approach using functional link net, temperature variables, average load and last one-hour load of previous day is introduced and compared with the ANN model with one hidden layer load forecast. In spite of limited available weather variables (maximum, minimum and average temperature for the day) quite acceptable results have been achieved. The 24-hour-ahead forecast errors (absolute average) ranged from 2.78% for Saturdays and 3.12% for working days to 3.54% for Sundays. (Author)

  6. A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation

    Science.gov (United States)

    Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio

    2015-01-01

    Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building-blocks that better represent the information in it. Digitized histopathology images represents a very good testbed for representation learning since it involves large amounts of high complex, visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on what type of learning architectures (deep or shallow), type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances and that topographic constraints provide useful invariant features in scale and rotations for efficient tumor differentiation.

  7. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

    Science.gov (United States)

    In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...

  8. Supervised and Unsupervised Learning of Multidimensional Acoustic Categories

    Science.gov (United States)

    Goudbeek, Martijn; Swingley, Daniel; Smits, Roel

    2009-01-01

    Learning to recognize the contrasts of a language-specific phonemic repertoire can be viewed as forming categories in a multidimensional psychophysical space. Research on the learning of distributionally defined visual categories has shown that categories defined over 1 dimension are easy to learn and that learning multidimensional categories is…

  9. Learning rates in supervised and unsupervised intelligent systems

    International Nuclear Information System (INIS)

    Hora, S.C.

    1986-01-01

    Classifying observations from a mixture distribution is considered a simple model for learning. Existing results are integrated to obtain asymptotically optimal estimators of the classification rule. The asymptotic relative efficiencies show that a tutored learner is considerably more efficient on difficult problems, but only slightly more efficient on easy problems. This suggests a combined method that seeks instruction on hard cases

  10. Conduction Delay Learning Model for Unsupervised and Supervised Classification of Spatio-Temporal Spike Patterns.

    Science.gov (United States)

    Matsubara, Takashi

    2017-01-01

    Precise spike timing is considered to play a fundamental role in communications and signal processing in biological neural networks. Understanding the mechanism of spike timing adjustment would deepen our understanding of biological systems and enable advanced engineering applications such as efficient computational architectures. However, the biological mechanisms that adjust and maintain spike timing remain unclear. Existing algorithms adopt a supervised approach, which adjusts the axonal conduction delay and synaptic efficacy until the spike timings approximate the desired timings. This study proposes a spike timing-dependent learning model that adjusts the axonal conduction delay and synaptic efficacy in both unsupervised and supervised manners. The proposed learning algorithm approximates the Expectation-Maximization algorithm, and classifies the input data encoded into spatio-temporal spike patterns. Even in the supervised classification, the algorithm requires no external spikes indicating the desired spike timings unlike existing algorithms. Furthermore, because the algorithm is consistent with biological models and hypotheses found in existing biological studies, it could capture the mechanism underlying biological delay learning.

  11. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    Science.gov (United States)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performances even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using as a test set, data in the range between 1996 August and 2010 December and as training set, data in the range between 1988 December and 1996 June. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to the one of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.

  12. Unsupervised Learning and Generalization

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Larsen, Jan

    1996-01-01

    The concept of generalization is defined for a general class of unsupervised learning machines. The generalization error is a straightforward extension of the corresponding concept for supervised learning, and may be estimated empirically using a test set or by statistical means-in close analogy ...... with supervised learning. The empirical and analytical estimates are compared for principal component analysis and for K-means clustering based density estimation......The concept of generalization is defined for a general class of unsupervised learning machines. The generalization error is a straightforward extension of the corresponding concept for supervised learning, and may be estimated empirically using a test set or by statistical means-in close analogy...

  13. Model–Free Visualization of Suspicious Lesions in Breast MRI Based on Supervised and Unsupervised Learning

    NARCIS (Netherlands)

    Twellmann, T.; Meyer-Bäse, A.; Lange, O.; Foo, S.; Nattkemper, T.W.

    2008-01-01

    Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has become an important tool in breast cancer diagnosis, but evaluation of multitemporal 3D image data holds new challenges for human observers. To aid the image analysis process, we apply supervised and unsupervised pattern recognition

  14. Learning Microbial Community Structures with Supervised and Unsupervised Non-negative Matrix Factorization.

    Science.gov (United States)

    Cai, Yun; Gu, Hong; Kenney, Toby

    2017-08-31

    Learning the structure of microbial communities is critical in understanding the different community structures and functions of microbes in distinct individuals. We view microbial communities as consisting of many subcommunities which are formed by certain groups of microbes functionally dependent on each other. The focus of this paper is on methods for extracting the subcommunities from the data, in particular Non-Negative Matrix Factorization (NMF). Our methods can be applied to both OTU data and functional metagenomic data. We apply the existing unsupervised NMF method and also develop a new supervised NMF method for extracting interpretable information from classification problems. The relevance of the subcommunities identified by NMF is demonstrated by their excellent performance for classification. Through three data examples, we demonstrate how to interpret the features identified by NMF to draw meaningful biological conclusions and discover hitherto unidentified patterns in the data. Comparing whole metagenomes of various mammals, (Muegge et al., Science 332:970-974, 2011), the biosynthesis of macrolides pathway is found in hindgut-fermenting herbivores, but not carnivores. This is consistent with results in veterinary science that macrolides should not be given to non-ruminant herbivores. For time series microbiome data from various body sites (Caporaso et al., Genome Biol 12:50, 2011), a shift in the microbial communities is identified for one individual. The shift occurs at around the same time in the tongue and gut microbiomes, indicating that the shift is a genuine biological trait, rather than an artefact of the method. For whole metagenome data from IBD patients and healthy controls (Qin et al., Nature 464:59-65, 2010), we identify differences in a number of pathways (some known, others new). NMF is a powerful tool for identifying the key features of microbial communities. These identified features can not only be used to perform difficult

  15. An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

    Science.gov (United States)

    Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2016-01-01

    Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients, lacking abrupt changes between adjacent classes; and as having a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Data set (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.

  16. Supervised versus unsupervised categorization: two sides of the same coin?

    Science.gov (United States)

    Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

    2011-09-01

    Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. ( 2011 ; Pothos et al., 2008 ) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.

  17. Unsupervised Labeling Of Data For Supervised Learning And Its Application To Medical Claims Prediction

    Directory of Open Access Journals (Sweden)

    Che Ngufor

    2013-01-01

    Full Text Available The task identifying changes and irregularities in medical insurance claim pay-ments is a difficult process of which the traditional practice involves queryinghistorical claims databases and flagging potential claims as normal or abnor-mal. Because what is considered as normal payment is usually unknown andmay change over time, abnormal payments often pass undetected; only to bediscovered when the payment period has passed.This paper presents the problem of on-line unsupervised learning from datastreams when the distribution that generates the data changes or drifts overtime. Automated algorithms for detecting drifting concepts in a probabilitydistribution of the data are presented. The idea behind the presented driftdetection methods is to transform the distribution of the data within a slidingwindow into a more convenient distribution. Then, a test statistics p-value ata given significance level can be used to infer the drift rate, adjust the windowsize and decide on the status of the drift. The detected concepts drifts areused to label the data, for subsequent learning of classification models by asupervised learner. The algorithms were tested on several synthetic and realmedical claims data sets.

  18. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state-of-the-art in unsupervised learning. The contributors discuss how with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They present how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation have resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  19. Combining Unsupervised and Supervised Statistical Learning Methods for Currency Exchange Rate Forecasting

    OpenAIRE

    Vasiljeva, Polina

    2016-01-01

    In this thesis we revisit the challenging problem of forecasting currency exchange rate. We combine machine learning methods such as agglomerative hierarchical clustering and random forest to construct a two-step approach for predicting movements in currency exchange prices of the Swedish krona and the US dollar. We use a data set with over 200 predictors comprised of different financial and macro-economic time series and their transformations. We perform forecasting for one week ahead with d...

  20. An Accurate CT Saturation Classification Using a Deep Learning Approach Based on Unsupervised Feature Extraction and Supervised Fine-Tuning Strategy

    Directory of Open Access Journals (Sweden)

    Muhammad Ali

    2017-11-01

    Full Text Available Current transformer (CT saturation is one of the significant problems for protection engineers. If CT saturation is not tackled properly, it can cause a disastrous effect on the stability of the power system, and may even create a complete blackout. To cope with CT saturation properly, an accurate detection or classification should be preceded. Recently, deep learning (DL methods have brought a subversive revolution in the field of artificial intelligence (AI. This paper presents a new DL classification method based on unsupervised feature extraction and supervised fine-tuning strategy to classify the saturated and unsaturated regions in case of CT saturation. In other words, if protection system is subjected to a CT saturation, proposed method will correctly classify the different levels of saturation with a high accuracy. Traditional AI methods are mostly based on supervised learning and rely heavily on human crafted features. This paper contributes to an unsupervised feature extraction, using autoencoders and deep neural networks (DNNs to extract features automatically without prior knowledge of optimal features. To validate the effectiveness of proposed method, a variety of simulation tests are conducted, and classification results are analyzed using standard classification metrics. Simulation results confirm that proposed method classifies the different levels of CT saturation with a remarkable accuracy and has unique feature extraction capabilities. Lastly, we provided a potential future research direction to conclude this paper.

  1. An Overview of Deep Learning Based Methods for Unsupervised and Semi-Supervised Anomaly Detection in Videos

    Directory of Open Access Journals (Sweden)

    B. Ravi Kiran

    2018-02-01

    Full Text Available Videos represent the primary source of information for surveillance applications. Video material is often available in large quantities but in most cases it contains little or no annotation for supervised learning. This article reviews the state-of-the-art deep learning based methods for video anomaly detection and categorizes them based on the type of model and criteria of detection. We also perform simple studies to understand the different approaches and provide the criteria of evaluation for spatio-temporal anomaly detection.

  2. Supervised Learning

    Science.gov (United States)

    Rokach, Lior; Maimon, Oded

    This chapter summarizes the fundamental aspects of supervised methods. The chapter provides an overview of concepts from various interrelated fields used in subsequent chapters. It presents basic definitions and arguments from the supervised machine learning literature and considers various issues, such as performance evaluation techniques and challenges for data mining tasks.

  3. Concept formation knowledge and experience in unsupervised learning

    CERN Document Server

    Fisher, Douglas H; Langley, Pat

    1991-01-01

    Concept Formation: Knowledge and Experience in Unsupervised Learning presents the interdisciplinary interaction between machine learning and cognitive psychology on unsupervised incremental methods. This book focuses on measures of similarity, strategies for robust incremental learning, and the psychological consistency of various approaches.Organized into three parts encompassing 15 chapters, this book begins with an overview of inductive concept learning in machine learning and psychology, with emphasis on issues that distinguish concept formation from more prevalent supervised methods and f

  4. Clustervision: Visual Supervision of Unsupervised Clustering.

    Science.gov (United States)

    Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

    2018-01-01

    Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exist a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large amount of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.

  5. Supervised and unsupervised condition monitoring of non-stationary acoustic emission signals

    DEFF Research Database (Denmark)

    Sigurdsson, Sigurdur; Pontoppidan, Niels Henrik; Larsen, Jan

    2005-01-01

    condition changes across load changes. In this paper we approach this load interpolation problem with supervised and unsupervised learning, i.e. model with normal and fault examples and normal examples only, respectively. We apply non-linear methods for the learning of engine condition changes. Both...

  6. Specialization processes in on-line unsupervised learning

    NARCIS (Netherlands)

    Biehl, M.; Freking, A.; Reents, G.; Schlösser, E.

    1998-01-01

    From the recent analysis of supervised learning by on-line gradient descent in multilayered neural networks it is known that the necessary process of student specialization can be delayed significantly. We demonstrate that this phenomenon also occurs in various models of unsupervised learning. A

  7. Decomposition methods for unsupervised learning

    DEFF Research Database (Denmark)

    Mørup, Morten

    2008-01-01

    This thesis presents the application and development of decomposition methods for Unsupervised Learning. It covers topics from classical factor analysis based decomposition and its variants such as Independent Component Analysis, Non-negative Matrix Factorization and Sparse Coding...... methods and clustering problems is derived both in terms of classical point clustering but also in terms of community detection in complex networks. A guiding principle throughout this thesis is the principle of parsimony. Hence, the goal of Unsupervised Learning is here posed as striving for simplicity...... in the decompositions. Thus, it is demonstrated how a wide range of decomposition methods explicitly or implicitly strive to attain this goal. Applications of the derived decompositions are given ranging from multi-media analysis of image and sound data, analysis of biomedical data such as electroencephalography...

  8. Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction.

    Science.gov (United States)

    Nie, Feiping; Xu, Dong; Tsang, Ivor Wai-Hung; Zhang, Changshui

    2010-07-01

    We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X) and the regression residue F(0) = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F(0). Our Semi-Supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), making it better cope with the data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate the significant improvement over existing dimension reduction algorithms.

  9. Unsupervised Learning of Spatiotemporal Features by Video Completion

    OpenAIRE

    Nallabolu, Adithya Reddy

    2017-01-01

    In this work, we present an unsupervised representation learning approach for learning rich spatiotemporal features from videos without the supervision from semantic labels. We propose to learn the spatiotemporal features by training a 3D convolutional neural network (CNN) using video completion as a surrogate task. Using a large collection of unlabeled videos, we train the CNN to predict the missing pixels of a spatiotemporal hole given the remaining parts of the video through minimizing per...

  10. Unsupervised Learning of Action Primitives

    DEFF Research Database (Denmark)

    Baby, Sanmohan; Krüger, Volker; Kragic, Danica

    2010-01-01

    and scale, the use of the object can provide a strong invariant for the detection of motion primitives. In this paper we propose an unsupervised learning approach for action primitives that makes use of the human movements as well as the object state changes. We group actions according to the changes......Action representation is a key issue in imitation learning for humanoids. With the recent finding of mirror neurons there has been a growing interest in expressing actions as a combination meaningful subparts called primitives. Primitives could be thought of as an alphabet for the human actions....... In this paper we observe that human actions and objects can be seen as being intertwined: we can interpret actions from the way the body parts are moving, but as well from how their effect on the involved object. While human movements can look vastly different even under minor changes in location, orientation...

  11. Improved Anomaly Detection using Integrated Supervised and Unsupervised Processing

    Science.gov (United States)

    Hunt, B.; Sheppard, D. G.; Wetterer, C. J.

    There are two broad technologies of signal processing applicable to space object feature identification using nonresolved imagery: supervised processing analyzes a large set of data for common characteristics that can be then used to identify, transform, and extract information from new data taken of the same given class (e.g. support vector machine); unsupervised processing utilizes detailed physics-based models that generate comparison data that can then be used to estimate parameters presumed to be governed by the same models (e.g. estimation filters). Both processes have been used in non-resolved space object identification and yield similar results yet arrived at using vastly different processes. The goal of integrating the results of the two is to seek to achieve an even greater performance by building on the process diversity. Specifically, both supervised processing and unsupervised processing will jointly operate on the analysis of brightness (radiometric flux intensity) measurements reflected by space objects and observed by a ground station to determine whether a particular day conforms to a nominal operating mode (as determined from a training set) or exhibits anomalous behavior where a particular parameter (e.g. attitude, solar panel articulation angle) has changed in some way. It is demonstrated in a variety of different scenarios that the integrated process achieves a greater performance than each of the separate processes alone.

  12. Human semi-supervised learning.

    Science.gov (United States)

    Gibson, Bryan R; Rogers, Timothy T; Zhu, Xiaojin

    2013-01-01

    Most empirical work in human categorization has studied learning in either fully supervised or fully unsupervised scenarios. Most real-world learning scenarios, however, are semi-supervised: Learners receive a great deal of unlabeled information from the world, coupled with occasional experiences in which items are directly labeled by a knowledgeable source. A large body of work in machine learning has investigated how learning can exploit both labeled and unlabeled data provided to a learner. Using equivalences between models found in human categorization and machine learning research, we explain how these semi-supervised techniques can be applied to human learning. A series of experiments are described which show that semi-supervised learning models prove useful for explaining human behavior when exposed to both labeled and unlabeled data. We then discuss some machine learning models that do not have familiar human categorization counterparts. Finally, we discuss some challenges yet to be addressed in the use of semi-supervised models for modeling human categorization. Copyright © 2013 Cognitive Science Society, Inc.

  13. On the asymptotic improvement of supervised learning by utilizing additional unlabeled samples - Normal mixture density case

    Science.gov (United States)

    Shahshahani, Behzad M.; Landgrebe, David A.

    1992-01-01

    The effect of additional unlabeled samples in improving the supervised learning process is studied in this paper. Three learning processes. supervised, unsupervised, and combined supervised-unsupervised, are compared by studying the asymptotic behavior of the estimates obtained under each process. Upper and lower bounds on the asymptotic covariance matrices are derived. It is shown that under a normal mixture density assumption for the probability density function of the feature space, the combined supervised-unsupervised learning is always superior to the supervised learning in achieving better estimates. Experimental results are provided to verify the theoretical concepts.

  14. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    Full Text Available A cluster analysis task has to identify the grouping trends of data, to decide on the sound clusters as well as to validate somehow the resulted structure. The identification of the grouping tendency existing in a data collection assumes the selection of a framework stated in terms of a mathematical model allowing to express the similarity degree between couples of particular objects, quasi-metrics expressing the similarity between an object an a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by a continuous repartitions

  15. Unsupervised learning of facial expression components

    OpenAIRE

    Egede, Joy Onyekachukwu

    2013-01-01

    The face is one of the most important means of non-verbal communication. A lot of information can be gotten about the emotional state of a person just by merely observing their facial expression. This is relatively easy in face to face communication but not so in human computer interaction. Supervised learning has been widely used by researchers to train machines to recognise facial expressions just like humans. However, supervised learning has significant limitations one of which is the fact...

  16. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  17. Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

    Science.gov (United States)

    Salman, S. S.; Abbas, W. A.

    2018-05-01

    The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.

  18. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    2017-07-14

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human and machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.

  19. Unsupervised learning of facial emotion decoding skills

    Directory of Open Access Journals (Sweden)

    Jan Oliver Huelle

    2014-02-01

    Full Text Available Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practise without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear and sadness was shown in each clip. Although no external information about the correctness of the participant’s response or the sender’s true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practise effects often observed in cognitive tasks.

  20. Unsupervised learning of facial emotion decoding skills.

    Science.gov (United States)

    Huelle, Jan O; Sack, Benjamin; Broer, Katja; Komlewa, Irina; Anders, Silke

    2014-01-01

    Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practice without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear, and sadness) was shown in each clip. Although no external information about the correctness of the participant's response or the sender's true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practice effects often observed in cognitive tasks.

  1. Weakly Supervised Dictionary Learning

    Science.gov (United States)

    You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub

    2018-05-01

    We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned with a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.

  2. Learning Dynamics in Doctoral Supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie

    investigates learning opportunities in supervision with multiple supervisors. This was investigated through observations and recording of supervision, and subsequent analysis of transcripts. The analyses used different perspectives on learning; learning as participation, positioning theory and variation theory....... The research illuminates how learning opportunities are created in the interaction through the scientific discussions. It also shows how multiple supervisors can contribute to supervision by providing new perspectives and opinions that have a potential for creating new understandings. The combination...... of different theoretical frameworks from the perspectives of learning as individual acquisition and a sociocultural perspective on learning contributed to a nuanced illustration of the otherwise implicit practices of supervision....

  3. Weakly supervised visual dictionary learning by harnessing image attributes.

    Science.gov (United States)

    Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang

    2014-12-01

    Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping the observed states, where the supervised information is stemmed from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.

  4. Deep supervised, but not unsupervised, models may explain IT cortical representation.

    Directory of Open Access Journals (Sweden)

    Seyed-Mahdi Khaligh-Razavi

    2014-11-01

    Full Text Available Inferior temporal (IT cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total, testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet along with several models from computer vision (e.g. SIFT, GIST, self-similarity features, and a deep convolutional neural network. We compared the representational dissimilarity matrices (RDMs of the model representations with the RDMs obtained from human IT (measured with fMRI and monkey IT (measured with cell recording for the same set of stimuli (not used in training the models. Better performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data. Overall, our results suggest that explaining

  5. Towards Statistical Unsupervised Online Learning for Music Listening with Hearing Devices

    DEFF Research Database (Denmark)

    Purwins, Hendrik; Marchini, Marco; Marxer, Richard

    of sounds into phonetic/instrument categories and learning of instrument event sequences is performed jointly using a Hierarchical Dirichlet Process Hidden Markov Model. Whereas machines often learn by processing a large data base and subsequently updating parameters of the algorithm, humans learn...... and their respective transition counts. We propose to use online learning for the co-evolution of both CI user and machine in (re-)learning musical language. [1] Marco Marchini and Hendrik Purwins. Unsupervised analysis and generation of audio percussion sequences. In International Symposium on Computer Music Modeling...... categories) as well as the temporal context horizon (e.g. storing up to 2-note sequences or up to 10-note sequences) is adaptable. The framework in [1] is based on two cognitively plausible principles: unsupervised learning and statistical learning. Opposed to supervised learning in primary school children...

  6. Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data.

    Science.gov (United States)

    Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert

    2018-04-01

    Using online consumer reviews as electronic word of mouth to assist purchase-decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews, would therefore be desirable. A subtask to be performed by such a framework would be to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method presented is an unsupervised method that applies association rule mining on co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an -score of 67%. The second method is a supervised variant that outperforms existing methods with an -score of 84%.

  7. Automated lesion detection on MRI scans using combined unsupervised and supervised methods

    International Nuclear Information System (INIS)

    Guo, Dazhou; Fridriksson, Julius; Fillmore, Paul; Rorden, Christopher; Yu, Hongkai; Zheng, Kang; Wang, Song

    2015-01-01

    Accurate and precise detection of brain lesions on MR images (MRI) is paramount for accurately relating lesion location to impaired behavior. In this paper, we present a novel method to automatically detect brain lesions from a T1-weighted 3D MRI. The proposed method combines the advantages of both unsupervised and supervised methods. First, unsupervised methods perform a unified segmentation normalization to warp images from the native space into a standard space and to generate probability maps for different tissue types, e.g., gray matter, white matter and fluid. This allows us to construct an initial lesion probability map by comparing the normalized MRI to healthy control subjects. Then, we perform non-rigid and reversible atlas-based registration to refine the probability maps of gray matter, white matter, external CSF, ventricle, and lesions. These probability maps are combined with the normalized MRI to construct three types of features, with which we use supervised methods to train three support vector machine (SVM) classifiers for a combined classifier. Finally, the combined classifier is used to accomplish lesion detection. We tested this method using T1-weighted MRIs from 60 in-house stroke patients. Using leave-one-out cross validation, the proposed method can achieve an average Dice coefficient of 73.1 % when compared to lesion maps hand-delineated by trained neurologists. Furthermore, we tested the proposed method on the T1-weighted MRIs in the MICCAI BRATS 2012 dataset. The proposed method can achieve an average Dice coefficient of 66.5 % in comparison to the expert annotated tumor maps provided in MICCAI BRATS 2012 dataset. In addition, on these two test datasets, the proposed method shows competitive performance to three state-of-the-art methods, including Stamatakis et al., Seghier et al., and Sanjuan et al. In this paper, we introduced a novel automated procedure for lesion detection from T1-weighted MRIs by combining both an unsupervised and a

  8. Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees

    Science.gov (United States)

    Verhoeven, Thibault; Schmid, Konstantin; Müller, Klaus-Robert; Tangermann, Michael; Kindermans, Pieter-Jan

    2017-01-01

    Objective Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g., by subject-to-subject transfer of a pre-trained classifier or unsupervised adaptive classification methods which learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve a reliable unsupervised calibrationless decoding with a guarantee to recover the true class means. Method We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised, and easy-to-implement classification approach for ERP-based BCIs. The LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Results Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. Significance The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on complementary principles compared to existing unsupervised methods, opening the door for their further enhancement when combined with LLP. PMID:28407016

  9. Automatic microseismic event picking via unsupervised machine learning

    Science.gov (United States)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.

  10. Estimating extinction using unsupervised machine learning

    Science.gov (United States)

    Meingast, Stefan; Lombardi, Marco; Alves, João

    2017-05-01

    Dust extinction is the most robust tracer of the gas distribution in the interstellar medium, but measuring extinction is limited by the systematic uncertainties involved in estimating the intrinsic colors to background stars. In this paper we present a new technique, Pnicer, that estimates intrinsic colors and extinction for individual stars using unsupervised machine learning algorithms. This new method aims to be free from any priors with respect to the column density and intrinsic color distribution. It is applicable to any combination of parameters and works in arbitrary numbers of dimensions. Furthermore, it is not restricted to color space. Extinction toward single sources is determined by fitting Gaussian mixture models along the extinction vector to (extinction-free) control field observations. In this way it becomes possible to describe the extinction for observed sources with probability densities, rather than a single value. Pnicer effectively eliminates known biases found in similar methods and outperforms them in cases of deep observational data where the number of background galaxies is significant, or when a large number of parameters is used to break degeneracies in the intrinsic color distributions. This new method remains computationally competitive, making it possible to correctly de-redden millions of sources within a matter of seconds. With the ever-increasing number of large-scale high-sensitivity imaging surveys, Pnicer offers a fast and reliable way to efficiently calculate extinction for arbitrary parameter combinations without prior information on source characteristics. The Pnicer software package also offers access to the well-established Nicer technique in a simple unified interface and is capable of building extinction maps including the Nicest correction for cloud substructure. Pnicer is offered to the community as an open-source software solution and is entirely written in Python.

  11. Performance of some supervised and unsupervised multivariate techniques for grouping authentic and unauthentic Viagra and Cialis

    Directory of Open Access Journals (Sweden)

    Michel J. Anzanello

    2014-09-01

    Full Text Available A typical application of multivariate techniques in forensic analysis consists of discriminating between authentic and unauthentic samples of seized drugs, in addition to finding similar properties in the unauthentic samples. In this paper, the performance of several methods belonging to two different classes of multivariate techniques–supervised and unsupervised techniques–were compared. The supervised techniques (ST are the k-Nearest Neighbor (KNN, Support Vector Machine (SVM, Probabilistic Neural Networks (PNN and Linear Discriminant Analysis (LDA; the unsupervised techniques are the k-Means CA and the Fuzzy C-Means (FCM. The methods are applied to Infrared Spectroscopy by Fourier Transform (FTIR from authentic and unauthentic Cialis and Viagra. The FTIR data are also transformed by Principal Components Analysis (PCA and kernel functions aimed at improving the grouping performance. ST proved to be a more reasonable choice when the analysis is conducted on the original data, while the UT led to better results when applied to transformed data.

  12. Genetic classification of populations using supervised learning.

    Directory of Open Access Journals (Sweden)

    Michael Bridges

    2011-05-01

    Full Text Available There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories. This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available.In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines to the classification of three populations (two from Scotland and one from Bulgaria. The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

  13. Genetic classification of populations using supervised learning.

    LENUS (Irish Health Repository)

    Bridges, Michael

    2011-01-01

    There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available.In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

  14. Unsupervised process monitoring and fault diagnosis with machine learning methods

    CERN Document Server

    Aldrich, Chris

    2013-01-01

    This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data

  15. A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

    Science.gov (United States)

    Huang, Qi; Yang, Dapeng; Jiang, Li; Zhang, Huajie; Liu, Hong; Kotani, Kiyoshi

    2017-01-01

    Performance degradation will be caused by a variety of interfering factors for pattern recognition-based myoelectric control methods in the long term. This paper proposes an adaptive learning method with low computational cost to mitigate the effect in unsupervised adaptive learning scenarios. We presents a particle adaptive classifier (PAC), by constructing a particle adaptive learning strategy and universal incremental least square support vector classifier (LS-SVC). We compared PAC performance with incremental support vector classifier (ISVC) and non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results of realistic long-term EMG data showed that the PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle). PMID:28608824

  16. A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG Pattern Recognition

    Directory of Open Access Journals (Sweden)

    Qi Huang

    2017-06-01

    Full Text Available Performance degradation will be caused by a variety of interfering factors for pattern recognition-based myoelectric control methods in the long term. This paper proposes an adaptive learning method with low computational cost to mitigate the effect in unsupervised adaptive learning scenarios. We presents a particle adaptive classifier (PAC, by constructing a particle adaptive learning strategy and universal incremental least square support vector classifier (LS-SVC. We compared PAC performance with incremental support vector classifier (ISVC and non-adapting SVC (NSVC in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results of realistic long-term EMG data showed that the PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05 and ISVC (13.38% ± 2.62%, p = 0.001, and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle.

  17. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Ardjan Zwartjes

    2016-10-01

    Full Text Available In this work, we introduce QUEST (QUantile Estimation after Supervised Training, an adaptive classification algorithm for Wireless Sensor Networks (WSNs that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  18. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms.

    Science.gov (United States)

    Zwartjes, Ardjan; Havinga, Paul J M; Smit, Gerard J M; Hurink, Johann L

    2016-10-01

    In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  19. A SURVEY OF SEMI-SUPERVISED LEARNING

    OpenAIRE

    Amrita Sadarangani *, Dr. Anjali Jivani

    2016-01-01

    Semi Supervised Learning involves using both labeled and unlabeled data to train a classifier or for clustering. Semi supervised learning finds usage in many applications, since labeled data can be hard to find in many cases. Currently, a lot of research is being conducted in this area. This paper discusses the different algorithms of semi supervised learning and then their advantages and limitations are compared. The differences between supervised classification and semi-supervised classific...

  20. Spectral Learning for Supervised Topic Models.

    Science.gov (United States)

    Ren, Yong; Wang, Yining; Zhu, Jun

    2018-03-01

    Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffers from the local minimum defect. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone gets comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.

  1. Intelligent Fault Diagnosis of Rotary Machinery Based on Unsupervised Multiscale Representation Learning

    Science.gov (United States)

    Jiang, Guo-Qian; Xie, Ping; Wang, Xiao; Chen, Meng; He, Qun

    2017-11-01

    The performance of traditional vibration based fault diagnosis methods greatly depends on those handcrafted features extracted using signal processing algorithms, which require significant amounts of domain knowledge and human labor, and do not generalize well to new diagnosis domains. Recently, unsupervised representation learning provides an alternative promising solution to feature extraction in traditional fault diagnosis due to its superior learning ability from unlabeled data. Given that vibration signals usually contain multiple temporal structures, this paper proposes a multiscale representation learning (MSRL) framework to learn useful features directly from raw vibration signals, with the aim to capture rich and complementary fault pattern information at different scales. In our proposed approach, a coarse-grained procedure is first employed to obtain multiple scale signals from an original vibration signal. Then, sparse filtering, a newly developed unsupervised learning algorithm, is applied to automatically learn useful features from each scale signal, respectively, and then the learned features at each scale to be concatenated one by one to obtain multiscale representations. Finally, the multiscale representations are fed into a supervised classifier to achieve diagnosis results. Our proposed approach is evaluated using two different case studies: motor bearing and wind turbine gearbox fault diagnosis. Experimental results show that the proposed MSRL approach can take full advantages of the availability of unlabeled data to learn discriminative features and achieved better performance with higher accuracy and stability compared to the traditional approaches.

  2. Unsupervised feature learning for autonomous rock image classification

    Science.gov (United States)

    Shu, Lei; McIsaac, Kenneth; Osinski, Gordon R.; Francis, Raymond

    2017-09-01

    Autonomous rock image classification can enhance the capability of robots for geological detection and enlarge the scientific returns, both in investigation on Earth and planetary surface exploration on Mars. Since rock textural images are usually inhomogeneous and manually hand-crafting features is not always reliable, we propose an unsupervised feature learning method to autonomously learn the feature representation for rock images. In our tests, rock image classification using the learned features shows that the learned features can outperform manually selected features. Self-taught learning is also proposed to learn the feature representation from a large database of unlabelled rock images of mixed class. The learned features can then be used repeatedly for classification of any subclass. This takes advantage of the large dataset of unlabelled rock images and learns a general feature representation for many kinds of rocks. We show experimental results supporting the feasibility of self-taught learning on rock images.

  3. Towards unsupervised ontology learning from data

    CSIR Research Space (South Africa)

    Klarman, S

    2015-07-01

    Full Text Available from facts [Shapiro, 1981], finite automata descriptions from observations [Pitt, 1989], logic programs from interpretations [De Raedt and Lavracˇ, 1993; De Raedt, 1994]. In the area of DLs, a few learning scenarios have been formally addressed..., concerned largely with learning concept descriptions via different learn- ing operators [Straccia and Mucci, 2015; Lehmann and Hit- zler, 2008; Fanizzi et al., 2008; Cohen and Hirsh, 1994] and applications of formal concept analysis techniques to auto- mated...

  4. Interactive Algorithms for Unsupervised Machine Learning

    Science.gov (United States)

    2015-06-01

    in Neural Information Processing Systems, 2013. 14 [3] Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi. On combinato- rial...Myung Jin Choi, Vincent Y F Tan , Animashree Anandkumar, and Alan S Willsky. Learn- ing Latent Tree Graphical Models. Journal of Machine Learning

  5. Coupled Semi-Supervised Learning

    Science.gov (United States)

    2010-05-01

    Additionally, specify the expected category of each relation argument to enable type-checking. Subsystem components and the KI can benefit from methods that...confirm that our coupled semi-supervised learning approaches can scale to hun- dreds of predicates and can benefit from using a diverse set of...organization yes California Institute of Technology vegetable food yes carrots vehicle item yes airplanes vertebrate animal yes videoGame product yes

  6. Slow feature analysis: unsupervised learning of invariances.

    Science.gov (United States)

    Wiskott, Laurenz; Sejnowski, Terrence J

    2002-04-01

    Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.

  7. Unsupervised Learning (Clustering) of Odontocete Echolocation Clicks

    Science.gov (United States)

    2015-09-30

    develop methods for clustering of marine mammal echolocation clicks to learn about species assemblages where little or no prior knowledge exists about... Mexico or the Atlanic. 2 APPROACH Acoustic encounters with odontocetes are detected automatically and noise-corrected cepstral features...Estmation of Marine Mammals Using Passive Acoustic Monitoring (DCLDE). KL divergence maps were created for all known species, but the sperm whale

  8. Analog memristive synapse in spiking networks implementing unsupervised learning

    Directory of Open Access Journals (Sweden)

    Erika Covi

    2016-10-01

    Full Text Available Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation and, in this respect, a challenge is opened to the researchers in the field. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e. the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity. This implies a device able to change its resistance (synaptic strength, or weight upon proper electrical stimuli (synaptic activity and showing several stable resistive states throughout its dynamic range (analog behavior. Moreover, it should be able to perform spike timing dependent plasticity (STDP, an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO2-based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even in case incomplete or noisy characters are displayed and it is robust to a device-to-device variability of up to +/-30%.

  9. Analog Memristive Synapse in Spiking Networks Implementing Unsupervised Learning.

    Science.gov (United States)

    Covi, Erika; Brivio, Stefano; Serb, Alexander; Prodromakis, Themis; Fanciulli, Marco; Spiga, Sabina

    2016-01-01

    Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation and, in this respect, a challenge is opened to the researchers in the field. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e., the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity). This implies a device able to change its resistance (synaptic strength, or weight) upon proper electrical stimuli (synaptic activity) and showing several stable resistive states throughout its dynamic range (analog behavior). Moreover, it should be able to perform spike timing dependent plasticity (STDP), an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO 2 -based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even in case incomplete or noisy images are displayed and it is robust to a device-to-device variability of up to ±30%.

  10. Supervised Learning for Dynamical System Learning.

    Science.gov (United States)

    Hefny, Ahmed; Downey, Carlton; Gordon, Geoffrey J

    2015-01-01

    Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L 1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.

  11. Supervised and Unsupervised Self-Testing for HIV in High- and Low-Risk Populations: A Systematic Review

    Science.gov (United States)

    Pant Pai, Nitika; Sharma, Jigyasa; Shivkumar, Sushmita; Pillay, Sabrina; Vadnais, Caroline; Joseph, Lawrence; Dheda, Keertan; Peeling, Rosanna W.

    2013-01-01

    Background Stigma, discrimination, lack of privacy, and long waiting times partly explain why six out of ten individuals living with HIV do not access facility-based testing. By circumventing these barriers, self-testing offers potential for more people to know their sero-status. Recent approval of an in-home HIV self test in the US has sparked self-testing initiatives, yet data on acceptability, feasibility, and linkages to care are limited. We systematically reviewed evidence on supervised (self-testing and counselling aided by a health care professional) and unsupervised (performed by self-tester with access to phone/internet counselling) self-testing strategies. Methods and Findings Seven databases (Medline [via PubMed], Biosis, PsycINFO, Cinahl, African Medicus, LILACS, and EMBASE) and conference abstracts of six major HIV/sexually transmitted infections conferences were searched from 1st January 2000–30th October 2012. 1,221 citations were identified and 21 studies included for review. Seven studies evaluated an unsupervised strategy and 14 evaluated a supervised strategy. For both strategies, data on acceptability (range: 74%–96%), preference (range: 61%–91%), and partner self-testing (range: 80%–97%) were high. A high specificity (range: 99.8%–100%) was observed for both strategies, while a lower sensitivity was reported in the unsupervised (range: 92.9%–100%; one study) versus supervised (range: 97.4%–97.9%; three studies) strategy. Regarding feasibility of linkage to counselling and care, 96% (n = 102/106) of individuals testing positive for HIV stated they would seek post-test counselling (unsupervised strategy, one study). No extreme adverse events were noted. The majority of data (n = 11,019/12,402 individuals, 89%) were from high-income settings and 71% (n = 15/21) of studies were cross-sectional in design, thus limiting our analysis. Conclusions Both supervised and unsupervised testing strategies were highly acceptable

  12. On parameterized deformations and unsupervised learning

    DEFF Research Database (Denmark)

    Hansen, Michael Sass

    matrix. Spline approximations of functions and in particular image registration warp fields are discussed. It is shown how spline bases may be learned from the optimization process, i.e. image registration optimization, and how this may contribute with a reasonable prior, or regularization in the method...... on an unrestricted linear parameter space, where all derivatives are defined, is introduced. Furthermore, it is shown that L2-norm the parameter space introduces a reasonable metric in the actual space of modelled diffeomorphisms. A new parametrization of 3D deformation fields, using potentials and Helmholtz...... of the multivariate B-splines, the warp field is automatically refined in areas where it results in the minimization of the registration cost function....

  13. A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms.

    Directory of Open Access Journals (Sweden)

    Amir H Beiki

    Full Text Available Various methods have been used to identify cultivares of olive trees; herein we used different bioinformatics algorithms to propose new tools to classify 10 cultivares of olive based on RAPD and ISSR genetic markers datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8 and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13 selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on SVM dataset was fully able to cluster each olive cultivar to the right classes. All trees (176 induced by decision tree models generated meaningful trees and UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes were also able to predict the right class of olive cultivares with 100% accuracy. For the first time, our results showed data mining techniques can be effectively used to distinguish between plant cultivares and proposed machine learning based systems in this study can predict new olive cultivars with the best possible accuracy.

  14. Supervised Convolutional Sparse Coding

    KAUST Repository

    Affara, Lama Ahmed; Ghanem, Bernard; Wonka, Peter

    2018-01-01

    coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements

  15. Unsupervised spike sorting based on discriminative subspace learning.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  16. A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

    DEFF Research Database (Denmark)

    Fraccaro, Marco; Kamronn, Simon Due; Paquet, Ulrich

    2017-01-01

    This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework...... for unsupervised learning of sequential data that disentangles two latent representations: an object’s representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate...

  17. Unsupervised behaviour-specific dictionary learning for abnormal event detection

    DEFF Research Database (Denmark)

    Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor

    2015-01-01

    the training data is only a small proportion of the surveillance data. Therefore, we propose behavior-specific dictionaries (BSD) through unsupervised learning, pursuing atoms from the same type of behavior to represent one behavior dictionary. To further improve the dictionary by introducing information from...... potential infrequent normal patterns, we refine the dictionary by searching ‘missed atoms’ that have compact coefficients. Experimental results show that our BSD algorithm outperforms state-of-the-art dictionaries in abnormal event detection on the public UCSD dataset. Moreover, BSD has less false alarms...

  18. Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

    National Research Council Canada - National Science Library

    Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

    2006-01-01

    Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model...

  19. Unsupervised Feature Learning for Heart Sounds Classification Using Autoencoder

    Science.gov (United States)

    Hu, Wei; Lv, Jiancheng; Liu, Dongbo; Chen, Yao

    2018-04-01

    Cardiovascular disease seriously threatens the health of many people. It is usually diagnosed during cardiac auscultation, which is a fast and efficient method of cardiovascular disease diagnosis. In recent years, deep learning approach using unsupervised learning has made significant breakthroughs in many fields. However, to our knowledge, deep learning has not yet been used for heart sound classification. In this paper, we first use the average Shannon energy to extract the envelope of the heart sounds, then find the highest point of S1 to extract the cardiac cycle. We convert the time-domain signals of the cardiac cycle into spectrograms and apply principal component analysis whitening to reduce the dimensionality of the spectrogram. Finally, we apply a two-layer autoencoder to extract the features of the spectrogram. The experimental results demonstrate that the features from the autoencoder are suitable for heart sound classification.

  20. An automatic taxonomy of galaxy morphology using unsupervised machine learning

    Science.gov (United States)

    Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

    2018-01-01

    We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

  1. Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma.

    Science.gov (United States)

    Young, Jonathan D; Cai, Chunhui; Lu, Xinghua

    2017-10-03

    One approach to improving the personalized treatment of cancer is to understand the cellular signaling transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene expression data. Deep learning is a group of machine learning algorithms that use multiple layers of hidden units to capture hierarchically related, alternative representations of the input data. We hypothesize that this hierarchical structure learned by deep learning will be related to the cellular signaling system. Robust deep learning model selection identified a network architecture that is biologically plausible. Our model selection results indicated that the 1st hidden layer of our deep learning model should contain about 1300 hidden units to most effectively capture the covariance structure of the input data. This agrees with the estimated number of human transcription factors, which is approximately 1400. This result lends support to our hypothesis that the 1st hidden layer of a deep learning model trained on gene expression data may represent signals related to transcription factor activation. Using the 3rd hidden layer representation of each tumor as learned by our unsupervised deep learning model, we performed consensus clustering on all tumor samples-leading to the discovery of clusters of glioblastoma multiforme with differential survival. One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input. We also found differentially expressed genes and well-known mutations (NF1, IDH1, EGFR) that were uniquely correlated with each of these clusters. Exploring these unique genes and mutations will allow us to

  2. Unsupervised versus Supervised Identification of Prognostic Factors in Patients with Localized Retroperitoneal Sarcoma: A Data Clustering and Mahalanobis Distance Approach

    Directory of Open Access Journals (Sweden)

    Rita De Sanctis

    2018-01-01

    Full Text Available The aim of this report is to unveil specific prognostic factors for retroperitoneal sarcoma (RPS patients by univariate and multivariate statistical techniques. A phase I-II study on localized RPS treated with high-dose ifosfamide and radiotherapy followed by surgery (ISG-STS 0303 protocol demonstrated that chemo/radiotherapy was safe and increased the 3-year relapse-free survival (RFS with respect to historical controls. Of 70 patients, twenty-six developed local, 10 distant, and 5 combined relapse. Median disease-free interval (DFI was 29.47 months. According to a discriminant function analysis, DFI, histology, relapse pattern, and the first treatment approach at relapse had a statistically significant prognostic impact. Based on scientific literature and clinical expertise, clinicopathological data were analyzed using both a supervised and an unsupervised classification method to predict the prognosis, with similar sample sizes (66 and 65, resp., in casewise approach and 70 in mean-substitution one. This is the first attempt to predict patients’ prognosis by means of multivariate statistics, and in this light, it looks noticable that (i some clinical data have a well-defined prognostic value, (ii the unsupervised model produced comparable results with respect to the supervised one, and (iii the appropriate combination of both models appears fruitful and easily extensible to different clinical contexts.

  3. Accuracy of latent-variable estimation in Bayesian semi-supervised learning.

    Science.gov (United States)

    Yamazaki, Keisuke

    2015-09-01

    Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Automated age-related macular degeneration classification in OCT using unsupervised feature learning

    Science.gov (United States)

    Venhuizen, Freerk G.; van Ginneken, Bram; Bloemen, Bart; van Grinsven, Mark J. J. P.; Philipsen, Rick; Hoyng, Carel; Theelen, Thomas; Sánchez, Clara I.

    2015-03-01

    Age-related Macular Degeneration (AMD) is a common eye disorder with high prevalence in elderly people. The disease mainly affects the central part of the retina, and could ultimately lead to permanent vision loss. Optical Coherence Tomography (OCT) is becoming the standard imaging modality in diagnosis of AMD and the assessment of its progression. However, the evaluation of the obtained volumetric scan is time consuming, expensive and the signs of early AMD are easy to miss. In this paper we propose a classification method to automatically distinguish AMD patients from healthy subjects with high accuracy. The method is based on an unsupervised feature learning approach, and processes the complete image without the need for an accurate pre-segmentation of the retina. The method can be divided in two steps: an unsupervised clustering stage that extracts a set of small descriptive image patches from the training data, and a supervised training stage that uses these patches to create a patch occurrence histogram for every image on which a random forest classifier is trained. Experiments using 384 volume scans show that the proposed method is capable of identifying AMD patients with high accuracy, obtaining an area under the Receiver Operating Curve of 0:984. Our method allows for a quick and reliable assessment of the presence of AMD pathology in OCT volume scans without the need for accurate layer segmentation algorithms.

  5. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data?; b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data

  6. Continuous Online Sequence Learning with an Unsupervised Neural Network Model.

    Science.gov (United States)

    Cui, Yuwei; Ahmad, Subutar; Hawkins, Jeff

    2016-09-14

    The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory recently has been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show the model is able to continuously learn a large number of variableorder temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms, including statistical methods: autoregressive integrated moving average; feedforward neural networks-time delay neural network and online sequential extreme learning machine; and recurrent neural networks-long short-term memory and echo-state networks on sequence prediction problems with both artificial and real-world data. The HTM model achieves comparable accuracy to other state-of-the-art algorithms. The model also exhibits properties that are critical for sequence learning, including continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems from continuous data streams.

  7. Predicting protein complexes using a supervised learning method combined with local structural information.

    Science.gov (United States)

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  8. Application of unsupervised learning methods in high energy physics

    Energy Technology Data Exchange (ETDEWEB)

    Koevesarki, Peter; Nuncio Quiroz, Adriana Elizabeth; Brock, Ian C. [Physikalisches Institut, Universitaet Bonn, Bonn (Germany)

    2011-07-01

    High energy physics is a home for a variety of multivariate techniques, mainly due to the fundamentally probabilistic behaviour of nature. These methods generally require training based on some theory, in order to discriminate a known signal from a background. Nevertheless, new physics can show itself in ways that previously no one thought about, and in these cases conventional methods give little or no help. A possible way to discriminate between known processes (like vector bosons or top-quark production) or look for new physics is using unsupervised machine learning to extract the features of the data. A technique was developed, based on the combination of neural networks and the method of principal curves, to find a parametrisation of the non-linear correlations of the data. The feasibility of the method is shown on ATLAS data.

  9. Fully Decentralized Semi-supervised Learning via Privacy-preserving Matrix Completion.

    Science.gov (United States)

    Fierimonte, Roberto; Scardapane, Simone; Uncini, Aurelio; Panella, Massimo

    2016-08-26

    Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal.

  10. Conditional High-Order Boltzmann Machines for Supervised Relation Learning.

    Science.gov (United States)

    Huang, Yan; Wang, Wei; Wang, Liang; Tan, Tieniu

    2017-09-01

    Relation learning is a fundamental problem in many vision tasks. Recently, high-order Boltzmann machine and its variants have shown their great potentials in learning various types of data relation in a range of tasks. But most of these models are learned in an unsupervised way, i.e., without using relation class labels, which are not very discriminative for some challenging tasks, e.g., face verification. In this paper, with the goal to perform supervised relation learning, we introduce relation class labels into conventional high-order multiplicative interactions with pairwise input samples, and propose a conditional high-order Boltzmann Machine (CHBM), which can learn to classify the data relation in a binary classification way. To be able to deal with more complex data relation, we develop two improved variants of CHBM: 1) latent CHBM, which jointly performs relation feature learning and classification, by using a set of latent variables to block the pathway from pairwise input samples to output relation labels and 2) gated CHBM, which untangles factors of variation in data relation, by exploiting a set of latent variables to multiplicatively gate the classification of CHBM. To reduce the large number of model parameters generated by the multiplicative interactions, we approximately factorize high-order parameter tensors into multiple matrices. Then, we develop efficient supervised learning algorithms, by first pretraining the models using joint likelihood to provide good parameter initialization, and then finetuning them using conditional likelihood to enhance the discriminant ability. We apply the proposed models to a series of tasks including invariant recognition, face verification, and action similarity labeling. Experimental results demonstrate that by exploiting supervised relation labels, our models can greatly improve the performance.

  11. How Supervisor Experience Influences Trust, Supervision, and Trainee Learning: A Qualitative Study.

    Science.gov (United States)

    Sheu, Leslie; Kogan, Jennifer R; Hauer, Karen E

    2017-09-01

    Appropriate trust and supervision facilitate trainees' growth toward unsupervised practice. The authors investigated how supervisor experience influences trust, supervision, and subsequently trainee learning. In a two-phase qualitative inductive content analysis, phase one entailed reviewing 44 internal medicine resident and attending supervisor interviews from two institutions (July 2013 to September 2014) for themes on how supervisor experience influences trust and supervision. Three supervisor exemplars (early, developing, experienced) were developed and shared in phase two focus groups at a single institution, wherein 23 trainees validated the exemplars and discussed how each impacted learning (November 2015). Phase one: Four domains of trust and supervision varying with experience emerged: data, approach, perspective, clinical. Early supervisors were detail oriented and determined trust depending on task completion (data), were rule based (approach), drew on their experiences as trainees to guide supervision (perspective), and felt less confident clinically compared with more experienced supervisors (clinical). Experienced supervisors determined trust holistically (data), checked key aspects of patient care selectively and covertly (approach), reflected on individual experiences supervising (perspective), and felt comfortable managing clinical problems and gauging trainee abilities (clinical). Phase two: Trainees felt the exemplars reflected their experiences, described their preferences and learning needs shifting over time, and emphasized the importance of supervisor flexibility to match their learning needs. With experience, supervisors differ in their approach to trust and supervision. Supervisors need to trust themselves before being able to trust others. Trainees perceive these differences and seek supervision approaches that align with their learning needs.

  12. Gymnasium-based unsupervised exercise maintains benefits in oxygen uptake kinetics obtained following supervised training in type 2 diabetes.

    Science.gov (United States)

    Macananey, Oscar; O'Shea, Donal; Warmington, Stuart A; Green, Simon; Egaña, Mikel

    2012-08-01

    Supervised exercise (SE) in patients with type 2 diabetes improves oxygen uptake kinetics at the onset of exercise. Maintenance of these improvements, however, has not been examined when supervision is removed. We explored if potential improvements in oxygen uptake kinetics following a 12-week SE that combined aerobic and resistance training were maintained after a subsequent 12-week unsupervised exercise (UE). The involvement of cardiac output (CO) in these improvements was also tested. Nineteen volunteers with type 2 diabetes were recruited. Oxygen uptake kinetics and CO (inert gas rebreathing) responses to constant-load cycling at 50% ventilatory threshold (V(T)), 80% V(T), and mid-point between V(T) and peak workload (50% Δ) were examined at baseline (on 2 occasions) and following each 12-week training period. Participants decided to exercise at a local gymnasium during the UE. Thirteen subjects completed all the interventions. The time constant of phase 2 of oxygen uptake was significantly faster (p exercise maintained benefits in oxygen uptake kinetics obtained during a supervised exercise in subjects with diabetes, and these benefits were associated with a faster dynamic response of heart rate after training.

  13. Unsupervised learning of binary vectors: A Gaussian scenario

    International Nuclear Information System (INIS)

    Copelli, Mauro; Van den Broeck, Christian

    2000-01-01

    We study a model of unsupervised learning where the real-valued data vectors are isotropically distributed, except for a single symmetry-breaking binary direction B(set-membership sign){-1,+1} N , onto which the projections have a Gaussian distribution. We show that a candidate vector J undergoing Gibbs learning in this discrete space, approaches the perfect match J=B exponentially. In addition to the second-order ''retarded learning'' phase transition for unbiased distributions, we show that first-order transitions can also occur. Extending the known result that the center of mass of the Gibbs ensemble has Bayes-optimal performance, we show that taking the sign of the components of this vector (clipping) leads to the vector with optimal performance in the binary space. These upper bounds are shown generally not to be saturated with the technique of transforming the components of a special continuous vector, except in asymptotic limits and in a special linear case. Simulations are presented which are in excellent agreement with the theoretical results. (c) 2000 The American Physical Society

  14. Unsupervised learning of a steerable basis for invariant image representations

    Science.gov (United States)

    Bethge, Matthias; Gerwinn, Sebastian; Macke, Jakob H.

    2007-02-01

    There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occuring in sequences of natural images. We utilize ideas of 'steerability' and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the average bivector. For data which exhibits a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. 'complex cells') from sequences of natural images.

  15. Post-Graduate Student Performance in "Supervised In-Class" vs. "Unsupervised Online" Multiple Choice Tests: Implications for Cheating and Test Security

    Science.gov (United States)

    Ladyshewsky, Richard K.

    2015-01-01

    This research explores differences in multiple choice test (MCT) scores in a cohort of post-graduate students enrolled in a management and leadership course. A total of 250 students completed the MCT in either a supervised in-class paper and pencil test or an unsupervised online test. The only statistically significant difference between the nine…

  16. Detection of Erroneous Payments Utilizing Supervised And Unsupervised Data Mining Techniques

    National Research Council Canada - National Science Library

    Yanik, Todd

    2004-01-01

    ... (C&RT)) modeling algorithms. S-Plus software was used to construct a supervised model of vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow Test, for testing the predictive ability of the model...

  17. Effects of Supervised vs. Unsupervised Training Programs on Balance and Muscle Strength in Older Adults: A Systematic Review and Meta-Analysis.

    Science.gov (United States)

    Lacroix, André; Hortobágyi, Tibor; Beurskens, Rainer; Granacher, Urs

    2017-11-01

    Balance and resistance training can improve healthy older adults' balance and muscle strength. Delivering such exercise programs at home without supervision may facilitate participation for older adults because they do not have to leave their homes. To date, no systematic literature analysis has been conducted to determine if supervision affects the effectiveness of these programs to improve healthy older adults' balance and muscle strength/power. The objective of this systematic review and meta-analysis was to quantify the effectiveness of supervised vs. unsupervised balance and/or resistance training programs on measures of balance and muscle strength/power in healthy older adults. In addition, the impact of supervision on training-induced adaptive processes was evaluated in the form of dose-response relationships by analyzing randomized controlled trials that compared supervised with unsupervised trials. A computerized systematic literature search was performed in the electronic databases PubMed, Web of Science, and SportDiscus to detect articles examining the role of supervision in balance and/or resistance training in older adults. The initially identified 6041 articles were systematically screened. Studies were included if they examined balance and/or resistance training in adults aged ≥65 years with no relevant diseases and registered at least one behavioral balance (e.g., time during single leg stance) and/or muscle strength/power outcome (e.g., time for 5-Times-Chair-Rise-Test). Finally, 11 studies were eligible for inclusion in this meta-analysis. Weighted mean standardized mean differences between subjects (SMD bs ) of supervised vs. unsupervised balance/resistance training studies were calculated. The included studies were coded for the following variables: number of participants, sex, age, number and type of interventions, type of balance/strength tests, and change (%) from pre- to post-intervention values. Additionally, we coded training according

  18. Pediatric Program Director Minimum Milestone Expectations before Allowing Supervision of Others and Unsupervised Practice.

    Science.gov (United States)

    Li, Su-Ting T; Tancredi, Daniel J; Schwartz, Alan; Guillot, Ann; Burke, Ann E; Trimm, R Franklin; Guralnick, Susan; Mahan, John D; Gifford, Kimberly

    2018-04-25

    The Accreditation Council for Graduate Medical Education requires semiannual Milestone reporting on all residents. Milestone expectations of performance are unknown. Determine pediatric program director (PD) minimum Milestone expectations for residents prior to being ready to supervise and prior to being ready to graduate. Mixed methods survey of pediatric PDs on their programs' Milestone expectations before residents are ready to supervise and before they are ready to graduate, and in what ways PDs use Milestones to make supervision and graduation decisions. If programs had no established Milestone expectations, PDs indicated expectations they considered for use in their program. Mean minimum Milestone level expectations adjusted for program size, region, and clustering of Milestone expectations by program were calculated for prior to supervise and prior to graduate. Free-text questions were analyzed using thematic analysis. The response rate was 56.8% (113/199). Most programs had no required minimum Milestone level before residents are ready to supervise (80%; 76/95) or ready to graduate (84%; 80/95). For readiness to supervise, minimum Milestone expectations PDs considered establishing for their program were highest for humanism (2.46, 95% CI: 2.21-2.71) and professionalization (2.37, 2.15-2.60). Minimum Milestone expectations for graduates were highest for help-seeking (3.14, 2.83-3.46). Main themes included the use of Milestones in combination with other information to assess learner performance and Milestones are not equally weighted when making advancement decisions. Most PDs have not established program minimum Milestones, but would vary such expectations by competency. Copyright © 2018. Published by Elsevier Inc.

  19. Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

    OpenAIRE

    Zhang, Chenrui; Peng, Yuxin

    2018-01-01

    Video representation learning is a vital problem for classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning via solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, whereas ignoring complementarity among different task-specific feat...

  20. Semi-supervised Learning for Phenotyping Tasks.

    Science.gov (United States)

    Dligach, Dmitriy; Miller, Timothy; Savova, Guergana K

    2015-01-01

    Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of the required manual chart review.

  1. Action learning in undergraduate engineering thesis supervision

    Directory of Open Access Journals (Sweden)

    Brad Stappenbelt

    2017-03-01

    Full Text Available In the present action learning implementation, twelve action learning sets were conducted over eight years. The action learning sets consisted of students involved in undergraduate engineering research thesis work. The concurrent study accompanying this initiative, investigated the influence of the action learning environment on student approaches to learning and any accompanying academic, learning and personal benefits realised. The influence of preferred learning styles on set function and student adoption of the action learning process were also examined. The action learning environment implemented had a measurable significant positive effect on student academic performance, their ability to cope with the stresses associated with conducting a research thesis, the depth of learning, the development of autonomous learners and student perception of the research thesis experience. The present study acts as an addendum to a smaller scale implementation of this action learning approach, applied to supervision of third and fourth year research projects and theses, published in 2010.

  2. Modelling unsupervised online-learning of artificial grammars: linking implicit and statistical learning.

    Science.gov (United States)

    Rohrmeier, Martin A; Cross, Ian

    2014-07-01

    Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Information-theoretic semi-supervised metric learning via entropy regularization.

    Science.gov (United States)

    Niu, Gang; Dai, Bo; Yamada, Makoto; Sugiyama, Masashi

    2014-08-01

    We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data following entropy regularization. For metric learning, entropy regularization improves manifold regularization by considering the dissimilarity information of unlabeled data in the unsupervised part, and hence it allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH by trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The nonconvex optimization problem of SERAPH could be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even under noisy environments.

  4. Observation versus classification in supervised category learning.

    Science.gov (United States)

    Levering, Kimery R; Kurtz, Kenneth J

    2015-02-01

    The traditional supervised classification paradigm encourages learners to acquire only the knowledge needed to predict category membership (a discriminative approach). An alternative that aligns with important aspects of real-world concept formation is learning with a broader focus to acquire knowledge of the internal structure of each category (a generative approach). Our work addresses the impact of a particular component of the traditional classification task: the guess-and-correct cycle. We compare classification learning to a supervised observational learning task in which learners are shown labeled examples but make no classification response. The goals of this work sit at two levels: (1) testing for differences in the nature of the category representations that arise from two basic learning modes; and (2) evaluating the generative/discriminative continuum as a theoretical tool for understand learning modes and their outcomes. Specifically, we view the guess-and-correct cycle as consistent with a more discriminative approach and therefore expected it to lead to narrower category knowledge. Across two experiments, the observational mode led to greater sensitivity to distributional properties of features and correlations between features. We conclude that a relatively subtle procedural difference in supervised category learning substantially impacts what learners come to know about the categories. The results demonstrate the value of the generative/discriminative continuum as a tool for advancing the psychology of category learning and also provide a valuable constraint for formal models and associated theories.

  5. Modeling Visit Behaviour in Smart Homes using Unsupervised Learning

    NARCIS (Netherlands)

    Nait Aicha, A.; Englebienne, G.; Kröse, B.

    2014-01-01

    Many algorithms on health monitoring from ambient sensor networks assume that only a single person is present in the home. We present an unsupervised method that models visit behaviour. A Markov modulated multidimensional non-homogeneous Poisson process (M3P2) is described that allows us to model

  6. Balancing Design Project Supervision and Learning Facilitation

    DEFF Research Database (Denmark)

    Nielsen, Louise Møller

    2012-01-01

    experiences and expertise to guide the students’ decisions in relation to the design project. This paper focuses on project supervision in the context of design education – and more specifically on how this supervision is unfolded in a Problem Based Learning culture. The paper explores the supervisor......’s balance between the roles: 1) Design Project Supervisor – and 2) Learning Facilitator – with the aim to understand when to apply the different roles, and what to be aware of when doing so. This paper represents the first pilot-study of a larger research effort. It is based on a Lego Serious Play workshop......In design there is a long tradition for apprenticeship, as well as tradition for learning through design projects. Today many design educations are positioned within the University context, and have to be aligned with the learning culture and structure, which they represent. This raises a specific...

  7. Exploiting unsupervised and supervised classification for segmentation of the pathological lung in CT

    International Nuclear Information System (INIS)

    Korfiatis, P; Costaridou, L; Kalogeropoulou, C; Petsas, T; Daoussis, D; Adonopoulos, A

    2009-01-01

    Delineation of lung fields in presence of diffuse lung diseases (DLPDs), such as interstitial pneumonias (IP), challenges segmentation algorithms. To deal with IP patterns affecting the lung border an automated image texture classification scheme is proposed. The proposed segmentation scheme is based on supervised texture classification between lung tissue (normal and abnormal) and surrounding tissue (pleura and thoracic wall) in the lung border region. This region is coarsely defined around an initial estimate of lung border, provided by means of Markov Radom Field modeling and morphological operations. Subsequently, a support vector machine classifier was trained to distinguish between the above two classes of tissue, using textural feature of gray scale and wavelet domains. 17 patients diagnosed with IP, secondary to connective tissue diseases were examined. Segmentation performance in terms of overlap was 0.924±0.021, and for shape differentiation mean, rms and maximum distance were 1.663±0.816, 2.334±1.574 and 8.0515±6.549 mm, respectively. An accurate, automated scheme is proposed for segmenting abnormal lung fields in HRC affected by IP

  8. Exploiting unsupervised and supervised classification for segmentation of the pathological lung in CT

    Science.gov (United States)

    Korfiatis, P.; Kalogeropoulou, C.; Daoussis, D.; Petsas, T.; Adonopoulos, A.; Costaridou, L.

    2009-07-01

    Delineation of lung fields in presence of diffuse lung diseases (DLPDs), such as interstitial pneumonias (IP), challenges segmentation algorithms. To deal with IP patterns affecting the lung border an automated image texture classification scheme is proposed. The proposed segmentation scheme is based on supervised texture classification between lung tissue (normal and abnormal) and surrounding tissue (pleura and thoracic wall) in the lung border region. This region is coarsely defined around an initial estimate of lung border, provided by means of Markov Radom Field modeling and morphological operations. Subsequently, a support vector machine classifier was trained to distinguish between the above two classes of tissue, using textural feature of gray scale and wavelet domains. 17 patients diagnosed with IP, secondary to connective tissue diseases were examined. Segmentation performance in terms of overlap was 0.924±0.021, and for shape differentiation mean, rms and maximum distance were 1.663±0.816, 2.334±1.574 and 8.0515±6.549 mm, respectively. An accurate, automated scheme is proposed for segmenting abnormal lung fields in HRC affected by IP

  9. Unsupervised Learning of Digit Recognition Using Spike-Timing-Dependent Plasticity

    Directory of Open Access Journals (Sweden)

    Peter U. Diehl

    2015-08-01

    Full Text Available In order to understand how the mammalian neocortex is performing computations, two things are necessary; we need to have a good understanding of the available neuronal processing units and mechanisms, and we need to gain a better understanding of how those mechanisms are combined to build functioning systems. Therefore, in recent years there is an increasing interest in how spiking neural networks (SNN can be used to perform complex computations or solve pattern recognition tasks. However, it remains a challenging task to design SNNs which use biologically plausible mechanisms (especially for learning new patterns, since most of such SNN architectures rely on training in a rate-based network and subsequent conversion to a SNN. We present a SNN for digit recognition which is based on mechanisms with increased biological plausibility, i.e. conductance-based instead of current-based synapses, spike-timing-dependent plasticity with time-dependent weight change, lateral inhibition, and an adaptive spiking threshold. Unlike most other systems, we do not use a teaching signal and do not present any class labels to the network. Using this unsupervised learning scheme, our architecture achieves 95% accuracy on the MNIST benchmark, which is better than previous SNN implementations without supervision. The fact that we used no domain-specific knowledge points toward the general applicability of our network design. Also, the performance of our network scales well with the number of neurons used and shows similar performance for four different learning rules, indicating robustness of the full combination of mechanisms, which suggests applicability in heterogeneous biological neural networks.

  10. Effects of a Supervised versus an Unsupervised Combined Balance and Strength Training Program on Balance and Muscle Power in Healthy Older Adults: A Randomized Controlled Trial.

    Science.gov (United States)

    Lacroix, André; Kressig, Reto W; Muehlbauer, Thomas; Gschwind, Yves J; Pfenninger, Barbara; Bruegger, Othmar; Granacher, Urs

    2016-01-01

    Losses in lower extremity muscle strength/power, muscle mass and deficits in static and particularly dynamic balance due to aging are associated with impaired functional performance and an increased fall risk. It has been shown that the combination of balance and strength training (BST) mitigates these age-related deficits. However, it is unresolved whether supervised versus unsupervised BST is equally effective in improving muscle power and balance in older adults. This study examined the impact of a 12-week BST program followed by 12 weeks of detraining on measures of balance and muscle power in healthy older adults enrolled in supervised (SUP) or unsupervised (UNSUP) training. Sixty-six older adults (men: 25, women: 41; age 73 ± 4 years) were randomly assigned to a SUP group (2/week supervised training, 1/week unsupervised training; n = 22), an UNSUP group (3/week unsupervised training; n = 22) or a passive control group (CON; n = 22). Static (i.e., Romberg Test) and dynamic (i.e., 10-meter walk test) steady-state, proactive (i.e., Timed Up and Go Test, Functional Reach Test), and reactive balance (e.g., Push and Release Test), as well as lower extremity muscle power (i.e., Chair Stand Test; Stair Ascent and Descent Test) were tested before and after the active training phase as well as after detraining. Adherence rates to training were 92% for SUP and 97% for UNSUP. BST resulted in significant group × time interactions. Post hoc analyses showed, among others, significant training-related improvements for the Romberg Test, stride velocity, Timed Up and Go Test, and Chair Stand Test in favor of the SUP group. Following detraining, significantly enhanced performances (compared to baseline) were still present in 13 variables for the SUP group and in 10 variables for the UNSUP group. Twelve weeks of BST proved to be safe (no training-related injuries) and feasible (high attendance rates of >90%). Deficits of balance and lower extremity muscle power can be

  11. Supervised Learning for Visual Pattern Classification

    Science.gov (United States)

    Zheng, Nanning; Xue, Jianru

    This chapter presents an overview of the topics and major ideas of supervised learning for visual pattern classification. Two prevalent algorithms, i.e., the support vector machine (SVM) and the boosting algorithm, are briefly introduced. SVMs and boosting algorithms are two hot topics of recent research in supervised learning. SVMs improve the generalization of the learning machine by implementing the rule of structural risk minimization (SRM). It exhibits good generalization even when little training data are available for machine training. The boosting algorithm can boost a weak classifier to a strong classifier by means of the so-called classifier combination. This algorithm provides a general way for producing a classifier with high generalization capability from a great number of weak classifiers.

  12. Information-Based Approach to Unsupervised Machine Learning

    Science.gov (United States)

    2013-06-19

    samples with large fitting error. The above optimization problem can be reduced to a quadratic program (Mangasarian & Musicant , 2000), which can be...recognition. Technicheskaya Kibernetica, 3. in Russian. Mallows, C. L. (1973). Some comments on CP . Technometrics, 15, 661–675. Mangasarian, O. L., & Musicant ...to find correspondence between two sets of objects in different domains in an unsupervised way. Photo album summa- rization is a typical application

  13. Opportunities to Learn Scientific Thinking in Joint Doctoral Supervision

    Science.gov (United States)

    Kobayashi, Sofie; Grout, Brian W.; Rump, Camilla Østerberg

    2015-01-01

    Research into doctoral supervision has increased rapidly over the last decades, yet our understanding of how doctoral students learn scientific thinking from supervision is limited. Most studies are based on interviews with little work being reported that is based on observation of actual supervision. While joint supervision has become widely…

  14. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    Science.gov (United States)

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  15. Semi-supervised learning of hyperspectral image segmentation applied to vine tomatoes and table grapes

    Directory of Open Access Journals (Sweden)

    Jeroen van Roy

    2018-03-01

    Full Text Available Nowadays, quality inspection of fruit and vegetables is typically accomplished through visual inspection. Automation of this inspection is desirable to make it more objective. For this, hyperspectral imaging has been identified as a promising technique. When the field of view includes multiple objects, hypercubes should be segmented to assign individual pixels to different objects. Unsupervised and supervised methods have been proposed. While the latter are labour intensive as they require masking of the training images, the former are too computationally intensive for in-line use and may provide different results for different hypercubes. Therefore, a semi-supervised method is proposed to train a computationally efficient segmentation algorithm with minimal human interaction. As a first step, an unsupervised classification model is used to cluster spectra in similar groups. In the second step, a pixel selection algorithm applied to the output of the unsupervised classification is used to build a supervised model which is fast enough for in-line use. To evaluate this approach, it is applied to hypercubes of vine tomatoes and table grapes. After first derivative spectral preprocessing to remove intensity variation due to curvature and gloss effects, the unsupervised models segmented 86.11% of the vine tomato images correctly. Considering overall accuracy, sensitivity, specificity and time needed to segment one hypercube, partial least squares discriminant analysis (PLS-DA was found to be the best choice for in-line use, when using one training image. By adding a second image, the segmentation results improved considerably, yielding an overall accuracy of 96.95% for segmentation of vine tomatoes and 98.52% for segmentation of table grapes, demonstrating the added value of the learning phase in the algorithm.

  16. Unsupervised learning via self-organization a dynamic approach

    CERN Document Server

    Kyan, Matthew; Jarrah, Kambiz; Guan, Ling

    2014-01-01

    To aid in intelligent data mining, this book introduces a new family of unsupervised algorithms that have a basis in self-organization, yet are free from many of the constraints typical of other well known self-organizing architectures. It then moves through a series of pertinent real world applications with regards to the processing of multimedia data from its role in generic image processing techniques such as the automated modeling and removal of impulse noise in digital images, to problems in digital asset management, and its various roles in feature extraction, visual enhancement, segmentation, and analysis of microbiological image data.

  17. SemiBoost: boosting for semi-supervised learning.

    Science.gov (United States)

    Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil K; Liu, Yi

    2009-11-01

    Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this as the Semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a metasemi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed as SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both manifold and cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.

  18. Effect of early supervised progressive resistance training compared to unsupervised home-based exercise after fast-track total hip replacement applied to patients with preoperative functional limitations

    DEFF Research Database (Denmark)

    Mikkelsen, L R; Mechlenburg, I; Søballe, K

    2014-01-01

    OBJECTIVE: To examine if 2 weekly sessions of supervised progressive resistance training (PRT) in combination with 5 weekly sessions of unsupervised home-based exercise is more effective than 7 weekly sessions of unsupervised home-based exercise in improving leg-extension power of the operated leg...... 10 weeks after total hip replacement (THR) in patients with lower pre-operative function. METHOD: A total of 73 patients scheduled for THR were randomised (1:1) to intervention group (IG, home based exercise 5 days/week and PRT 2 days/week) or control group (CG, home based exercise 7 days...... of the operated leg, at the primary endpoint 10 weeks after surgery in THR patients with lower pre-operative function. TRIAL REGISTRATION: NCT01214954....

  19. Graph-based semi-supervised learning

    CERN Document Server

    Subramanya, Amarnag

    2014-01-01

    While labeled data is expensive to prepare, ever increasing amounts of unlabeled data is becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer visi

  20. QUEST : Eliminating online supervised learning for efficient classification algorithms

    NARCIS (Netherlands)

    Zwartjes, Ardjan; Havinga, Paul J.M.; Smit, Gerard J.M.; Hurink, Johann L.

    2016-01-01

    In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting

  1. Bayesian feature weighting for unsupervised learning, with application to object recognition

    OpenAIRE

    Carbonetto , Peter; De Freitas , Nando; Gustafson , Paul; Thompson , Natalie

    2003-01-01

    International audience; We present a method for variable selection/weighting in an unsupervised learning context using Bayesian shrinkage. The basis for the model parameters and cluster assignments can be computed simultaneous using an efficient EM algorithm. Applying our Bayesian shrinkage model to a complex problem in object recognition (Duygulu, Barnard, de Freitas and Forsyth 2002), our experiments yied good results.

  2. An Introduction to Topic Modeling as an Unsupervised Machine Learning Way to Organize Text Information

    Science.gov (United States)

    Snyder, Robin M.

    2015-01-01

    The field of topic modeling has become increasingly important over the past few years. Topic modeling is an unsupervised machine learning way to organize text (or image or DNA, etc.) information such that related pieces of text can be identified. This paper/session will present/discuss the current state of topic modeling, why it is important, and…

  3. Network Supervision of Adult Experience and Learning Dependent Sensory Cortical Plasticity.

    Science.gov (United States)

    Blake, David T

    2017-06-18

    The brain is capable of remodeling throughout life. The sensory cortices provide a useful preparation for studying neuroplasticity both during development and thereafter. In adulthood, sensory cortices change in the cortical area activated by behaviorally relevant stimuli, by the strength of response within that activated area, and by the temporal profiles of those responses. Evidence supports forms of unsupervised, reinforcement, and fully supervised network learning rules. Studies on experience-dependent plasticity have mostly not controlled for learning, and they find support for unsupervised learning mechanisms. Changes occur with greatest ease in neurons containing α-CamKII, which are pyramidal neurons in layers II/III and layers V/VI. These changes use synaptic mechanisms including long term depression. Synaptic strengthening at NMDA-containing synapses does occur, but its weak association with activity suggests other factors also initiate changes. Studies that control learning find support of reinforcement learning rules and limited evidence of other forms of supervised learning. Behaviorally associating a stimulus with reinforcement leads to a strengthening of cortical response strength and enlarging of response area with poor selectivity. Associating a stimulus with omission of reinforcement leads to a selective weakening of responses. In some preparations in which these associations are not as clearly made, neurons with the most informative discharges are relatively stronger after training. Studies analyzing the temporal profile of responses associated with omission of reward, or of plasticity in studies with different discriminanda but statistically matched stimuli, support the existence of limited supervised network learning. © 2017 American Physiological Society. Compr Physiol 7:977-1008, 2017. Copyright © 2017 John Wiley & Sons, Inc.

  4. The efficacy of unsupervised home-based exercise regimens in comparison to supervised lab-based exercise training upon cardio-respiratory health facets

    OpenAIRE

    Blackwell, J.; Atherton, Philip J.; Smith, Kenneth; Doleman, Brett; Williams, John P.; Lund, Jonathan N.; Phillips, Bethan E.

    2017-01-01

    Abstract Supervised high?intensity interval training (HIIT) can rapidly improve cardiorespiratory fitness (CRF). However, the effectiveness of time?efficient unsupervised home?based interventions is unknown. Eighteen volunteers completed either: laboratory?HIIT (L?HIIT); home?HIIT (H?HIIT) or home?isometric hand?grip training (H?IHGT). CRF improved significantly in L?HIIT and H?HIIT groups, with blood pressure improvements in the H?IHGT group only. H?HIIT offers a practical, time?efficient ex...

  5. Supervised / unsupervised change detection

    OpenAIRE

    de Alwis Pitts, Dilkushi; De Vecchi, Daniele; Harb, Mostapha; So, Emily; Dell'Acqua, Fabio

    2014-01-01

    The aim of this deliverable is to provide an overview of the state of the art in change detection techniques and a critique of what could be programmed to derive SENSUM products. It is the product of the collaboration between UCAM and EUCENTRE. The document includes as a necessary requirement a discussion about a proposed technique for co-registration. Since change detection techniques require an assessment of a series of images and the basic process involves comparing and contrasting the sim...

  6. Supervised Convolutional Sparse Coding

    KAUST Repository

    Affara, Lama Ahmed

    2018-04-08

    Convolutional Sparse Coding (CSC) is a well-established image representation model especially suited for image restoration tasks. In this work, we extend the applicability of this model by proposing a supervised approach to convolutional sparse coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements to be discriminative. Experimental results show that using supervised convolutional learning results in two key advantages. First, we learn more semantically relevant filters in the dictionary and second, we achieve improved image reconstruction on unseen data.

  7. Recent progresses of neural network unsupervised learning: I. Independent component analyses generalizing PCA

    Science.gov (United States)

    Szu, Harold H.

    1999-03-01

    The early vision principle of redundancy reduction of 108 sensor excitations is understandable from computer vision viewpoint toward sparse edge maps. It is only recently derived using a truly unsupervised learning paradigm of artificial neural networks (ANN). In fact, the biological vision, Hubel- Wiesel edge maps, is reproduced seeking the underlying independent components analyses (ICA) among 102 image samples by maximizing the ANN output entropy (partial)H(V)/(partial)[W] equals (partial)[W]/(partial)t. When a pair of newborn eyes or ears meet the bustling and hustling world without supervision, they seek ICA by comparing 2 sensory measurements (x1(t), x2(t))T equalsV X(t). Assuming a linear and instantaneous mixture model of the external world X(t) equals [A] S(t), where both the mixing matrix ([A] equalsV [a1, a2] of ICA vectors and the source percentages (s1(t), s2(t))T equalsV S(t) are unknown, we seek the independent sources approximately equals [I] where the approximated sign indicates that higher order statistics (HOS) may not be trivial. Without a teacher, the ANN weight matrix [W] equalsV [w1, w2] adjusts the outputs V(t) equals tanh([W]X(t)) approximately equals [W]X(t) until no desired outputs except the (Gaussian) 'garbage' (neither YES '1' nor NO '-1' but at linear may-be range 'origin 0') defined by Gaussian covariance G equals [I] equals [W][A] the internal knowledge representation [W], as the inverse of the external world matrix [A]-1. To unify IC, PCA, ANN & HOS theories since 1991 (advanced by Jutten & Herault, Comon, Oja, Bell-Sejnowski, Amari-Cichocki, Cardoso), the LYAPONOV function L(v1,...,vn, w1,...wn,) equals E(v1,...,vn) - H(w1,...wn) is constructed as the HELMHOTZ free energy to prove both convergences of supervised energy E and unsupervised entropy H learning. Consequently, rather using the faithful but dumb computer: 'GARBAGE-IN, GARBAGE-OUT,' the smarter neurocomputer will be equipped with an unsupervised learning that extracts

  8. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    Science.gov (United States)

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  9. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    Directory of Open Access Journals (Sweden)

    Jiayi Wu

    Full Text Available Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM. We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  10. Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

    OpenAIRE

    Huang, Furong; Anandkumar, Animashree

    2016-01-01

    Unsupervised text embeddings extraction is crucial for text understanding in machine learning. Word2Vec and its variants have received substantial success in mapping words with similar syntactic or semantic meaning to vectors close to each other. However, extracting context-aware word-sequence embedding remains a challenging task. Training over large corpus is difficult as labels are difficult to get. More importantly, it is challenging for pre-trained models to obtain word-...

  11. Modeling Language and Cognition with Deep Unsupervised Learning:A Tutorial Overview

    OpenAIRE

    Marco eZorzi; Marco eZorzi; Alberto eTestolin; Ivilin Peev Stoianov; Ivilin Peev Stoianov

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cog...

  12. Modeling language and cognition with deep unsupervised learning: a tutorial overview

    OpenAIRE

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P.

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cog...

  13. Large-scale weakly supervised object localization via latent category learning.

    Science.gov (United States)

    Chong Wang; Kaiqi Huang; Weiqiang Ren; Junge Zhang; Maybank, Steve

    2015-04-01

    Localizing objects in cluttered backgrounds is challenging under large-scale weakly supervised conditions. Due to the cluttered image condition, objects usually have large ambiguity with backgrounds. Besides, there is also a lack of effective algorithm for large-scale weakly supervised localization in cluttered backgrounds. However, backgrounds contain useful latent information, e.g., the sky in the aeroplane class. If this latent information can be learned, object-background ambiguity can be largely reduced and background can be suppressed effectively. In this paper, we propose the latent category learning (LCL) in large-scale cluttered conditions. LCL is an unsupervised learning method which requires only image-level class labels. First, we use the latent semantic analysis with semantic object representation to learn the latent categories, which represent objects, object parts or backgrounds. Second, to determine which category contains the target object, we propose a category selection strategy by evaluating each category's discrimination. Finally, we propose the online LCL for use in large-scale conditions. Evaluation on the challenging PASCAL Visual Object Class (VOC) 2007 and the large-scale imagenet large-scale visual recognition challenge 2013 detection data sets shows that the method can improve the annotation precision by 10% over previous methods. More importantly, we achieve the detection precision which outperforms previous results by a large margin and can be competitive to the supervised deformable part model 5.0 baseline on both data sets.

  14. Semi-supervised Learning with Deep Generative Models

    NARCIS (Netherlands)

    Kingma, D.P.; Rezende, D.J.; Mohamed, S.; Welling, M.

    2014-01-01

    The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and

  15. Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists.

    Science.gov (United States)

    Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco

    2013-01-01

    Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programing parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low cost graphic cards (graphic processor units) without any specific programing effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.

  16. Deep unsupervised learning on a desktop PC: A primer for cognitive scientists

    Directory of Open Access Journals (Sweden)

    Alberto eTestolin

    2013-05-01

    Full Text Available Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphic cards (GPUs without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python. We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.

  17. Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists

    Science.gov (United States)

    Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco

    2013-01-01

    Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programing parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low cost graphic cards (graphic processor units) without any specific programing effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior. PMID:23653617

  18. Opportunities to learn scientific thinking in joint doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Grout, Brian William Wilson; Rump, Camilla Østerberg

    2015-01-01

    Research into doctoral supervision has increased rapidly over the last decades, yet our understanding of how doctoral students learn scientific thinking from supervision is limited. Most studies are based on interviews with little work being reported that is based on observation of actual...... supervision. While joint supervision has become widely used, its learning dynamics remains under-researched and this paper aims to address these gaps in research by exploring learning opportunities in doctoral supervision with two supervisors. The study explores how the tensions in scientific discussion...... between supervisors can become learning opportunities. We combine two different theoretical perspectives, using participation and positioning theory as a sociocultural perspective and variation theory as an individual constructivist perspective on learning. Based on our analysis of a complex episode we...

  19. Advancing Affect Modeling via Preference Learning and Unsupervised Feature Extraction

    DEFF Research Database (Denmark)

    Martínez, Héctor Pérez

    strategies (error functions and training algorithms) for artificial neural networks are examined across synthetic and psycho-physiological datasets, and compared against support vector machines and Cohen’s method. Results reveal the best training strategies for neural networks and suggest their superiority...... difficulties, ordinal reports such as rankings and ratings can yield more reliable affect annotations than alternative tools. This thesis explores preference learning methods to automatically learn computational models from ordinal annotations of affect. In particular, an extensive collection of training...... over the other examined methods. The second challenge addressed in this thesis refers to the extraction of relevant information from physiological modalities. Deep learning is proposed as an automatic approach to extract input features for models of affect from physiological signals. Experiments...

  20. Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning.

    Science.gov (United States)

    Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing

    2018-01-11

    By introducing the methods of machine learning into the density functional theory, we made a detour for the construction of the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of universal functional, the vital core of density functional theory, the most probable cluster numbers and the corresponding cluster boundaries in a studying system can be simultaneously and automatically determined and the plausibility is erected on the Hohenberg-Kohn theorems. For the method validation and pragmatic applications, interdisciplinary problems from physical to biological systems were enumerated. The amalgamation of uncharged atomic clusters validated the unsupervised searching process of the cluster numbers and the corresponding cluster boundaries were exhibited likewise. High accurate clustering results of the Fisher's iris dataset showed the feasibility and the flexibility of the proposed scheme. Brain tumor detections from low-dimensional magnetic resonance imaging datasets and segmentations of high-dimensional neural network imageries in the Brainbow system were also used to inspect the method practicality. The experimental results exhibit the successful connection between the physical theory and the machine learning methods and will benefit the clinical diagnoses.

  1. Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions.

    Science.gov (United States)

    Chen, Ke; Wang, Shihai

    2011-01-01

    Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three semi-supervised assumptions, i.e., smoothness, cluster, and manifold assumptions, together into account during boosting learning. In this paper, we propose a novel cost functional consisting of the margin cost on labeled data and the regularization penalty on unlabeled data based on three fundamental semi-supervised assumptions. Thus, minimizing our proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorite results for benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to the previous work.

  2. A new supervised learning algorithm for spiking neurons.

    Science.gov (United States)

    Xu, Yan; Zeng, Xiaoqin; Zhong, Shuiming

    2013-06-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by the precise firing times of spikes. If only running time is considered, the supervised learning for a spiking neuron is equivalent to distinguishing the times of desired output spikes and the other time during the running process of the neuron through adjusting synaptic weights, which can be regarded as a classification problem. Based on this idea, this letter proposes a new supervised learning method for spiking neurons with temporal encoding; it first transforms the supervised learning into a classification problem and then solves the problem by using the perceptron learning rule. The experiment results show that the proposed method has higher learning accuracy and efficiency over the existing learning methods, so it is more powerful for solving complex and real-time problems.

  3. Large-Scale Unsupervised Hashing with Shared Structure Learning.

    Science.gov (United States)

    Liu, Xianglong; Mu, Yadong; Zhang, Danchen; Lang, Bo; Li, Xuelong

    2015-09-01

    Hashing methods are effective in generating compact binary signatures for images and videos. This paper addresses an important open issue in the literature, i.e., how to learn compact hash codes by enhancing the complementarity among different hash functions. Most of prior studies solve this problem either by adopting time-consuming sequential learning algorithms or by generating the hash functions which are subject to some deliberately-designed constraints (e.g., enforcing hash functions orthogonal to one another). We analyze the drawbacks of past works and propose a new solution to this problem. Our idea is to decompose the feature space into a subspace shared by all hash functions and its complementary subspace. On one hand, the shared subspace, corresponding to the common structure across different hash functions, conveys most relevant information for the hashing task. Similar to data de-noising, irrelevant information is explicitly suppressed during hash function generation. On the other hand, in case that the complementary subspace also contains useful information for specific hash functions, the final form of our proposed hashing scheme is a compromise between these two kinds of subspaces. To make hash functions not only preserve the local neighborhood structure but also capture the global cluster distribution of the whole data, an objective function incorporating spectral embedding loss, binary quantization loss, and shared subspace contribution is introduced to guide the hash function learning. We propose an efficient alternating optimization method to simultaneously learn both the shared structure and the hash functions. Experimental results on three well-known benchmarks CIFAR-10, NUS-WIDE, and a-TRECVID demonstrate that our approach significantly outperforms state-of-the-art hashing methods.

  4. The efficacy of unsupervised home-based exercise regimens in comparison to supervised laboratory-based exercise training upon cardio-respiratory health facets.

    Science.gov (United States)

    Blackwell, James; Atherton, Philip J; Smith, Kenneth; Doleman, Brett; Williams, John P; Lund, Jonathan N; Phillips, Bethan E

    2017-09-01

    Supervised high-intensity interval training (HIIT) can rapidly improve cardiorespiratory fitness (CRF). However, the effectiveness of time-efficient unsupervised home-based interventions is unknown. Eighteen volunteers completed either: laboratory-HIIT (L-HIIT); home-HIIT (H-HIIT) or home-isometric hand-grip training (H-IHGT). CRF improved significantly in L-HIIT and H-HIIT groups, with blood pressure improvements in the H-IHGT group only. H-HIIT offers a practical, time-efficient exercise mode to improve CRF, away from the laboratory environment. H-IHGT potentially provides a viable alternative to modify blood pressure in those unable to participate in whole-body exercise. © 2017 The Authors. Physiological Reports published by Wiley Periodicals, Inc. on behalf of The Physiological Society and the American Physiological Society.

  5. Unsupervised learning in neural networks with short range synapses

    Science.gov (United States)

    Brunnet, L. G.; Agnes, E. J.; Mizusaki, B. E. P.; Erichsen, R., Jr.

    2013-01-01

    Different areas of the brain are involved in specific aspects of the information being processed both in learning and in memory formation. For example, the hippocampus is important in the consolidation of information from short-term memory to long-term memory, while emotional memory seems to be dealt by the amygdala. On the microscopic scale the underlying structures in these areas differ in the kind of neurons involved, in their connectivity, or in their clustering degree but, at this level, learning and memory are attributed to neuronal synapses mediated by longterm potentiation and long-term depression. In this work we explore the properties of a short range synaptic connection network, a nearest neighbor lattice composed mostly by excitatory neurons and a fraction of inhibitory ones. The mechanism of synaptic modification responsible for the emergence of memory is Spike-Timing-Dependent Plasticity (STDP), a Hebbian-like rule, where potentiation/depression is acquired when causal/non-causal spikes happen in a synapse involving two neurons. The system is intended to store and recognize memories associated to spatial external inputs presented as simple geometrical forms. The synaptic modifications are continuously applied to excitatory connections, including a homeostasis rule and STDP. In this work we explore the different scenarios under which a network with short range connections can accomplish the task of storing and recognizing simple connected patterns.

  6. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    Science.gov (United States)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of con-centration of uranium in surface waters in Ukraine using multiple environmental parameters. The previous study was focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artifi-cial Neural Networks (ANN) techniques including Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Convolutional Neural Network (CNN). The study is focused on building local ANN models which may significantly improve the prediction results of machine learning algorithms by taking into considerations non-stationarity and autocorrelation in spatial data.

  7. The efficacy of early initiated, supervised, progressive resistance training compared to unsupervised, home-based exercise after unicompartmental knee arthroplasty

    DEFF Research Database (Denmark)

    Jørgensen, Peter Bo; Bogh, Søren B; Kierkegaard, Signe

    2017-01-01

    OBJECTIVE: To examine if supervised progressive resistance training was superior to home-based exercise in rehabilitation after unicompartmental knee arthroplasty. DESIGN: Single blinded, randomized clinical trial. SETTING: Surgery, progressive resistance training and testing was carried out...

  8. Unsupervised multiple kernel learning for heterogeneous data integration.

    Science.gov (United States)

    Mariette, Jérôme; Villa-Vialaneix, Nathalie

    2018-03-15

    Recent high-throughput sequencing advances have expanded the breadth of available omics datasets and the integrated analysis of multiple datasets obtained on the same samples has allowed to gain important insights in a wide range of applications. However, the integration of various sources of information remains a challenge for systems biology since produced datasets are often of heterogeneous types, with the need of developing generic methods to take their different specificities into account. We propose a multiple kernel framework that allows to integrate multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets, collected during the TARA Oceans expedition, was explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA in regards with the original data. Second, the multi-omics breast cancer datasets, provided by The Cancer Genome Atlas, is analysed using a kernel Self-Organizing Maps with both single and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method to improve the representation of the studied biological system. Proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package and a tutorial describing the approach can be found on mixOmics web site http://mixomics.org/mixkernel/. jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.

  9. Subsampled Hessian Newton Methods for Supervised Learning.

    Science.gov (United States)

    Wang, Chien-Chih; Huang, Chun-Heng; Lin, Chih-Jen

    2015-08-01

    Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.

  10. Modeling language and cognition with deep unsupervised learning: a tutorial overview.

    Science.gov (United States)

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.

  11. Modeling Language and Cognition with Deep Unsupervised Learning:A Tutorial Overview

    Directory of Open Access Journals (Sweden)

    Marco eZorzi

    2013-08-01

    Full Text Available Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981 is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.

  12. Modeling language and cognition with deep unsupervised learning: a tutorial overview

    Science.gov (United States)

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P.

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition. PMID:23970869

  13. Safe semi-supervised learning based on weighted likelihood.

    Science.gov (United States)

    Kawakita, Masanori; Takeuchi, Jun'ichi

    2014-05-01

    We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')

  14. Unsupervised Learning Through Randomized Algorithms for High-Volume High-Velocity Data (ULTRA-HV).

    Energy Technology Data Exchange (ETDEWEB)

    Pinar, Ali [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kolda, Tamara G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Carlberg, Kevin Thomas [Wake Forest Univ., Winston-Salem, MA (United States); Ballard, Grey [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mahoney, Michael [Univ. of California, Berkeley, CA (United States)

    2018-01-01

    Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as science-leading experimentation. In both cases, DOE is generating more data than it can analyze and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts however, are hindered by the massive volume of the data and its high velocity. Here, the challenge is developing unsupervised learning methods to discover hidden structure in high-volume, high-velocity data.

  15. Characterizing Interference in Radio Astronomy Observations through Active and Unsupervised Learning

    Science.gov (United States)

    Doran, G.

    2013-01-01

    In the process of observing signals from astronomical sources, radio astronomers must mitigate the effects of manmade radio sources such as cell phones, satellites, aircraft, and observatory equipment. Radio frequency interference (RFI) often occurs as short bursts (active learning approach in which an astronomer labels events that are most confusing to a classifier, minimizing the human effort required for classification. We also explore the use of unsupervised clustering techniques, which automatically group events into classes without user input. We apply these techniques to data from the Parkes Multibeam Pulsar Survey to characterize several million detected RFI events from over a thousand hours of observation.

  16. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Directory of Open Access Journals (Sweden)

    Yoonsik Shim

    2016-10-01

    Full Text Available We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP. The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  17. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Science.gov (United States)

    Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

    2016-10-01

    We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  18. Cross-Domain Semi-Supervised Learning Using Feature Formulation.

    Science.gov (United States)

    Xingquan Zhu

    2011-12-01

    Semi-Supervised Learning (SSL) traditionally makes use of unlabeled samples by including them into the training set through an automated labeling process. Such a primitive Semi-Supervised Learning (pSSL) approach suffers from a number of disadvantages including false labeling and incapable of utilizing out-of-domain samples. In this paper, we propose a formative Semi-Supervised Learning (fSSL) framework which explores hidden features between labeled and unlabeled samples to achieve semi-supervised learning. fSSL regards that both labeled and unlabeled samples are generated from some hidden concepts with labeling information partially observable for some samples. The key of the fSSL is to recover the hidden concepts, and take them as new features to link labeled and unlabeled samples for semi-supervised learning. Because unlabeled samples are only used to generate new features, but not to be explicitly included in the training set like pSSL does, fSSL overcomes the inherent disadvantages of the traditional pSSL methods, especially for samples not within the same domain as the labeled instances. Experimental results and comparisons demonstrate that fSSL significantly outperforms pSSL-based methods for both within-domain and cross-domain semi-supervised learning.

  19. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning.

    Science.gov (United States)

    Gönen, Mehmet

    2014-03-01

    Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F 1 , and micro F 1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.

  20. An Improved EMD-Based Dissimilarity Metric for Unsupervised Linear Subspace Learning

    Directory of Open Access Journals (Sweden)

    Xiangchun Yu

    2018-01-01

    Full Text Available We investigate a novel way of robust face image feature extraction by adopting the methods based on Unsupervised Linear Subspace Learning to extract a small number of good features. Firstly, the face image is divided into blocks with the specified size, and then we propose and extract pooled Histogram of Oriented Gradient (pHOG over each block. Secondly, an improved Earth Mover’s Distance (EMD metric is adopted to measure the dissimilarity between blocks of one face image and the corresponding blocks from the rest of face images. Thirdly, considering the limitations of the original Locality Preserving Projections (LPP, we proposed the Block Structure LPP (BSLPP, which effectively preserves the structural information of face images. Finally, an adjacency graph is constructed and a small number of good features of a face image are obtained by methods based on Unsupervised Linear Subspace Learning. A series of experiments have been conducted on several well-known face databases to evaluate the effectiveness of the proposed algorithm. In addition, we construct the noise, geometric distortion, slight translation, slight rotation AR, and Extended Yale B face databases, and we verify the robustness of the proposed algorithm when faced with a certain degree of these disturbances.

  1. Wavelet-based unsupervised learning method for electrocardiogram suppression in surface electromyograms.

    Science.gov (United States)

    Niegowski, Maciej; Zivanovic, Miroslav

    2016-03-01

    We present a novel approach aimed at removing electrocardiogram (ECG) perturbation from single-channel surface electromyogram (EMG) recordings by means of unsupervised learning of wavelet-based intensity images. The general idea is to combine the suitability of certain wavelet decomposition bases which provide sparse electrocardiogram time-frequency representations, with the capacity of non-negative matrix factorization (NMF) for extracting patterns from images. In order to overcome convergence problems which often arise in NMF-related applications, we design a novel robust initialization strategy which ensures proper signal decomposition in a wide range of ECG contamination levels. Moreover, the method can be readily used because no a priori knowledge or parameter adjustment is needed. The proposed method was evaluated on real surface EMG signals against two state-of-the-art unsupervised learning algorithms and a singular spectrum analysis based method. The results, expressed in terms of high-to-low energy ratio, normalized median frequency, spectral power difference and normalized average rectified value, suggest that the proposed method enables better ECG-EMG separation quality than the reference methods. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.

  2. Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning.

    Science.gov (United States)

    Sadeghi, Zahra; Testolin, Alberto

    2017-08-01

    In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, where simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, where increasingly more complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, high-level internal representations emerging from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many common visual features: A generative model that captures the statistical structure of the letters distribution should therefore also support the recognition of written digits. To this aim, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.

  3. Unsupervised learning by spike timing dependent plasticity in phase change memory (PCM synapses

    Directory of Open Access Journals (Sweden)

    Stefano eAmbrogio

    2016-03-01

    Full Text Available We present a novel one-transistor/one-resistor (1T1R synapse for neuromorphic networks, based on phase change memory (PCM technology. The synapse is capable of spike-timing dependent plasticity (STDP, where gradual potentiation relies on set transition, namely crystallization, in the PCM, while depression is achieved via reset or amorphization of a chalcogenide active volume. STDP characteristics are demonstrated by experiments under variable initial conditions and number of pulses. Finally, we support the applicability of the 1T1R synapse for learning and recognition of visual patterns by simulations of fully connected neuromorphic networks with 2 or 3 layers with high recognition efficiency. The proposed scheme provides a feasible low-power solution for on-line unsupervised machine learning in smart reconfigurable sensors.

  4. Action learning in undergraduate engineering thesis supervision

    OpenAIRE

    Stappenbelt, Brad

    2017-01-01

    In the present action learning implementation, twelve action learning sets were conducted over eight years. The action learning sets consisted of students involved in undergraduate engineering research thesis work. The concurrent study accompanying this initiative investigated the influence of the action learning environment on student approaches to learning and any accompanying academic, learning and personal benefits realised. The influence of preferred learning styles on set function and s...

  5. Can Semi-Supervised Learning Explain Incorrect Beliefs about Categories?

    Science.gov (United States)

    Kalish, Charles W.; Rogers, Timothy T.; Lang, Jonathan; Zhu, Xiaojin

    2011-01-01

    Three experiments with 88 college-aged participants explored how unlabeled experiences--learning episodes in which people encounter objects without information about their category membership--influence beliefs about category structure. Participants performed a simple one-dimensional categorization task in a brief supervised learning phase, then…

  6. Label Information Guided Graph Construction for Semi-Supervised Learning.

    Science.gov (United States)

    Zhuang, Liansheng; Zhou, Zihan; Gao, Shenghua; Yin, Jingwen; Lin, Zhouchen; Ma, Yi

    2017-09-01

    In the literature, most existing graph-based semi-supervised learning methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the low-rank representation (LRR), and propose a novel semi-supervised graph learning method called semi-supervised low-rank representation. This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method. Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods. Experiment results on both synthetic and real data sets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks.

  7. Visual texture perception via graph-based semi-supervised learning

    Science.gov (United States)

    Zhang, Qin; Dong, Junyu; Zhong, Guoqiang

    2018-04-01

    Perceptual features, for example direction, contrast and repetitiveness, are important visual factors for human to perceive a texture. However, it needs to perform psychophysical experiment to quantify these perceptual features' scale, which requires a large amount of human labor and time. This paper focuses on the task of obtaining perceptual features' scale of textures by small number of textures with perceptual scales through a rating psychophysical experiment (what we call labeled textures) and a mass of unlabeled textures. This is the scenario that the semi-supervised learning is naturally suitable for. This is meaningful for texture perception research, and really helpful for the perceptual texture database expansion. A graph-based semi-supervised learning method called random multi-graphs, RMG for short, is proposed to deal with this task. We evaluate different kinds of features including LBP, Gabor, and a kind of unsupervised deep features extracted by a PCA-based deep network. The experimental results show that our method can achieve satisfactory effects no matter what kind of texture features are used.

  8. Robust Semi-Supervised Manifold Learning Algorithm for Classification

    Directory of Open Access Journals (Sweden)

    Mingxia Chen

    2018-01-01

    Full Text Available In the recent years, manifold learning methods have been widely used in data classification to tackle the curse of dimensionality problem, since they can discover the potential intrinsic low-dimensional structures of the high-dimensional data. Given partially labeled data, the semi-supervised manifold learning algorithms are proposed to predict the labels of the unlabeled points, taking into account label information. However, these semi-supervised manifold learning algorithms are not robust against noisy points, especially when the labeled data contain noise. In this paper, we propose a framework for robust semi-supervised manifold learning (RSSML to address this problem. The noisy levels of the labeled points are firstly predicted, and then a regularization term is constructed to reduce the impact of labeled points containing noise. A new robust semi-supervised optimization model is proposed by adding the regularization term to the traditional semi-supervised optimization model. Numerical experiments are given to show the improvement and efficiency of RSSML on noisy data sets.

  9. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    Science.gov (United States)

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.

  10. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Full Text Available Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects from a given dataset. The techniques for detecting outliers have a lot of applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for the CB outlier detection (DACB. On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. At last, the effectiveness and efficiency of the proposed approaches are verified through a plenty of simulation experiments.

  11. Action Learning in Undergraduate Engineering Thesis Supervision

    Science.gov (United States)

    Stappenbelt, Brad

    2017-01-01

    In the present action learning implementation, twelve action learning sets were conducted over eight years. The action learning sets consisted of students involved in undergraduate engineering research thesis work. The concurrent study accompanying this initiative investigated the influence of the action learning environment on student approaches…

  12. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some ir...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  13. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  14. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  15. Transfer learning improves supervised image segmentation across imaging protocols.

    Science.gov (United States)

    van Opbroek, Annegreet; Ikram, M Arfan; Vernooij, Meike W; de Bruijne, Marleen

    2015-05-01

    The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore may improve performance over supervised learning for segmentation across scanners and scan protocols. We present four transfer classifiers that can train a classification scheme with only a small amount of representative training data, in addition to a larger amount of other training data with slightly different characteristics. The performance of the four transfer classifiers was compared to that of standard supervised classification on two magnetic resonance imaging brain-segmentation tasks with multi-site data: white matter, gray matter, and cerebrospinal fluid segmentation; and white-matter-/MS-lesion segmentation. The experiments showed that when there is only a small amount of representative training data available, transfer learning can greatly outperform common supervised-learning approaches, minimizing classification errors by up to 60%.

  16. Improving Semi-Supervised Learning with Auxiliary Deep Generative Models

    DEFF Research Database (Denmark)

    Maaløe, Lars; Sønderby, Casper Kaae; Sønderby, Søren Kaae

    Deep generative models based upon continuous variational distributions parameterized by deep networks give state-of-the-art performance. In this paper we propose a framework for extending the latent representation with extra auxiliary variables in order to make the variational distribution more...... expressive for semi-supervised learning. By utilizing the stochasticity of the auxiliary variable we demonstrate how to train discriminative classifiers resulting in state-of-the-art performance within semi-supervised learning exemplified by an 0.96% error on MNIST using 100 labeled data points. Furthermore...

  17. Assessment of various supervised learning algorithms using different performance metrics

    Science.gov (United States)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.

  18. Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

    OpenAIRE

    Aldarmaki, Hanan; Mohan, Mahesh; Diab, Mona

    2017-01-01

    Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector ...

  19. Extracting aerobic system dynamics during unsupervised activities of daily living using wearable sensor machine learning models.

    Science.gov (United States)

    Beltrame, Thomas; Amelard, Robert; Wong, Alexander; Hughson, Richard L

    2018-02-01

    sensors in unsupervised activities of daily living in combination with novel machine learning algorithms to investigate the aerobic system dynamics with the potential to contribute to models of functional health status and guide future individualized health care in the normal population.

  20. Fast and robust segmentation of white blood cell images by self-supervised learning.

    Science.gov (United States)

    Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo

    2018-04-01

    A fast and accurate white blood cell (WBC) segmentation remains a challenging task, as different WBCs vary significantly in color and shape due to cell type differences, staining technique variations and the adhesion between the WBC and red blood cells. In this paper, a self-supervised learning approach, consisting of unsupervised initial segmentation and supervised segmentation refinement, is presented. The first module extracts the overall foreground region from the cell image by K-means clustering, and then generates a coarse WBC region by touching-cell splitting based on concavity analysis. The second module further uses the coarse segmentation result of the first module as automatic labels to actively train a support vector machine (SVM) classifier. Then, the trained SVM classifier is further used to classify each pixel of the image and achieve a more accurate segmentation result. To improve its segmentation accuracy, median color features representing the topological structure and a new weak edge enhancement operator (WEEO) handling fuzzy boundary are introduced. To further reduce its time cost, an efficient cluster sampling strategy is also proposed. We tested the proposed approach with two blood cell image datasets obtained under various imaging and staining conditions. The experiment results show that our approach has a superior performance of accuracy and time cost on both datasets. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Semi-supervised Eigenvectors for Locally-biased Learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2012-01-01

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks "nearby" that pre-specified target region. Locally-biased problems of t...

  2. Generalization of Supervised Learning for Binary Mask Estimation

    DEFF Research Database (Denmark)

    May, Tobias; Gerkmann, Timo

    2014-01-01

    This paper addresses the problem of speech segregation by es- timating the ideal binary mask (IBM) from noisy speech. Two methods will be compared, one supervised learning approach that incorporates a priori knowledge about the feature distri- bution observed during training. The second method...

  3. Robust semi-supervised learning : projections, limits & constraints

    NARCIS (Netherlands)

    Krijthe, J.H.

    2018-01-01

    In many domains of science and society, the amount of data being gathered is increasing rapidly. To estimate input-output relationships that are often of interest, supervised learning techniques rely on a specific type of data: labeled examples for which we know both the input and an outcome. The

  4. Discriminatory Data Mapping by Matrix-Based Supervised Learning Metrics

    NARCIS (Netherlands)

    Strickert, M.; Schneider, P.; Keilwagen, J.; Villmann, T.; Biehl, M.; Hammer, B.

    2008-01-01

    Supervised attribute relevance detection using cross-comparisons (SARDUX), a recently proposed method for data-driven metric learning, is extended from dimension-weighted Minkowski distances to metrics induced by a data transformation matrix Ω for modeling mutual attribute dependence. Given class

  5. Self-supervised Chinese ontology learning from online encyclopedias.

    Science.gov (United States)

    Hu, Fanghuai; Shao, Zhiqing; Ruan, Tong

    2014-01-01

    Constructing ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we proof that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy) statistically and experimentally and train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantages of our methods are that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other famous ontologies and knowledge bases; the experiment results also indicate that the self-supervised models obviously enrich SSCO.

  6. Arabic Supervised Learning Method Using N-Gram

    Science.gov (United States)

    Sanan, Majed; Rammal, Mahmoud; Zreik, Khaldoun

    2008-01-01

    Purpose: Recently, classification of Arabic documents is a real problem for juridical centers. In this case, some of the Lebanese official journal documents are classified, and the center has to classify new documents based on these documents. This paper aims to study and explain the useful application of supervised learning method on Arabic texts…

  7. An online learning space facilitating supervision pedagogies in ...

    African Journals Online (AJOL)

    Quality research supervision leading to timely completion and student satisfaction involves explicit pedagogy and effective communication. This article describes the development within an action research cycle of an online learning space designed to achieve these goals. The research 'spirals' involved interventions in the ...

  8. First regulatory inspections measuring adherence to Good Pharmacy Practices in the public sector in Uganda: a cross-sectional comparison of performance between supervised and unsupervised facilities.

    Science.gov (United States)

    Trap, Birna; Kikule, Kate; Vialle-Valentin, Catherine; Musoke, Richard; Lajul, Grace Otto; Hoppenworth, Kim; Konradsen, Dorthe

    2016-01-01

    Since its inception, the Uganda National Drug Authority (NDA) has regularly inspected private sector pharmacies to monitor adherence to Good Pharmacy Practices (GPP). This study reports findings from the first public facility inspections following an intervention (SPARS: Supervision, Performance Assessment, and Recognition Strategy) to build GPP and medicines management capacity in the public sector. The study includes 455 public facilities: 417 facilities were inspected after at least four SPARS visits by trained managerial district staff (SPARS group), 38 before any exposure to SPARS. NDA inspectors measured 10 critical, 20 major, and 37 minor GPP indicators in every facility and only accredited facilities that passed all 10 critical and failed no more than 7 major indicators. Lack of compliance for a given indicator was defined as less than 75 % facilities passing that indicator. We assessed factors associated with certification using logistic regression analysis and compared number of failed indicators between the SPARS and comparative groups using two sample t-tests with equal or unequal variance. 57.4 % of inspected facilities obtained GPP certification: 57.1 % in the SPARS and 60.5 % in the comparative group (Adj. OR = 0.91, 95 % CI 0.45-1.85, p = 0.802). Overall, facilities failed an average of 10 indicators. SPARS facilities performed better than comparative facilities (9 (SD 6.1) vs. 13 (SD 7.7) failed indicators respectively; p = 0.017), and SPARS supported facilities scored better on indicators covered by SPARS. For all indicators but one minor, performance in the SPARS group was equal to or significantly better than in unsupervised facilities. Within the SPARS (intervention) group, certified facilities had practices and behavioral changes; some require infrastructure investments. We conclude that regular NDA inspections of public sector pharmacies in conjunction with interventions to improve GPP adherence can revolutionize patient

  9. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    Science.gov (United States)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting, which performs well in prediction and holds the ability of giving explanations for the results. In business failure prediction (BFP), the number of failed enterprises is relatively small, compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises and performs accurately on total accuracy meanwhile. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are firstly generated through hierarchical clustering inside stored experienced cases, and class centres are calculated out by integrating cases information in the same clustered class. When predicting the label of a target case, its nearest clustered case class is firstly retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multi-variant discriminate analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the

  10. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important during the last years. With the fast growth of computational power and available memory databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  11. Enhancing fieldwork learning using blended learning, GIS and remote supervision

    Science.gov (United States)

    Marra, Wouter A.; Alberti, Koko; Karssenberg, Derek

    2015-04-01

    Fieldwork is an important part of education in geosciences and essential to put theoretical knowledge into an authentic context. Fieldwork as teaching tool can take place in various forms, such as field-tutorial, excursion, or supervised research. Current challenges with fieldwork in education are to incorporate state-of-the art methods for digital data collection, on-site GIS-analysis and providing high-quality feedback to large groups of students in the field. We present a case on first-year earth-sciences fieldwork with approximately 80 students in the French Alps focused on geological and geomorphological mapping. Here, students work in couples and each couple maps their own fieldwork area to reconstruct the formative history. We present several major improvements for this fieldwork using a blended-learning approach, relying on open source software only. An important enhancement to the French Alps fieldwork is improving students' preparation. In a GIS environment, students explore their fieldwork areas using existing remote sensing data, a digital elevation model and derivatives to formulate testable hypotheses before the actual fieldwork. The advantage of this is that the students already know their area when arriving in the field, have started to apply the empirical cycle prior to their field visit, and are therefore eager to investigate their own research questions. During the fieldwork, students store and analyze their field observations in the same GIS environment. This enables them to get a better overview of their own collected data, and to integrate existing data sources also used in the preparation phase. This results in a quicker and enhanced understanding by the students. To enable remote access to observational data collected by students, the students synchronize their data daily with a webserver running a web map application. Supervisors can review students' progress remotely, examine and evaluate their observations in a GIS, and provide

  12. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  13. Collaborative Supervised Learning for Sensor Networks

    Science.gov (United States)

    Wagstaff, Kiri L.; Rebbapragada, Umaa; Lane, Terran

    2011-01-01

    Collaboration methods for distributed machine-learning algorithms involve the specification of communication protocols for the learners, which can query other learners and/or broadcast their findings preemptively. Each learner incorporates information from its neighbors into its own training set, and they are thereby able to bootstrap each other to higher performance. Each learner resides at a different node in the sensor network and makes observations (collects data) independently of the other learners. After being seeded with an initial labeled training set, each learner proceeds to learn in an iterative fashion. New data is collected and classified. The learner can then either broadcast its most confident classifications for use by other learners, or can query neighbors for their classifications of its least confident items. As such, collaborative learning combines elements of both passive (broadcast) and active (query) learning. It also uses ideas from ensemble learning to combine the multiple responses to a given query into a single useful label. This approach has been evaluated against current non-collaborative alternatives, including training a single classifier and deploying it at all nodes with no further learning possible, and permitting learners to learn from their own most confident judgments, absent interaction with their neighbors. On several data sets, it has been consistently found that active collaboration is the best strategy for a distributed learner network. The main advantages include the ability for learning to take place autonomously by collaboration rather than by requiring intervention from an oracle (usually human), and also the ability to learn in a distributed environment, permitting decisions to be made in situ and to yield faster response time.

  14. CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Parameter tuning is an important task of storage performance optimization. Current practice usually involves numerous tweak-benchmark cycles that are slow and costly. To address this issue, we developed CAPES, a model-less deep reinforcement learning-based unsupervised parameter tuning system driven by a deep neural network (DNN). It is designed to nd the optimal values of tunable parameters in computer systems, from a simple client-server system to a large data center, where human tuning can be costly and often cannot achieve optimal performance. CAPES takes periodic measurements of a target computer system’s state, and trains a DNN which uses Q-learning to suggest changes to the system’s current parameter values. CAPES is minimally intrusive, and can be deployed into a production system to collect training data and suggest tuning actions during the system’s daily operation. Evaluation of a prototype on a Lustre system demonstrates an increase in I/O throughput up to 45% at saturation point. About the...

  15. Unsupervised obstacle detection in driving environments using deep-learning-based stereovision

    KAUST Repository

    Dairi, Abdelkader; Harrou, Fouzi; Senouci, Mohamed; Sun, Ying

    2017-01-01

    A vision-based obstacle detection system is a key enabler for the development of autonomous robots and vehicles and intelligent transportation systems. This paper addresses the problem of urban scene monitoring and tracking of obstacles based on unsupervised, deep-learning approaches. Here, we design an innovative hybrid encoder that integrates deep Boltzmann machines (DBM) and auto-encoders (AE). This hybrid auto-encode (HAE) model combines the greedy learning features of DBM with the dimensionality reduction capacity of AE to accurately and reliably detect the presence of obstacles. We combine the proposed hybrid model with the one-class support vector machines (OCSVM) to visually monitor an urban scene. We also propose an efficient approach to estimating obstacles location and track their positions via scene densities. Specifically, we address obstacle detection as an anomaly detection problem. If an obstacle is detected by the OCSVM algorithm, then localization and tracking algorithm is executed. We validated the effectiveness of our approach by using experimental data from two publicly available dataset, the Malaga stereovision urban dataset (MSVUD) and the Daimler urban segmentation dataset (DUSD). Results show the capacity of the proposed approach to reliably detect obstacles.

  16. Unsupervised obstacle detection in driving environments using deep-learning-based stereovision

    KAUST Repository

    Dairi, Abdelkader

    2017-12-06

    A vision-based obstacle detection system is a key enabler for the development of autonomous robots and vehicles and intelligent transportation systems. This paper addresses the problem of urban scene monitoring and tracking of obstacles based on unsupervised, deep-learning approaches. Here, we design an innovative hybrid encoder that integrates deep Boltzmann machines (DBM) and auto-encoders (AE). This hybrid auto-encode (HAE) model combines the greedy learning features of DBM with the dimensionality reduction capacity of AE to accurately and reliably detect the presence of obstacles. We combine the proposed hybrid model with the one-class support vector machines (OCSVM) to visually monitor an urban scene. We also propose an efficient approach to estimating obstacles location and track their positions via scene densities. Specifically, we address obstacle detection as an anomaly detection problem. If an obstacle is detected by the OCSVM algorithm, then localization and tracking algorithm is executed. We validated the effectiveness of our approach by using experimental data from two publicly available dataset, the Malaga stereovision urban dataset (MSVUD) and the Daimler urban segmentation dataset (DUSD). Results show the capacity of the proposed approach to reliably detect obstacles.

  17. Learning Semantic Segmentation with Diverse Supervision

    OpenAIRE

    Ye, Linwei; Liu, Zhi; Wang, Yang

    2018-01-01

    Models based on deep convolutional neural networks (CNN) have significantly improved the performance of semantic segmentation. However, learning these models requires a large amount of training images with pixel-level labels, which are very costly and time-consuming to collect. In this paper, we propose a method for learning CNN-based semantic segmentation models from images with several types of annotations that are available for various computer vision tasks, including image-level labels fo...

  18. Effects of coaching supervision, mentoring supervision and abusive supervision on talent development among trainee doctors in public hospitals: moderating role of clinical learning environment.

    Science.gov (United States)

    Subramaniam, Anusuiya; Silong, Abu Daud; Uli, Jegak; Ismail, Ismi Arif

    2015-08-13

    Effective talent development requires robust supervision. However, the effects of supervisory styles (coaching, mentoring and abusive supervision) on talent development and the moderating effects of clinical learning environment in the relationship between supervisory styles and talent development among public hospital trainee doctors have not been thoroughly researched. In this study, we aim to achieve the following, (1) identify the extent to which supervisory styles (coaching, mentoring and abusive supervision) can facilitate talent development among trainee doctors in public hospital and (2) examine whether coaching, mentoring and abusive supervision are moderated by clinical learning environment in predicting talent development among trainee doctors in public hospital. A questionnaire-based critical survey was conducted among trainee doctors undergoing housemanship at six public hospitals in the Klang Valley, Malaysia. Prior permission was obtained from the Ministry of Health Malaysia to conduct the research in the identified public hospitals. The survey yielded 355 responses. The results were analysed using SPSS 20.0 and SEM with AMOS 20.0. The findings of this research indicate that coaching and mentoring supervision are positively associated with talent development, and that there is no significant relationship between abusive supervision and talent development. The findings also support the moderating role of clinical learning environment on the relationships between coaching supervision-talent development, mentoring supervision-talent development and abusive supervision-talent development among public hospital trainee doctors. Overall, the proposed model indicates a 26 % variance in talent development. This study provides an improved understanding on the role of the supervisory styles (coaching and mentoring supervision) on facilitating talent development among public hospital trainee doctors. Furthermore, this study extends the literature to better

  19. Supervised Machine Learning for Population Genetics: A New Paradigm

    Science.gov (United States)

    Schrider, Daniel R.; Kern, Andrew D.

    2018-01-01

    As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics. PMID:29331490

  20. Classification and unsupervised clustering of LIGO data with Deep Transfer Learning

    Science.gov (United States)

    George, Daniel; Shen, Hongyu; Huerta, E. A.

    2018-05-01

    Gravitational wave detection requires a detailed understanding of the response of the LIGO and Virgo detectors to true signals in the presence of environmental and instrumental noise. Of particular interest is the study of anomalous non-Gaussian transients, such as glitches, since their occurrence rate in LIGO and Virgo data can obscure or even mimic true gravitational wave signals. Therefore, successfully identifying and excising these anomalies from gravitational wave data is of utmost importance for the detection and characterization of true signals and for the accurate computation of their significance. To facilitate this work, we present the first application of deep learning combined with transfer learning to show that knowledge from pretrained models for real-world object recognition can be transferred for classifying spectrograms of glitches. To showcase this new method, we use a data set of twenty-two classes of glitches, curated and labeled by the Gravity Spy project using data collected during LIGO's first discovery campaign. We demonstrate that our Deep Transfer Learning method enables an optimal use of very deep convolutional neural networks for glitch classification given small and unbalanced training data sets, significantly reduces the training time, and achieves state-of-the-art accuracy above 98.8%, lowering the previous error rate by over 60%. More importantly, once trained via transfer learning on the known classes, we show that our neural networks can be truncated and used as feature extractors for unsupervised clustering to automatically group together new unknown classes of glitches and anomalous signals. This novel capability is of paramount importance to identify and remove new types of glitches which will occur as the LIGO/Virgo detectors gradually attain design sensitivity.

  1. Transfer learning improves supervised image segmentation across imaging protocols

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Ikram, M. Arfan; Vernooij, Meike W.

    2015-01-01

    with slightly different characteristics. The performance of the four transfer classifiers was compared to that of standard supervised classification on two MRI brain-segmentation tasks with multi-site data: white matter, gray matter, and CSF segmentation; and white-matter- /MS-lesion segmentation......The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform...... well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore...

  2. A supervised learning rule for classification of spatiotemporal spike patterns.

    Science.gov (United States)

    Lilin Guo; Zhenzhong Wang; Adjouadi, Malek

    2016-08-01

    This study introduces a novel supervised algorithm for spiking neurons that take into consideration synapse delays and axonal delays associated with weights. It can be utilized for both classification and association and uses several biologically influenced properties, such as axonal and synaptic delays. This algorithm also takes into consideration spike-timing-dependent plasticity as in Remote Supervised Method (ReSuMe). This paper focuses on the classification aspect alone. Spiked neurons trained according to this proposed learning rule are capable of classifying different categories by the associated sequences of precisely timed spikes. Simulation results have shown that the proposed learning method greatly improves classification accuracy when compared to the Spike Pattern Association Neuron (SPAN) and the Tempotron learning rule.

  3. Stochastic microstructure characterization and reconstruction via supervised learning

    International Nuclear Information System (INIS)

    Bostanabad, Ramin; Bui, Anh Tuan; Xie, Wei; Apley, Daniel W.; Chen, Wei

    2016-01-01

    Microstructure characterization and reconstruction have become indispensable parts of computational materials science. The main contribution of this paper is to introduce a general methodology for practical and efficient characterization and reconstruction of stochastic microstructures based on supervised learning. The methodology is general in that it can be applied to a broad range of microstructures (clustered, porous, and anisotropic). By treating the digitized microstructure image as a set of training data, we generically learn the stochastic nature of the microstructure via fitting a supervised learning model to it (we focus on classification trees). The fitted supervised learning model provides an implicit characterization of the joint distribution of the collection of pixel phases in the image. Based on this characterization, we propose two different approaches to efficiently reconstruct any number of statistically equivalent microstructure samples. We test the approach on five examples and show that the spatial dependencies within the microstructures are well preserved, as evaluated via correlation and lineal-path functions. The main advantages of our approach stem from having a compact empirically-learned model that characterizes the stochastic nature of the microstructure, which not only makes reconstruction more computationally efficient than existing methods, but also provides insight into morphological complexity.

  4. Unsupervised learning of mixture models based on swarm intelligence and neural networks with optimal completion using incomplete data

    Directory of Open Access Journals (Sweden)

    Ahmed R. Abas

    2012-07-01

    Full Text Available In this paper, a new algorithm is presented for unsupervised learning of finite mixture models (FMMs using data set with missing values. This algorithm overcomes the local optima problem of the Expectation-Maximization (EM algorithm via integrating the EM algorithm with Particle Swarm Optimization (PSO. In addition, the proposed algorithm overcomes the problem of biased estimation due to overlapping clusters in estimating missing values in the input data set by integrating locally-tuned general regression neural networks with Optimal Completion Strategy (OCS. A comparison study shows the superiority of the proposed algorithm over other algorithms commonly used in the literature in unsupervised learning of FMM parameters that result in minimum mis-classification errors when used in clustering incomplete data set that is generated from overlapping clusters and these clusters are largely different in their sizes.

  5. Modeling Time Series Data for Supervised Learning

    Science.gov (United States)

    Baydogan, Mustafa Gokce

    2012-01-01

    Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning…

  6. Applying active learning to supervised word sense disambiguation in MEDLINE

    Science.gov (United States)

    Chen, Yukun; Cao, Hongxin; Mei, Qiaozhu; Zheng, Kai; Xu, Hua

    2013-01-01

    Objectives This study was to assess whether active learning strategies can be integrated with supervised word sense disambiguation (WSD) methods, thus reducing the number of annotated samples, while keeping or improving the quality of disambiguation models. Methods We developed support vector machine (SVM) classifiers to disambiguate 197 ambiguous terms and abbreviations in the MSH WSD collection. Three different uncertainty sampling-based active learning algorithms were implemented with the SVM classifiers and were compared with a passive learner (PL) based on random sampling. For each ambiguous term and each learning algorithm, a learning curve that plots the accuracy computed from the test set as a function of the number of annotated samples used in the model was generated. The area under the learning curve (ALC) was used as the primary metric for evaluation. Results Our experiments demonstrated that active learners (ALs) significantly outperformed the PL, showing better performance for 177 out of 197 (89.8%) WSD tasks. Further analysis showed that to achieve an average accuracy of 90%, the PL needed 38 annotated samples, while the ALs needed only 24, a 37% reduction in annotation effort. Moreover, we analyzed cases where active learning algorithms did not achieve superior performance and identified three causes: (1) poor models in the early learning stage; (2) easy WSD cases; and (3) difficult WSD cases, which provide useful insight for future improvements. Conclusions This study demonstrated that integrating active learning strategies with supervised WSD methods could effectively reduce annotation cost and improve the disambiguation models. PMID:23364851

  7. Applying active learning to supervised word sense disambiguation in MEDLINE.

    Science.gov (United States)

    Chen, Yukun; Cao, Hongxin; Mei, Qiaozhu; Zheng, Kai; Xu, Hua

    2013-01-01

    This study was to assess whether active learning strategies can be integrated with supervised word sense disambiguation (WSD) methods, thus reducing the number of annotated samples, while keeping or improving the quality of disambiguation models. We developed support vector machine (SVM) classifiers to disambiguate 197 ambiguous terms and abbreviations in the MSH WSD collection. Three different uncertainty sampling-based active learning algorithms were implemented with the SVM classifiers and were compared with a passive learner (PL) based on random sampling. For each ambiguous term and each learning algorithm, a learning curve that plots the accuracy computed from the test set as a function of the number of annotated samples used in the model was generated. The area under the learning curve (ALC) was used as the primary metric for evaluation. Our experiments demonstrated that active learners (ALs) significantly outperformed the PL, showing better performance for 177 out of 197 (89.8%) WSD tasks. Further analysis showed that to achieve an average accuracy of 90%, the PL needed 38 annotated samples, while the ALs needed only 24, a 37% reduction in annotation effort. Moreover, we analyzed cases where active learning algorithms did not achieve superior performance and identified three causes: (1) poor models in the early learning stage; (2) easy WSD cases; and (3) difficult WSD cases, which provide useful insight for future improvements. This study demonstrated that integrating active learning strategies with supervised WSD methods could effectively reduce annotation cost and improve the disambiguation models.

  8. Semi-supervised learning for ordinal Kernel Discriminant Analysis.

    Science.gov (United States)

    Pérez-Ortiz, M; Gutiérrez, P A; Carbonero-Ruz, M; Hervás-Martínez, C

    2016-12-01

    Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problems because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme which is referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended for this setting considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification in a battery of 30 datasets, showing (1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and (2) the advantage of computing distances in the feature space induced by the kernel function. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Unsupervised machine learning account of magnetic transitions in the Hubbard model

    Science.gov (United States)

    Ch'ng, Kelvin; Vazquez, Nick; Khatami, Ehsan

    2018-01-01

    We employ several unsupervised machine learning techniques, including autoencoders, random trees embedding, and t -distributed stochastic neighboring ensemble (t -SNE), to reduce the dimensionality of, and therefore classify, raw (auxiliary) spin configurations generated, through Monte Carlo simulations of small clusters, for the Ising and Fermi-Hubbard models at finite temperatures. Results from a convolutional autoencoder for the three-dimensional Ising model can be shown to produce the magnetization and the susceptibility as a function of temperature with a high degree of accuracy. Quantum fluctuations distort this picture and prevent us from making such connections between the output of the autoencoder and physical observables for the Hubbard model. However, we are able to define an indicator based on the output of the t -SNE algorithm that shows a near perfect agreement with the antiferromagnetic structure factor of the model in two and three spatial dimensions in the weak-coupling regime. t -SNE also predicts a transition to the canted antiferromagnetic phase for the three-dimensional model when a strong magnetic field is present. We show that these techniques cannot be expected to work away from half filling when the "sign problem" in quantum Monte Carlo simulations is present.

  10. Unsupervised Learning for Efficient Texture Estimation From Limited Discrete Orientation Data

    Science.gov (United States)

    Niezgoda, Stephen R.; Glover, Jared

    2013-11-01

    The estimation of orientation distribution functions (ODFs) from discrete orientation data, as produced by electron backscatter diffraction or crystal plasticity micromechanical simulations, is typically achieved via techniques such as the Williams-Imhof-Matthies-Vinel (WIMV) algorithm or generalized spherical harmonic expansions, which were originally developed for computing an ODF from pole figures measured by X-ray or neutron diffraction. These techniques rely on ad-hoc methods for choosing parameters, such as smoothing half-width and bandwidth, and for enforcing positivity constraints and appropriate normalization. In general, such approaches provide little or no information-theoretic guarantees as to their optimality in describing the given dataset. In the current study, an unsupervised learning algorithm is proposed which uses a finite mixture of Bingham distributions for the estimation of ODFs from discrete orientation data. The Bingham distribution is an antipodally-symmetric, max-entropy distribution on the unit quaternion hypersphere. The proposed algorithm also introduces a minimum message length criterion, a common tool in information theory for balancing data likelihood with model complexity, to determine the number of components in the Bingham mixture. This criterion leads to ODFs which are less likely to overfit (or underfit) the data, eliminating the need for a priori parameter choices.

  11. Machine learning in APOGEE. Unsupervised spectral classification with K-means

    Science.gov (United States)

    Garcia-Dias, Rafael; Allende Prieto, Carlos; Sánchez Almeida, Jorge; Ordovás-Pascual, Ignacio

    2018-05-01

    Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims: Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods: We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results: We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters' space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions: Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data. Full Tables B.1-B.4 are only available at the CDS via

  12. Supervised Learning with Complex-valued Neural Networks

    CERN Document Server

    Suresh, Sundaram; Savitha, Ramasamy

    2013-01-01

    Recent advancements in the field of telecommunications, medical imaging and signal processing deal with signals that are inherently time varying, nonlinear and complex-valued. The time varying, nonlinear characteristics of these signals can be effectively analyzed using artificial neural networks.  Furthermore, to efficiently preserve the physical characteristics of these complex-valued signals, it is important to develop complex-valued neural networks and derive their learning algorithms to represent these signals at every step of the learning process. This monograph comprises a collection of new supervised learning algorithms along with novel architectures for complex-valued neural networks. The concepts of meta-cognition equipped with a self-regulated learning have been known to be the best human learning strategy. In this monograph, the principles of meta-cognition have been introduced for complex-valued neural networks in both the batch and sequential learning modes. For applications where the computati...

  13. Using Supervised Deep Learning for Human Age Estimation Problem

    Science.gov (United States)

    Drobnyh, K. A.; Polovinkin, A. N.

    2017-05-01

    Automatic facial age estimation is a challenging task upcoming in recent years. In this paper, we propose using the supervised deep learning features to improve an accuracy of the existing age estimation algorithms. There are many approaches solving the problem, an active appearance model and the bio-inspired features are two of them which showed the best accuracy. For experiments we chose popular publicly available FG-NET database, which contains 1002 images with a broad variety of light, pose, and expression. LOPO (leave-one-person-out) method was used to estimate the accuracy. Experiments demonstrated that adding supervised deep learning features has improved accuracy for some basic models. For example, adding the features to an active appearance model gave the 4% gain (the error decreased from 4.59 to 4.41).

  14. Learning Supervised Topic Models for Classification and Regression from Crowds.

    Science.gov (United States)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C

    2017-12-01

    The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.

  15. Learning Supervised Topic Models for Classification and Regression from Crowds

    DEFF Research Database (Denmark)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete

    2017-01-01

    problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...... annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression...

  16. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    Science.gov (United States)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3

  17. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    Science.gov (United States)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process the massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearing. Among these studies, such as artificial neural networks, support vector machines, decision trees and other supervised learning methods are used commonly. These methods can detect the failure of rolling bearing effectively, but to achieve better detection results, it often requires a lot of training samples. Based on above, a novel clustering method is proposed in this paper. This novel method is able to find the correct number of clusters automatically the effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. Meanwhile, the diagnosis results are also relative high accuracy even for massive samples.

  18. Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm

    Science.gov (United States)

    Makin, Joseph G.; O'Doherty, Joseph E.; Cardoso, Mariana M. B.; Sabes, Philip N.

    2018-04-01

    Objective. The aim of this work is to improve the state of the art for motor-control with a brain-machine interface (BMI). BMIs use neurological recording devices and decoding algorithms to transform brain activity directly into real-time control of a machine, archetypically a robotic arm or a cursor. The standard procedure treats neural activity—vectors of spike counts in small temporal windows—as noisy observations of the kinematic state (position, velocity, acceleration) of the fingertip. Inferring the state from the observations then takes the form of a dynamical filter, typically some variant on Kalman’s (KF). The KF, however, although fairly robust in practice, is optimal only when the relationships between variables are linear and the noise is Gaussian, conditions usually violated in practice. Approach. To overcome these limitations we introduce a new filter, the ‘recurrent exponential-family harmonium’ (rEFH), that models the spike counts explicitly as Poisson-distributed, and allows for arbitrary nonlinear dynamics and observation models. Furthermore, the model underlying the filter is acquired through unsupervised learning, which allows temporal correlations in spike counts to be explained by latent dynamics that do not necessarily correspond to the kinematic state of the fingertip. Main results. We test the rEFH on offline reconstruction of the kinematics of reaches in the plane. The rEFH outperforms the standard, as well as three other state-of-the-art, decoders, across three monkeys, two different tasks, most kinematic variables, and a range of bin widths, amounts of training data, and numbers of neurons. Significance. Our algorithm establishes a new state of the art for offline decoding of reaches—in particular, for fingertip velocities, the variable used for control in most online decoders.

  19. Double-Barrier Memristive Devices for Unsupervised Learning and Pattern Recognition.

    Science.gov (United States)

    Hansen, Mirko; Zahari, Finn; Ziegler, Martin; Kohlstedt, Hermann

    2017-01-01

    The use of interface-based resistive switching devices for neuromorphic computing is investigated. In a combined experimental and numerical study, the important device parameters and their impact on a neuromorphic pattern recognition system are studied. The memristive cells consist of a layer sequence Al/Al 2 O 3 /Nb x O y /Au and are fabricated on a 4-inch wafer. The key functional ingredients of the devices are a 1.3 nm thick Al 2 O 3 tunnel barrier and a 2.5 mm thick Nb x O y memristive layer. Voltage pulse measurements are used to study the electrical conditions for the emulation of synaptic functionality of single cells for later use in a recognition system. The results are evaluated and modeled in the framework of the plasticity model of Ziegler et al. Based on this model, which is matched to experimental data from 84 individual devices, the network performance with regard to yield, reliability, and variability is investigated numerically. As the network model, a computing scheme for pattern recognition and unsupervised learning based on the work of Querlioz et al. (2011), Sheridan et al. (2014), Zahari et al. (2015) is employed. This is a two-layer feedforward network with a crossbar array of memristive devices, leaky integrate-and-fire output neurons including a winner-takes-all strategy, and a stochastic coding scheme for the input pattern. As input pattern, the full data set of digits from the MNIST database is used. The numerical investigation indicates that the experimentally obtained yield, reliability, and variability of the memristive cells are suitable for such a network. Furthermore, evidence is presented that their strong I - V non-linearity might avoid the need for selector devices in crossbar array structures.

  20. On the Multi-Modal Object Tracking and Image Fusion Using Unsupervised Deep Learning Methodologies

    Science.gov (United States)

    LaHaye, N.; Ott, J.; Garay, M. J.; El-Askary, H. M.; Linstead, E.

    2017-12-01

    The number of different modalities of remote-sensors has been on the rise, resulting in large datasets with different complexity levels. Such complex datasets can provide valuable information separately, yet there is a bigger value in having a comprehensive view of them combined. As such, hidden information can be deduced through applying data mining techniques on the fused data. The curse of dimensionality of such fused data, due to the potentially vast dimension space, hinders our ability to have deep understanding of them. This is because each dataset requires a user to have instrument-specific and dataset-specific knowledge for optimum and meaningful usage. Once a user decides to use multiple datasets together, deeper understanding of translating and combining these datasets in a correct and effective manner is needed. Although there exists data centric techniques, generic automated methodologies that can potentially solve this problem completely don't exist. Here we are developing a system that aims to gain a detailed understanding of different data modalities. Such system will provide an analysis environment that gives the user useful feedback and can aid in research tasks. In our current work, we show the initial outputs our system implementation that leverages unsupervised deep learning techniques so not to burden the user with the task of labeling input data, while still allowing for a detailed machine understanding of the data. Our goal is to be able to track objects, like cloud systems or aerosols, across different image-like data-modalities. The proposed system is flexible, scalable and robust to understand complex likenesses within multi-modal data in a similar spatio-temporal range, and also to be able to co-register and fuse these images when needed.

  1. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    Science.gov (United States)

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined, contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2 the optimal number of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  2. Supervised hub-detection for brain connectivity

    DEFF Research Database (Denmark)

    Kasenburg, Niklas; Liptrot, Matthew George; Reislev, Nina Linde

    2016-01-01

    , but can smooth discriminative signals in the population, degrading predictive performance. We present a novel hub-detection optimized for supervised learning that both clusters network nodes based on population level variation in connectivity and also takes the learning problem into account. The found......A structural brain network consists of physical connections between brain regions. Brain network analysis aims to find features associated with a parameter of interest through supervised prediction models such as regression. Unsupervised preprocessing steps like clustering are often applied...... hubs are a low-dimensional representation of the network and are chosen based on predictive performance as features for a linear regression. We apply our method to the problem of finding age-related changes in structural connectivity. We compare our supervised hub-detection (SHD) to an unsupervised hub...

  3. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Directory of Open Access Journals (Sweden)

    Qingyu Chen

    Full Text Available First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases.We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  4. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    Science.gov (United States)

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.

  5. Resting-state fMRI activity predicts unsupervised learning and memory in an immersive virtual reality environment.

    Directory of Open Access Journals (Sweden)

    Chi Wah Wong

    Full Text Available In the real world, learning often proceeds in an unsupervised manner without explicit instructions or feedback. In this study, we employed an experimental paradigm in which subjects explored an immersive virtual reality environment on each of two days. On day 1, subjects implicitly learned the location of 39 objects in an unsupervised fashion. On day 2, the locations of some of the objects were changed, and object location recall performance was assessed and found to vary across subjects. As prior work had shown that functional magnetic resonance imaging (fMRI measures of resting-state brain activity can predict various measures of brain performance across individuals, we examined whether resting-state fMRI measures could be used to predict object location recall performance. We found a significant correlation between performance and the variability of the resting-state fMRI signal in the basal ganglia, hippocampus, amygdala, thalamus, insula, and regions in the frontal and temporal lobes, regions important for spatial exploration, learning, memory, and decision making. In addition, performance was significantly correlated with resting-state fMRI connectivity between the left caudate and the right fusiform gyrus, lateral occipital complex, and superior temporal gyrus. Given the basal ganglia's role in exploration, these findings suggest that tighter integration of the brain systems responsible for exploration and visuospatial processing may be critical for learning in a complex environment.

  6. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis

    NARCIS (Netherlands)

    Cheplygina, Veronika; de Bruijne, Marleen; Pluim, Josien P. W.

    2018-01-01

    Machine learning (ML) algorithms have made a tremendous impact in the field of medical imaging. While medical imaging datasets have been growing in size, a challenge for supervised ML algorithms that is frequently mentioned is the lack of annotated data. As a result, various methods which can learn

  7. Maximum margin semi-supervised learning with irrelevant data.

    Science.gov (United States)

    Yang, Haiqin; Huang, Kaizhu; King, Irwin; Lyu, Michael R

    2015-10-01

    Semi-supervised learning (SSL) is a typical learning paradigms training a model from both labeled and unlabeled data. The traditional SSL models usually assume unlabeled data are relevant to the labeled data, i.e., following the same distributions of the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain irrelevant data to the labeled data. To tackle this problem, we develop a maximum margin model, named tri-class support vector machine (3C-SVM), to utilize the available training data, while seeking a hyperplane for separating the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge and explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and maximum entropy principle. That is, 3C-SVM approaches an ideal classifier. This classifier relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hardly distinguished. Second, theoretical analysis is provided to prove that in what condition, the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, Semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs) as its special cases. More importantly, we deploy a concave-convex produce to solve the proposed 3C-SVM, transforming the original mixed integer programming, to a semi-definite programming relaxation, and finally to a sequence of quadratic programming subproblems, which yields the same worst case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematical experimental comparisons. Copyright

  8. Towards a new classification of stable phase schizophrenia into major and simple neuro-cognitive psychosis: Results of unsupervised machine learning analysis.

    Science.gov (United States)

    Kanchanatawan, Buranee; Sriswasdi, Sira; Thika, Supaksorn; Stoyanov, Drozdstoy; Sirivichayakul, Sunee; Carvalho, André F; Geffard, Michel; Maes, Michael

    2018-05-23

    Deficit schizophrenia, as defined by the Schedule for Deficit Syndrome, may represent a distinct diagnostic class defined by neurocognitive impairments coupled with changes in IgA/IgM responses to tryptophan catabolites (TRYCATs). Adequate classifications should be based on supervised and unsupervised learning rather than on consensus criteria. This study used machine learning as means to provide a more accurate classification of patients with stable phase schizophrenia. We found that using negative symptoms as discriminatory variables, schizophrenia patients may be divided into two distinct classes modelled by (A) impairments in IgA/IgM responses to noxious and generally more protective tryptophan catabolites, (B) impairments in episodic and semantic memory, paired associative learning and false memory creation, and (C) psychotic, excitation, hostility, mannerism, negative, and affective symptoms. The first cluster shows increased negative, psychotic, excitation, hostility, mannerism, depression and anxiety symptoms, and more neuroimmune and cognitive disorders and is therefore called "major neurocognitive psychosis" (MNP). The second cluster, called "simple neurocognitive psychosis" (SNP) is discriminated from normal controls by the same features although the impairments are less well developed than in MNP. The latter is additionally externally validated by lowered quality of life, body mass (reflecting a leptosome body type), and education (reflecting lower cognitive reserve). Previous distinctions including "type 1" (positive)/"type 2" (negative) and DSM-IV-TR (eg, paranoid) schizophrenia could not be validated using machine learning techniques. Previous names of the illness, including schizophrenia, are not very adequate because they do not describe the features of the illness, namely, interrelated neuroimmune, cognitive, and clinical features. Stable-phase schizophrenia consists of 2 relevant qualitatively distinct categories or nosological entities with SNP

  9. Robust head pose estimation via supervised manifold learning.

    Science.gov (United States)

    Wang, Chao; Song, Xubo

    2014-05-01

    Head poses can be automatically estimated using manifold learning algorithms, with the assumption that with the pose being the only variable, the face images should lie in a smooth and low-dimensional manifold. However, this estimation approach is challenging due to other appearance variations related to identity, head location in image, background clutter, facial expression, and illumination. To address the problem, we propose to incorporate supervised information (pose angles of training samples) into the process of manifold learning. The process has three stages: neighborhood construction, graph weight computation and projection learning. For the first two stages, we redefine inter-point distance for neighborhood construction as well as graph weight by constraining them with the pose angle information. For Stage 3, we present a supervised neighborhood-based linear feature transformation algorithm to keep the data points with similar pose angles close together but the data points with dissimilar pose angles far apart. The experimental results show that our method has higher estimation accuracy than the other state-of-art algorithms and is robust to identity and illumination variations. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Supervised Learning in Spiking Neural Networks for Precise Temporal Encoding.

    Science.gov (United States)

    Gardner, Brian; Grüning, André

    2016-01-01

    Precise spike timing as a means to encode information in neural networks is biologically supported, and is advantageous over frequency-based codes by processing input features on a much shorter time-scale. For these reasons, much recent attention has been focused on the development of supervised learning rules for spiking neural networks that utilise a temporal coding scheme. However, despite significant progress in this area, there still lack rules that have a theoretical basis, and yet can be considered biologically relevant. Here we examine the general conditions under which synaptic plasticity most effectively takes place to support the supervised learning of a precise temporal code. As part of our analysis we examine two spike-based learning methods: one of which relies on an instantaneous error signal to modify synaptic weights in a network (INST rule), and the other one relying on a filtered error signal for smoother synaptic weight modifications (FILT rule). We test the accuracy of the solutions provided by each rule with respect to their temporal encoding precision, and then measure the maximum number of input patterns they can learn to memorise using the precise timings of individual spikes as an indication of their storage capacity. Our results demonstrate the high performance of the FILT rule in most cases, underpinned by the rule's error-filtering mechanism, which is predicted to provide smooth convergence towards a desired solution during learning. We also find the FILT rule to be most efficient at performing input pattern memorisations, and most noticeably when patterns are identified using spikes with sub-millisecond temporal precision. In comparison with existing work, we determine the performance of the FILT rule to be consistent with that of the highly efficient E-learning Chronotron rule, but with the distinct advantage that our FILT rule is also implementable as an online method for increased biological realism.

  11. Supervised learning of probability distributions by neural networks

    Science.gov (United States)

    Baum, Eric B.; Wilczek, Frank

    1988-01-01

    Supervised learning algorithms for feedforward neural networks are investigated analytically. The back-propagation algorithm described by Werbos (1974), Parker (1985), and Rumelhart et al. (1986) is generalized by redefining the values of the input and output neurons as probabilities. The synaptic weights are then varied to follow gradients in the logarithm of likelihood rather than in the error. This modification is shown to provide a more rigorous theoretical basis for the algorithm and to permit more accurate predictions. A typical application involving a medical-diagnosis expert system is discussed.

  12. Combining theories to reach multi-faceted insights into learning opportunities in doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Rump, Camilla Østerberg

    The aim of this paper is to illustrate how theories can be combined to explore opportunities for learning in doctoral supervision. While our earlier research into learning dynamics in doctoral supervision in life science research (Kobayashi, 2014) has focused on illustrating learning opportunitie...

  13. Mining FDA drug labels using an unsupervised learning technique--topic modeling.

    Science.gov (United States)

    Bisgin, Halil; Liu, Zhichao; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2011-10-18

    The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe these information is free text often containing ambiguous semantic descriptions, which poses a great challenge in retrieving useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on the manual reading of the full text by experts, which is time consuming and labor intensive. In this study, a novel text mining method with unsupervised learning in nature, called topic modeling, was applied to the drug labeling with a goal of discovering "topics" that group drugs with similar safety concerns and/or therapeutic uses together. A total of 794 FDA-approved drug labels were used in this study. First, the three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed by the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to the standard ADR terms. Next, the topic modeling approach with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on the probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of drugs. The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct context that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic application (e.g., antiinfectives for systemic use). We were also able to identify potential adverse events that might arise from specific

  14. Mining FDA drug labels using an unsupervised learning technique - topic modeling

    Science.gov (United States)

    2011-01-01

    Background The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe these information is free text often containing ambiguous semantic descriptions, which poses a great challenge in retrieving useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on the manual reading of the full text by experts, which is time consuming and labor intensive. Method In this study, a novel text mining method with unsupervised learning in nature, called topic modeling, was applied to the drug labeling with a goal of discovering “topics” that group drugs with similar safety concerns and/or therapeutic uses together. A total of 794 FDA-approved drug labels were used in this study. First, the three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed by the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to the standard ADR terms. Next, the topic modeling approach with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on the probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of drugs. Results The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct context that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic application (e.g., antiinfectives for systemic use). We were also able to identify potential adverse events that

  15. An unsupervised strategy for biomedical image segmentation

    Directory of Open Access Journals (Sweden)

    Roberto Rodríguez

    2010-09-01

    Full Text Available Roberto Rodríguez1, Rubén Hernández21Digital Signal Processing Group, Institute of Cybernetics, Mathematics, and Physics, Havana, Cuba; 2Interdisciplinary Professional Unit of Engineering and Advanced Technology, IPN, MexicoAbstract: Many segmentation techniques have been published, and some of them have been widely used in different application problems. Most of these segmentation techniques have been motivated by specific application purposes. Unsupervised methods, which do not assume any prior scene knowledge can be learned to help the segmentation process, and are obviously more challenging than the supervised ones. In this paper, we present an unsupervised strategy for biomedical image segmentation using an algorithm based on recursively applying mean shift filtering, where entropy is used as a stopping criterion. This strategy is proven with many real images, and a comparison is carried out with manual segmentation. With the proposed strategy, errors less than 20% for false positives and 0% for false negatives are obtained.Keywords: segmentation, mean shift, unsupervised segmentation, entropy

  16. Semi-Supervised Multitask Learning for Scene Recognition.

    Science.gov (United States)

    Lu, Xiaoqiang; Li, Xuelong; Mou, Lichao

    2015-09-01

    Scene recognition has been widely studied to understand visual information from the level of objects and their relationships. Toward scene recognition, many methods have been proposed. They, however, encounter difficulty to improve the accuracy, mainly due to two limitations: 1) lack of analysis of intrinsic relationships across different scales, say, the initial input and its down-sampled versions and 2) existence of redundant features. This paper develops a semi-supervised learning mechanism to reduce the above two limitations. To address the first limitation, we propose a multitask model to integrate scene images of different resolutions. For the second limitation, we build a model of sparse feature selection-based manifold regularization (SFSMR) to select the optimal information and preserve the underlying manifold structure of data. SFSMR coordinates the advantages of sparse feature selection and manifold regulation. Finally, we link the multitask model and SFSMR, and propose the semi-supervised learning method to reduce the two limitations. Experimental results report the improvements of the accuracy in scene recognition.

  17. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practicaldataminingrarelyfalls exactlyinto the supervisedlearning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computationalintensivenessofgraph-based SSLarises largely from the manifold or graph regularization, which in turn lead to large models that are dificult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highlyscalable,graph-based algorithm for large-scale SSL. Our key innovation is the use of"prototypes vectors" for effcient approximation on both the graph-based regularizer and model representation. The choice of prototypes are grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.

  18. Supervised learning with decision margins in pools of spiking neurons.

    Science.gov (United States)

    Le Mouel, Charlotte; Harris, Kenneth D; Yger, Pierre

    2014-10-01

    Learning to categorise sensory inputs by generalising from a few examples whose category is precisely known is a crucial step for the brain to produce appropriate behavioural responses. At the neuronal level, this may be performed by adaptation of synaptic weights under the influence of a training signal, in order to group spiking patterns impinging on the neuron. Here we describe a framework that allows spiking neurons to perform such "supervised learning", using principles similar to the Support Vector Machine, a well-established and robust classifier. Using a hinge-loss error function, we show that requesting a margin similar to that of the SVM improves performance on linearly non-separable problems. Moreover, we show that using pools of neurons to discriminate categories can also increase the performance by sharing the load among neurons.

  19. Multicultural supervision: lessons learned about an ongoing struggle.

    Science.gov (United States)

    Christiansen, Abigail Tolhurst; Thomas, Volker; Kafescioglu, Nilufer; Karakurt, Gunnur; Lowe, Walter; Smith, William; Wittenborn, Andrea

    2011-01-01

    This article examines the experiences of seven diverse therapists in a supervision course as they wrestled with the real-world application of multicultural supervision. Existing literature on multicultural supervision does not address the difficulties that arise in addressing multicultural issues in the context of the supervision relationship. The experiences of six supervisory candidates and one mentoring supervisor in addressing multicultural issues in supervision are explored. Guidelines for conversations regarding multicultural issues are provided. © 2011 American Association for Marriage and Family Therapy.

  20. SuperSpike: Supervised Learning in Multilayer Spiking Neural Networks.

    Science.gov (United States)

    Zenke, Friedemann; Ganguli, Surya

    2018-04-13

    A vast majority of computation in the brain is performed by spiking neural networks. Despite the ubiquity of such spiking, we currently lack an understanding of how biological spiking neural circuits learn and compute in vivo, as well as how we can instantiate such capabilities in artificial spiking circuits in silico. Here we revisit the problem of supervised learning in temporally coding multilayer spiking neural networks. First, by using a surrogate gradient approach, we derive SuperSpike, a nonlinear voltage-based three-factor learning rule capable of training multilayer networks of deterministic integrate-and-fire neurons to perform nonlinear computations on spatiotemporal spike patterns. Second, inspired by recent results on feedback alignment, we compare the performance of our learning rule under different credit assignment strategies for propagating output errors to hidden units. Specifically, we test uniform, symmetric, and random feedback, finding that simpler tasks can be solved with any type of feedback, while more complex tasks require symmetric feedback. In summary, our results open the door to obtaining a better scientific understanding of learning and computation in spiking neural networks by advancing our ability to train them to solve nonlinear problems involving transformations between different spatiotemporal spike time patterns.

  1. Spiking Neural Networks with Unsupervised Learning Based on STDP Using Resistive Synaptic Devices and Analog CMOS Neuron Circuit.

    Science.gov (United States)

    Kwon, Min-Woo; Baek, Myung-Hyun; Hwang, Sungmin; Kim, Sungjun; Park, Byung-Gook

    2018-09-01

    We designed the CMOS analog integrate and fire (I&F) neuron circuit can drive resistive synaptic device. The neuron circuit consists of a current mirror for spatial integration, a capacitor for temporal integration, asymmetric negative and positive pulse generation part, a refractory part, and finally a back-propagation pulse generation part for learning of the synaptic devices. The resistive synaptic devices were fabricated using HfOx switching layer by atomic layer deposition (ALD). The resistive synaptic device had gradual set and reset characteristics and the conductance was adjusted by spike-timing-dependent-plasticity (STDP) learning rule. We carried out circuit simulation of synaptic device and CMOS neuron circuit. And we have developed an unsupervised spiking neural networks (SNNs) for 5 × 5 pattern recognition and classification using the neuron circuit and synaptic devices. The hardware-based SNNs can autonomously and efficiently control the weight updates of the synapses between neurons, without the aid of software calculations.

  2. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-09-01

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The achieved discriminative while compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms the algorithms in the state of the arts. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  3. Phenotype classification of zebrafish embryos by supervised learning.

    Directory of Open Access Journals (Sweden)

    Nathalie Jeanray

    Full Text Available Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3 days old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  4. Semi-Supervised Learning to Identify UMLS Semantic Relations.

    Science.gov (United States)

    Luo, Yuan; Uzuner, Ozlem

    2014-01-01

    The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configuration. Using KL divergence consistently performs the best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).

  5. Unbiased and non-supervised learning methods for disruption prediction at JET

    International Nuclear Information System (INIS)

    Murari, A.; Vega, J.; Ratta, G.A.; Vagliasindi, G.; Johnson, M.F.; Hong, S.H.

    2009-01-01

    The importance of predicting the occurrence of disruptions is going to increase significantly in the next generation of tokamak devices. The expected energy content of ITER plasmas, for example, is such that disruptions could have a significant detrimental impact on various parts of the device, ranging from erosion of plasma facing components to structural damage. Early detection of disruptions is therefore needed with evermore increasing urgency. In this paper, the results of a series of methods to predict disruptions at JET are reported. The main objective of the investigation consists of trying to determine how early before a disruption it is possible to perform acceptable predictions on the basis of the raw data, keeping to a minimum the number of 'ad hoc' hypotheses. Therefore, the chosen learning techniques have the common characteristic of requiring a minimum number of assumptions. Classification and Regression Trees (CART) is a supervised but, on the other hand, a completely unbiased and nonlinear method, since it simply constructs the best classification tree by working directly on the input data. A series of unsupervised techniques, mainly K-means and hierarchical, have also been tested, to investigate to what extent they can autonomously distinguish between disruptive and non-disruptive groups of discharges. All these independent methods indicate that, in general, prediction with a success rate above 80% can be achieved not earlier than 180 ms before the disruption. The agreement between various completely independent methods increases the confidence in the results, which are also confirmed by a visual inspection of the data performed with pseudo Grand Tour algorithms.

  6. Supervised dictionary learning for inferring concurrent brain networks.

    Science.gov (United States)

    Zhao, Shijie; Han, Junwei; Lv, Jinglei; Jiang, Xi; Hu, Xintao; Zhao, Yu; Ge, Bao; Guo, Lei; Liu, Tianming

    2015-10-01

    Task-based fMRI (tfMRI) has been widely used to explore functional brain networks via predefined stimulus paradigm in the fMRI scan. Traditionally, the general linear model (GLM) has been a dominant approach to detect task-evoked networks. However, GLM focuses on task-evoked or event-evoked brain responses and possibly ignores the intrinsic brain functions. In comparison, dictionary learning and sparse coding methods have attracted much attention recently, and these methods have shown the promise of automatically and systematically decomposing fMRI signals into meaningful task-evoked and intrinsic concurrent networks. Nevertheless, two notable limitations of current data-driven dictionary learning method are that the prior knowledge of task paradigm is not sufficiently utilized and that the establishment of correspondences among dictionary atoms in different brains have been challenging. In this paper, we propose a novel supervised dictionary learning and sparse coding method for inferring functional networks from tfMRI data, which takes both of the advantages of model-driven method and data-driven method. The basic idea is to fix the task stimulus curves as predefined model-driven dictionary atoms and only optimize the other portion of data-driven dictionary atoms. Application of this novel methodology on the publicly available human connectome project (HCP) tfMRI datasets has achieved promising results.

  7. Supervised Learning Applied to Air Traffic Trajectory Classification

    Science.gov (United States)

    Bosson, Christabelle; Nikoleris, Tasos

    2018-01-01

    Given the recent increase of interest in introducing new vehicle types and missions into the National Airspace System, a transition towards a more autonomous air traffic control system is required in order to enable and handle increased density and complexity. This paper presents an exploratory effort of the needed autonomous capabilities by exploring supervised learning techniques in the context of aircraft trajectories. In particular, it focuses on the application of machine learning algorithms and neural network models to a runway recognition trajectory-classification study. It investigates the applicability and effectiveness of various classifiers using datasets containing trajectory records for a month of air traffic. A feature importance and sensitivity analysis are conducted to challenge the chosen time-based datasets and the ten selected features. The study demonstrates that classification accuracy levels of 90% and above can be reached in less than 40 seconds of training for most machine learning classifiers when one track data point, described by the ten selected features at a particular time step, per trajectory is used as input. It also shows that neural network models can achieve similar accuracy levels but at higher training time costs.

  8. Healthcare students' evaluation of the clinical learning environment and supervision - a cross-sectional study.

    Science.gov (United States)

    Pitkänen, Salla; Kääriäinen, Maria; Oikarainen, Ashlee; Tuomikoski, Anna-Maria; Elo, Satu; Ruotsalainen, Heidi; Saarikoski, Mikko; Kärsämänoja, Taina; Mikkonen, Kristina

    2018-03-01

    The purpose of clinical placements and supervision is to promote the development of healthcare students´ professional skills. High-quality clinical learning environments and supervision were shown to have significant influence on healthcare students´ professional development. This study aimed to describe healthcare students` evaluation of the clinical learning environment and supervision, and to identify the factors that affect these. The study was performed as a cross-sectional study. The data (n = 1973) were gathered through an online survey using the Clinical Learning Environment, Supervision and Nurse Teacher scale during the academic year 2015-2016 from all healthcare students (N = 2500) who completed their clinical placement at a certain university hospital in Finland. The data were analysed using descriptive statistics and binary logistic regression analysis. More than half of the healthcare students had a named supervisor and supervision was completed as planned. The students evaluated the clinical learning environment and supervision as 'good'. The students´ readiness to recommend the unit to other students and the frequency of separate private unscheduled sessions with the supervisor were the main factors that affect healthcare students` evaluation of the clinical learning environment and supervision. Individualized and goal-oriented supervision in which the student had a named supervisor and where supervision was completed as planned in a positive environment that supported learning had a significant impact on student's learning. The clinical learning environment and supervision support the development of future healthcare professionals' clinical competence. The supervisory relationship was shown to have a significant effect on the outcomes of students' experiences. We recommend the planning of educational programmes for supervisors of healthcare students for the enhancement of supervisors' pedagogical competencies in supervising students in

  9. Learning How to Supervise: Midlevel Managers' Individual Learning Journeys

    Science.gov (United States)

    David, Keegan

    2010-01-01

    The purpose of this study was to explore how midlevel managers in student affairs learn supervisory skills. Student affairs professionals are given tremendous responsibility for the lives of students outside the classroom. The Association of College Personnel Administrators and other sources outlined the necessary competencies for student affairs…

  10. Supervised spike-timing-dependent plasticity: a spatiotemporal neuronal learning rule for function approximation and decisions.

    Science.gov (United States)

    Franosch, Jan-Moritz P; Urban, Sebastian; van Hemmen, J Leo

    2013-12-01

    How can an animal learn from experience? How can it train sensors, such as the auditory or tactile system, based on other sensory input such as the visual system? Supervised spike-timing-dependent plasticity (supervised STDP) is a possible answer. Supervised STDP trains one modality using input from another one as "supervisor." Quite complex time-dependent relationships between the senses can be learned. Here we prove that under very general conditions, supervised STDP converges to a stable configuration of synaptic weights leading to a reconstruction of primary sensory input.

  11. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.

    Science.gov (United States)

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S

    2014-03-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.

  12. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    Science.gov (United States)

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  13. Learning outcomes using video in supervision and peer feedback during clinical skills training

    DEFF Research Database (Denmark)

    Lauridsen, Henrik Hein; Toftgård, Rie Castella; Nørgaard, Cita

    supervision of clinical skills (formative assessment). Demonstrations of these principles will be presented as video podcasts during the session. The learning outcomes of video supervision and peer-feedback were assessed in an online questionnaire survey. Results Results of the supervision showed large self......Objective New technology and learning principles were introduced in a clinical skills training laboratory (iLab). The intension was to move from apprenticeship to active learning principles including peer feedback and supervision using video. The objective of this study was to evaluate student...... learning outcomes in a manual skills training subject using video during feedback and supervision. Methods The iLab classroom was designed to fit four principles of teaching using video. Two of these principles were (a) group work using peer-feedback on videos produced by the students and, (b) video...

  14. Automated Spirometry Quality Assurance: Supervised Learning From Multiple Experts.

    Science.gov (United States)

    Velickovski, Filip; Ceccaroni, Luigi; Marti, Robert; Burgos, Felip; Gistau, Concepcion; Alsina-Restoy, Xavier; Roca, Josep

    2018-01-01

    Forced spirometry testing is gradually becoming available across different healthcare tiers including primary care. It has been demonstrated in earlier work that commercially available spirometers are not fully able to assure the quality of individual spirometry manoeuvres. Thus, a need to expand the availability of high-quality spirometry assessment beyond specialist pulmonary centres has arisen. In this paper, we propose a method to select and optimise a classifier using supervised learning techniques by learning from previously classified forced spirometry tests from a group of experts. Such a method is able to take into account the shape of the curve as an expert would during visual inspection. We evaluated the final classifier on a dataset put aside for evaluation yielding an area under the receiver operating characteristic curve of 0.88 and specificities of 0.91 and 0.86 for sensitivities of 0.60 and 0.82. Furthermore, other specificities and sensitivities along the receiver operating characteristic curve were close to the level of the experts when compared against each-other, and better than an earlier rules-based method assessed on the same dataset. We foresee key benefits in raising diagnostic quality, saving time, reducing cost, and also improving remote care and monitoring services for patients with chronic respiratory diseases in the future if a clinical decision support system with the encapsulated classifier is to be integrated into the work-flow of forced spirometry testing.

  15. Supervised learning with restricted training sets: a generating functional analysis

    Energy Technology Data Exchange (ETDEWEB)

    Heimel, J.A.F.; Coolen, A.C.C. [Department of Mathematics, King' s College London, Strand, London (United Kingdom)

    2001-10-26

    We study the dynamics of supervised on-line learning of realizable tasks in feed-forward neural networks. We focus on the regime where the number of examples used for training is proportional to the number of input channels N. Using generating functional techniques from spin glass theory, we are able to average over the composition of the training set and transform the problem for N{yields}{infinity} to an effective single pattern system described completely by the student autocovariance, the student-teacher overlap and the student response function with exact closed equations. Our method applies to arbitrary learning rules, i.e., not necessarily of a gradient-descent type. The resulting exact macroscopic dynamical equations can be integrated without finite-size effects up to any degree of accuracy, but their main value is in providing an exact and simple starting point for analytical approximation schemes. Finally, we show how, in the region of absent anomalous response and using the hypothesis that (as in detailed balance systems) the short-time part of the various operators can be transformed away, one can describe the stationary state of the network successfully by a set of coupled equations involving only four scalar order parameters. (author)

  16. Building an Arabic Sentiment Lexicon Using Semi-supervised Learning

    Directory of Open Access Journals (Sweden)

    Fawaz H.H. Mahyoub

    2014-12-01

    Full Text Available Sentiment analysis is the process of determining a predefined sentiment from text written in a natural language with respect to the entity to which it is referring. A number of lexical resources are available to facilitate this task in English. One such resource is the SentiWordNet, which assigns sentiment scores to words found in the English WordNet. In this paper, we present an Arabic sentiment lexicon that assigns sentiment scores to the words found in the Arabic WordNet. Starting from a small seed list of positive and negative words, we used semi-supervised learning to propagate the scores in the Arabic WordNet by exploiting the synset relations. Our algorithm assigned a positive sentiment score to more than 800, a negative score to more than 600 and a neutral score to more than 6000 words in the Arabic WordNet. The lexicon was evaluated by incorporating it into a machine learning-based classifier. The experiments were conducted on several Arabic sentiment corpora, and we were able to achieve a 96% classification accuracy.

  17. Supervised Filter Learning for Representation Based Face Recognition.

    Directory of Open Access Journals (Sweden)

    Chao Bi

    Full Text Available Representation based classification methods, such as Sparse Representation Classification (SRC and Linear Regression Classification (LRC have been developed for face recognition problem successfully. However, most of these methods use the original face images without any preprocessing for recognition. Thus, their performances may be affected by some problematic factors (such as illumination and expression variances in the face images. In order to overcome this limitation, a novel supervised filter learning algorithm is proposed for representation based face recognition in this paper. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation based classifiers. Furthermore, we also extend our algorithm for heterogeneous face recognition problem. Extensive experiments are carried out on five databases and the experimental results verify the efficacy of the proposed algorithm.

  18. A numeric comparison of variable selection algorithms for supervised learning

    International Nuclear Information System (INIS)

    Palombo, G.; Narsky, I.

    2009-01-01

    Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents information about data is therefore an important task in analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets such as, for instance, imaging gamma-ray Cherenkov telescope (MAGIC) data found at the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community ( (http://sourceforge.net/projects/statpatrec/)). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ('Add N Remove R') implemented in SPR.

  19. Competencies to enable learning-focused clinical supervision: a thematic analysis of the literature.

    Science.gov (United States)

    Pront, Leeanne; Gillham, David; Schuwirth, Lambert W T

    2016-04-01

    Clinical supervision is essential for development of health professional students and widely recognised as a significant factor influencing student learning. Although considered important, delivery is often founded on personal experience or a series of predetermined steps that offer standardised behavioural approaches. Such a view may limit the capacity to promote individualised student learning in complex clinical environments. The objective of this review was to develop a comprehensive understanding of what is considered 'good' clinical supervision, within health student education. The literature provides many perspectives, so collation and interpretation were needed to aid development and understanding for all clinicians required to perform clinical supervision within their daily practice. A comprehensive thematic literature review was carried out, which included a variety of health disciplines and geographical environments. Literature addressing 'good' clinical supervision consists primarily of descriptive qualitative research comprising mostly small studies that repeated descriptions of student and supervisor opinions of 'good' supervision. Synthesis and thematic analysis of the literature resulted in four 'competency' domains perceived to inform delivery of learning-focused or 'good' clinical supervision. Domains understood to promote student learning are co-dependent and include 'to partner', 'to nurture', 'to engage' and 'to facilitate meaning'. Clinical supervision is a complex phenomenon and establishing a comprehensive understanding across health disciplines can influence the future health workforce. The learning-focused clinical supervision domains presented here provide an alternative perspective of clinical supervision of health students. This paper is the first step in establishing a more comprehensive understanding of learning-focused clinical supervision, which may lead to development of competencies for clinical supervision. © 2016 John Wiley

  20. Effects of Supervised vs. Unsupervised Training Programs on Balance and Muscle Strength in Older Adults : A Systematic Review and Meta-Analysis

    NARCIS (Netherlands)

    Lacroix, Andre; Hortobagyi, Tibor; Beurskens, Rainer; Granacher, Urs

    Background Balance and resistance training can improve healthy older adults' balance and muscle strength. Delivering such exercise programs at home without supervision may facilitate participation for older adults because they do not have to leave their homes. To date, no systematic literature

  1. Effects of Supervised vs. Unsupervised Training Programs on Balance and Muscle Strength in Older Adults : A Systematic Review and Meta-Analysis

    NARCIS (Netherlands)

    Lacroix, Andre; Hortobagyi, Tibor; Beurskens, Rainer; Granacher, Urs

    2017-01-01

    Background Balance and resistance training can improve healthy older adults' balance and muscle strength. Delivering such exercise programs at home without supervision may facilitate participation for older adults because they do not have to leave their homes. To date, no systematic literature

  2. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of the today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails are invading users without their consent and filling their mail boxes. They consume more network capacity as well as time in checking and deleting spam mails. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most of the users want to do right think to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, they are not yet eradicated. Also when the counter measures are over sensitive, even legitimate emails will be eliminated. Among the approaches developed to stop spam, filtering is the one of the most important technique. Many researches in spam filtering have been centered on the more sophisticated classifier-related issues. In recent days, Machine learning for spam classification is an important research issue. The effectiveness of the proposed work is explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.

  3. Active semi-supervised learning method with hybrid deep belief networks.

    Science.gov (United States)

    Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong

    2014-01-01

    In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the previous several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, active learning method is combined based on the proposed deep architecture. We did several experiments on five sentiment classification datasets, and show that AHD is competitive with previous semi-supervised learning algorithm. Experiments are also conducted to verify the effectiveness of our proposed method with different number of labeled reviews and unlabeled reviews respectively.

  4. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    importance in the projection (VIP) information of the DPLS method. The power of the gene selection method and the proposed supervised hierarchical clustering method is illustrated on a three microarray data sets of leukemia, breast, and colon cancer. Supervised machine learning algorithms thus enable...

  5. The Practice of Supervision for Professional Learning: The Example of Future Forensic Specialists

    Science.gov (United States)

    Köpsén, Susanne; Nyström, Sofia

    2015-01-01

    Supervision intended to support learning is of great interest in professional knowledge development. No single definition governs the implementation and enactment of supervision because of different conditions, intentions, and pedagogical approaches. Uncertainty exists at a time when knowledge and methods are undergoing constant development. This…

  6. Factored Translation with Unsupervised Word Clusters

    DEFF Research Database (Denmark)

    Rishøj, Christian; Søgaard, Anders

    2011-01-01

    Unsupervised word clustering algorithms — which form word clusters based on a measure of distributional similarity — have proven to be useful in providing beneficial features for various natural language processing tasks involving supervised learning. This work explores the utility of such word...... clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some...... proportion of the sentences in the test set, but has a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated...

  7. Regular graph construction for semi-supervised learning

    International Nuclear Information System (INIS)

    Vega-Oliveros, Didier A; Berton, Lilian; Eberle, Andre Mantini; Lopes, Alneu de Andrade; Zhao, Liang

    2014-01-01

    Semi-supervised learning (SSL) stands out for using a small amount of labeled points for data clustering and classification. In this scenario graph-based methods allow the analysis of local and global characteristics of the available data by identifying classes or groups regardless data distribution and representing submanifold in Euclidean space. Most of methods used in literature for SSL classification do not worry about graph construction. However, regular graphs can obtain better classification accuracy compared to traditional methods such as k-nearest neighbor (kNN), since kNN benefits the generation of hubs and it is not appropriate for high-dimensionality data. Nevertheless, methods commonly used for generating regular graphs have high computational cost. We tackle this problem introducing an alternative method for generation of regular graphs with better runtime performance compared to methods usually find in the area. Our technique is based on the preferential selection of vertices according some topological measures, like closeness, generating at the end of the process a regular graph. Experiments using the global and local consistency method for label propagation show that our method provides better or equal classification rate in comparison with kNN

  8. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.

  9. Experiments on Supervised Learning Algorithms for Text Categorization

    Science.gov (United States)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volume of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: Reuter's- 21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.

  10. Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System

    Science.gov (United States)

    Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque

    2018-01-01

    There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements such that the mastication analysis is getting less subjective; however, all methods are still highly subjective, and the quality of the assessments depends much on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to the obtained with clinical analysis, showing no statistically significant difference between both methods. Moreover, We propose to use unsupervised paradigm approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen’s Self-Organizing Maps and K-Means Clustering. Both algorithms have excellent performances to process jaw-movements data, showing encouraging results and potential to bring a full assessment of the masticatory function. The proposed method can be applied in real-time providing relevant dynamic information for health-care professionals. PMID:29651365

  11. Spectrum Hole Identification in IEEE 802.22 WRAN using Unsupervised Learning

    Directory of Open Access Journals (Sweden)

    V. Balaji

    2016-01-01

    Full Text Available In this paper we present a Cooperative Spectrum Sensing (CSS algorithm for Cognitive Radios (CR based on IEEE 802.22Wireless Regional Area Network (WRAN standard. The core objective is to improve cooperative sensing efficiency which specifies how fast a decision can be reached in each round of cooperation (iteration to sense an appropriate number of channels/bands (i.e. 86 channels of 7MHz bandwidth as per IEEE 802.22 within a time constraint (channel sensing time. To meet this objective, we have developed CSS algorithm using unsupervised K-means clustering classification approach. The received energy level of each Secondary User (SU is considered as the parameter for determining channel availability. The performance of proposed algorithm is quantified in terms of detection accuracy, training and classification delay time. Further, the detection accuracy of our proposed scheme meets the requirement of IEEE 802.22 WRAN with the target probability of falsealrm as 0.1. All the simulations are carried out using Matlab tool.

  12. Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System.

    Science.gov (United States)

    Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque; Rativa, Diego

    2018-01-01

    There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements such that the mastication analysis is getting less subjective; however, all methods are still highly subjective, and the quality of the assessments depends much on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to the obtained with clinical analysis, showing no statistically significant difference between both methods. Moreover, We propose to use unsupervised paradigm approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen's Self-Organizing Maps and K-Means Clustering. Both algorithms have excellent performances to process jaw-movements data, showing encouraging results and potential to bring a full assessment of the masticatory function. The proposed method can be applied in real-time providing relevant dynamic information for health-care professionals.

  13. Baccalaureate nursing students' perceptions of learning and supervision in the clinical environment.

    Science.gov (United States)

    Dimitriadou, Maria; Papastavrou, Evridiki; Efstathiou, Georgios; Theodorou, Mamas

    2015-06-01

    This study is an exploration of nursing students' experiences within the clinical learning environment (CLE) and supervision provided in hospital settings. A total of 357 second-year nurse students from all universities in Cyprus participated in the study. Data were collected using the Clinical Learning Environment, Supervision and Nurse Teacher instrument. The dimension "supervisory relationship (mentor)", as well as the frequency of individualized supervision meetings, were found to be important variables in the students' clinical learning. However, no statistically-significant connection was established between successful mentor relationship and team supervision. The majority of students valued their mentor's supervision more highly than a nurse teacher's supervision toward the fulfillment of learning outcomes. The dimensions "premises of nursing care" and "premises of learning" were highly correlated, indicating that a key component of a quality clinical learning environment is the quality of care delivered. The results suggest the need to modify educational strategies that foster desirable learning for students in response to workplace demands. © 2014 Wiley Publishing Asia Pty Ltd.

  14. Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System.

    Science.gov (United States)

    Jiang, Yizhang; Wu, Dongrui; Deng, Zhaohong; Qian, Pengjiang; Wang, Jun; Wang, Guanjin; Chung, Fu-Lai; Choi, Kup-Sze; Wang, Shitong

    2017-12-01

    Recognition of epileptic seizures from offline EEG signals is very important in clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.

  15. Scaling up machine learning: parallel and distributed approaches

    National Research Council Canada - National Science Library

    Bekkerman, Ron; Bilenko, Mikhail; Langford, John

    2012-01-01

    ... presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks that include CUDA, MPI, MapReduce, and DryadLINQ; and various learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of parallelizat...

  16. The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

    Science.gov (United States)

    Hoffman, Aaron B.; Rehder, Bob

    2010-01-01

    Research has shown that learning a concept via standard supervised classification leads to a focus on diagnostic features, whereas learning by inferring missing features promotes the acquisition of within-category information. Accordingly, we predicted that classification learning would produce a deficit in people's ability to draw "novel…

  17. Predicting the Failure of Dental Implants Using Supervised Learning Techniques

    Directory of Open Access Journals (Sweden)

    Chia-Hui Liu

    2018-05-01

    Full Text Available Prosthodontic treatment has been a crucial part of dental treatment for patients with full mouth rehabilitation. Dental implant surgeries that replace conventional dentures using titanium fixtures have become the top choice. However, because of the wide-ranging scope of implant surgeries, patients’ body conditions, surgeons’ experience, and the choice of implant system should be considered during treatment. The higher price charged by dental implant treatments compared to conventional dentures has led to a rush among medical staff; therefore, the future impact of surgeries has not been analyzed in detail, resulting in medial disputes. Previous literature on the success factors of dental implants is mainly focused on single factors such as patients’ systemic diseases, operation methods, or prosthesis types for statistical correlation significance analysis. This study developed a prediction model for providing an early warning mechanism to reduce the chances of dental implant failure. We collected the clinical data of patients who received artificial dental implants at the case hospital for a total of 8 categories and 20 variables. Supervised learning techniques such as decision tree (DT, support vector machines, logistic regressions, and classifier ensembles (i.e., Bagging and AdaBoost were used to analyze the prediction of the failure of dental implants. The results show that DT with both Bagging and Adaboost techniques possesses the highest prediction performance for the failure of dental implant (area under the receiver operating characteristic curve, AUC: 0.741; the analysis also revealed that the implant systems affect dental implant failure. The model can help clinical surgeons to reduce medical failures by choosing the optimal implant system and prosthodontics treatments for their patients.

  18. I’m just thinking - How learning opportunities are created in doctoral supervision

    DEFF Research Database (Denmark)

    Kobayashi, Sofie; Berge, Maria; Grout, Brian William Wilson

    for learning. Earlier research into doctoral supervision has been rather vague on how doctoral students learn to carry out research. Empirically, we have based the study on four cases each with one doctoral student and their supervisors. The supervision sessions were captured on video and audio to provide...... for verbatim transcripts that were subsequently analysed. Our results illustrate how supervisors and doctoral students create learning opportunities by varying aspects of research in the discussion. Better understanding of this mechanism whereby learning opportunities are created by bringing aspects......With this paper we aim to contribute towards an understanding of learning dynamics in doctoral supervision by analysing how learning opportunities are created in the interaction. We analyse interaction between supervisors and doctoral students using the notion of experiencing variation as a key...

  19. ISO learning approximates a solution to the inverse-controller problem in an unsupervised behavioral paradigm.

    Science.gov (United States)

    Porr, Bernd; von Ferber, Christian; Wörgötter, Florentin

    2003-04-01

    In "Isotropic Sequence Order Learning" (pp. 831-864 in this issue), we introduced a novel algorithm for temporal sequence learning (ISO learning). Here, we embed this algorithm into a formal nonevaluating (teacher free) environment, which establishes a sensor-motor feedback. The system is initially guided by a fixed reflex reaction, which has the objective disadvantage that it can react only after a disturbance has occurred. ISO learning eliminates this disadvantage by replacing the reflex-loop reactions with earlier anticipatory actions. In this article, we analytically demonstrate that this process can be understood in terms of control theory, showing that the system learns the inverse controller of its own reflex. Thereby, this system is able to learn a simple form of feedforward motor control.

  20. A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.

    Science.gov (United States)

    Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L

    2018-05-08

    Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.

  1. Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring

    DEFF Research Database (Denmark)

    Kallenberg, Michiel Gijsbertus J.; Petersen, Peter Kersten; Nielsen, Mads

    2016-01-01

    Mammographic risk scoring has commonly been automated by extracting a set of handcrafted features from mammograms, and relating the responses directly or indirectly to breast cancer risk. We present a method that learns a feature hierarchy from unlabeled data. When the learned features are used...... as the input to a simple classifier, two different tasks can be addressed: i) breast density segmentation, and ii) scoring of mammographic texture. The proposed model learns features at multiple scales. To control the models capacity a novel sparsity regularizer is introduced that incorporates both lifetime...... and population sparsity. We evaluated our method on three different clinical datasets. Our state-of-the-art results show that the learned breast density scores have a very strong positive relationship with manual ones, and that the learned texture scores are predictive of breast cancer. The model is easy...

  2. An unsupervised learning algorithm: application to the discrimination of seismic events and quarry blasts in the vicinity of Istanbul

    Directory of Open Access Journals (Sweden)

    H. S. Kuyuk

    2011-01-01

    Full Text Available The results of the application of an unsupervised learning (neural network approach comprising a Self Organizing Map (SOM, to distinguish micro-earthquakes from quarry blasts in the vicinity of Istanbul, Turkey, are presented and discussed. The SOM is constructed as a neural classifier and complementary reliability estimator to distinguish seismic events, and was employed for varying map sizes. Input parameters consisting of frequency and time domain data (complexity, spectral ratio, S/P wave amplitude peak ratio and origin time of events extracted from the vertical components of digital seismograms were estimated as discriminants for 179 (1.8 < Md < 3.0 local events. The results show that complexity and amplitude peak ratio parameters of the observed velocity seismogram may suffice for a reliable discrimination, while origin time and spectral ratio were found to be fuzzy and misleading classifiers for this problem. The SOM discussed here achieved a discrimination reliability that could be employed routinely in observatory practice; however, about 6% of all events were classified as ambiguous cases. This approach was developed independently for this particular classification, but it could be applied to different earthquake regions.

  3. Short and long-term effects of supervised versus unsupervised exercise training on health-related quality of life and functional outcomes following lung cancer surgery - a randomized controlled trial.

    Science.gov (United States)

    Brocki, Barbara Cristina; Andreasen, Jane; Nielsen, Lene Rodkjaer; Nekrasas, Vytautas; Gorst-Rasmussen, Anders; Westerdahl, Elisabeth

    2014-01-01

    Surgical resection enhances long-term survival after lung cancer, but survivors face functional deficits and report on poor quality of life long time after surgery. This study evaluated short and long-term effects of supervised group exercise training on health-related quality of life and physical performance in patients, who were radically operated for lung cancer. A randomized, assessor-blinded, controlled trial was performed on 78 patients undergoing lung cancer surgery. The intervention group (IG, n=41) participated in supervised out-patient exercise training sessions, one hour once a week for ten weeks. The sessions were based on aerobic exercises with target intensity of 60-80% of work capacity, resistance training and dyspnoea management. The control group (CG, n=37) received one individual instruction in exercise training. Measurements consisted of: health-related quality of life (SF36), six minute walk test (6MWT) and lung function (spirometry), assessed three weeks after surgery and after four and twelve months. Both groups were comparable at baseline on demographic characteristic and outcome values. We found a statistically significant effect after four months in the bodily pain domain of SF36, with an estimated mean difference (EMD) of 15.3 (95% CI:4 to 26.6, p=0.01) and a trend in favour of the intervention for role physical functioning (EMD 12.04, 95% CI: -1 to 25.1, p=0.07) and physical component summary (EMD 3.76, 95% CI:-0.1 to 7.6, p=0.06). At 12 months, the tendency was reversed, with the CG presenting overall slightly better measures. We found no effect of the intervention on 6MWT or lung volumes at any time-point. Supervised compared to unsupervised exercise training resulted in no improvement in health-related quality of life, except for the bodily pain domain, four months after lung cancer surgery. No effects of the intervention were found for any outcome after one year. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  4. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    Science.gov (United States)

    Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

    2014-01-01

    The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.

  5. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. RESULTS: In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. CONCLUSIONS: The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.

  6. Semi-Supervised Generation with Cluster-aware Generative Models

    DEFF Research Database (Denmark)

    Maaløe, Lars; Fraccaro, Marco; Winther, Ole

    2017-01-01

    Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Clust...... a log-likelihood of −79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods.......Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Cluster...

  7. Model-Based Learning of Local Image Features for Unsupervised Texture Segmentation

    Science.gov (United States)

    Kiechle, Martin; Storath, Martin; Weinmann, Andreas; Kleinsteuber, Martin

    2018-04-01

    Features that capture well the textural patterns of a certain class of images are crucial for the performance of texture segmentation methods. The manual selection of features or designing new ones can be a tedious task. Therefore, it is desirable to automatically adapt the features to a certain image or class of images. Typically, this requires a large set of training images with similar textures and ground truth segmentation. In this work, we propose a framework to learn features for texture segmentation when no such training data is available. The cost function for our learning process is constructed to match a commonly used segmentation model, the piecewise constant Mumford-Shah model. This means that the features are learned such that they provide an approximately piecewise constant feature image with a small jump set. Based on this idea, we develop a two-stage algorithm which first learns suitable convolutional features and then performs a segmentation. We note that the features can be learned from a small set of images, from a single image, or even from image patches. The proposed method achieves a competitive rank in the Prague texture segmentation benchmark, and it is effective for segmenting histological images.

  8. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    Science.gov (United States)

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bounded together by intra chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .

  9. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  10. A Supervised Machine Learning Study of Online Discussion Forums about Type-2 Diabetes

    DEFF Research Database (Denmark)

    Reichert, Jonathan-Raphael; Kristensen, Klaus Langholz; Mukkamala, Raghava Rao

    2017-01-01

    supervised machine learning techniques to analyze the online conversations. In order to analyse these online textual conversations, we have chosen four domain specific models (Emotions, Sentiment, Personality Traits and Patient Journey). As part of text classification, we employed the ensemble learning...... method by using 5 different supervised machine learning algorithms to build a set of text classifiers by using the voting method to predict most probable label for a given textual conversation from the online discussion forums. Our findings show that there is a high amount of trust expressed by a subset...

  11. Doctoral learning: a case for a cohort model of supervision and support

    Directory of Open Access Journals (Sweden)

    Naydene de Lange

    2011-01-01

    Full Text Available We document the efforts of the faculty of education of a large research-oriented university in supporting doctoral learning. The development of a space for doctoral learning is in line with the need to develop a community of researchers in South Africa. We describe the historical origins of this cohort model of doctoral supervision and support, draw on literature around doctoral learning, and analyse a cohort of doctoral students' evaluation of the seminarsoverthree years. The findings indicate that the model has great value in developing scholarship and reflective practice in candidates, in providing support and supervision, and in sustaining students towards the completion of their doctorates.

  12. Australia's Supervising Teachers: Motivators and Challenges to Inform Professional Learning

    Science.gov (United States)

    Nielsen, Wendy; Mena, Juanjo; Clarke, Anthony; O'Shea, Sarah; Hoban, Garry; Collins, John

    2017-01-01

    This paper offers an overview of what motivates and challenges Australian supervising teachers to work with preservice teachers in their classrooms. In the contemporary Australian context of new National Professional Standards for Teachers, a new national curriculum and new standards for Initial Teacher Education programs, what motivates and…

  13. Postgraduate supervision at an open distance e-learning institution ...

    African Journals Online (AJOL)

    Effective postgraduate supervision is a concern at universities worldwide, even under optimal conditions where post-graduate students are studying full-time. Universities are being pressured by their governments to increase the throughput of postgraduates where there is a need for supervisory guidance in order to produce ...

  14. The efficacy of early initiated, supervised, progressive resistance training compared to unsupervised, home-based exercise after unicompartmental knee arthroplasty: a single-blinded randomized controlled trial.

    Science.gov (United States)

    Jørgensen, Peter B; Bogh, Søren B; Kierkegaard, Signe; Sørensen, Henrik; Odgaard, Anders; Søballe, Kjeld; Mechlenburg, Inger

    2017-01-01

    To examine if supervised progressive resistance training was superior to home-based exercise in rehabilitation after unicompartmental knee arthroplasty. Single blinded, randomized clinical trial. Surgery, progressive resistance training and testing was carried out at Aarhus University Hospital and home-based exercise was carried out in the home of the patient. Fifty five patients were randomized to either progressive resistance training or home-based exercise. Patients were randomized to either progressive resistance training (home based exercise five days/week and progressive resistance training two days/week) or control group (home based exercise seven days/week). Preoperative assessment, 10-week (primary endpoint) and one-year follow-up were performed for leg extension power, spatiotemporal gait parameters and knee injury and osteoarthritis outcome score (KOOS). Forty patients (73%) completed 1-year follow-up. Patients in the progressive resistance training group participated in average 11 of 16 training sessions. Leg extension power increased from baseline to 10-week follow-up in progressive resistance training group (progressive resistance training: 0.28 W/kg, P= 0.01, control group: 0.01 W/kg, P=0.93) with no between-group difference. Walking speed and KOOS scores increased from baseline to 10-week follow-up in both groups with no between-group difference (six minutes walk test P=0.63, KOOS P>0.29). Progressive resistance training two days/week combined with home based exercise five days/week was not superior to home based exercise seven days/week in improving leg extension power of the operated leg.

  15. Supervised machine learning and active learning in classification of radiology reports.

    Science.gov (United States)

    Nguyen, Dung H M; Patrick, Jon D

    2014-01-01

    This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry. In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize training production and further improve classification performance. The project involved two pilot sites in Victoria, Australia (Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne)) and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney). The reportability classifier performance achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. Up to 92% of training data needed for supervised machine learning can be saved by AL. AL is a promising method for optimizing the supervised training production used in classification of radiology reports. When an AL strategy is applied during the data selection process, the cost of manual classification can be reduced significantly. The most important practical application of the reportability classifier is that it can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  16. Multiresolutional schemata for unsupervised learning of autonomous robots for 3D space operation

    Science.gov (United States)

    Lacaze, Alberto; Meystel, Michael; Meystel, Alex

    1994-01-01

    This paper describes a novel approach to the development of a learning control system for autonomous space robot (ASR) which presents the ASR as a 'baby' -- that is, a system with no a priori knowledge of the world in which it operates, but with behavior acquisition techniques that allows it to build this knowledge from the experiences of actions within a particular environment (we will call it an Astro-baby). The learning techniques are rooted in the recursive algorithm for inductive generation of nested schemata molded from processes of early cognitive development in humans. The algorithm extracts data from the environment and by means of correlation and abduction, it creates schemata that are used for control. This system is robust enough to deal with a constantly changing environment because such changes provoke the creation of new schemata by generalizing from experiences, while still maintaining minimal computational complexity, thanks to the system's multiresolutional nature.

  17. Using Unsupervised Machine Learning for Outlier Detection in Data to Improve Wind Power Production Prediction

    OpenAIRE

    Åkerberg, Ludvig

    2017-01-01

    The expansion of wind power for electrical energy production has increased in recent years and shows no signs of slowing down. This unpredictable source of energy has contributed to destabilization of the electrical grid causing the energy market prices to vary significantly on a daily basis. For energy producers and consumers to make good investments, methods have been developed to make predictions of wind power production. These methods are often based on machine learning were historical we...

  18. Real time unsupervised learning of visual stimuli in neuromorphic VLSI systems

    Science.gov (United States)

    Giulioni, Massimiliano; Corradi, Federico; Dante, Vittorio; Del Giudice, Paolo

    2015-10-01

    Neuromorphic chips embody computational principles operating in the nervous system, into microelectronic devices. In this domain it is important to identify computational primitives that theory and experiments suggest as generic and reusable cognitive elements. One such element is provided by attractor dynamics in recurrent networks. Point attractors are equilibrium states of the dynamics (up to fluctuations), determined by the synaptic structure of the network; a ‘basin’ of attraction comprises all initial states leading to a given attractor upon relaxation, hence making attractor dynamics suitable to implement robust associative memory. The initial network state is dictated by the stimulus, and relaxation to the attractor state implements the retrieval of the corresponding memorized prototypical pattern. In a previous work we demonstrated that a neuromorphic recurrent network of spiking neurons and suitably chosen, fixed synapses supports attractor dynamics. Here we focus on learning: activating on-chip synaptic plasticity and using a theory-driven strategy for choosing network parameters, we show that autonomous learning, following repeated presentation of simple visual stimuli, shapes a synaptic connectivity supporting stimulus-selective attractors. Associative memory develops on chip as the result of the coupled stimulus-driven neural activity and ensuing synaptic dynamics, with no artificial separation between learning and retrieval phases.

  19. Data integration modeling applied to drill hole planning through semi-supervised learning: A case study from the Dalli Cu-Au porphyry deposit in the central Iran

    Science.gov (United States)

    Fatehi, Moslem; Asadi, Hooshang H.

    2017-04-01

    In this study, the application of a transductive support vector machine (TSVM), an innovative semi-supervised learning algorithm, has been proposed for mapping the potential drill targets at a detailed exploration stage. The semi-supervised learning method is a hybrid of supervised and unsupervised learning approach that simultaneously uses both training and non-training data to design a classifier. By using the TSVM algorithm, exploration layers at the Dalli porphyry Cu-Au deposit in the central Iran were integrated to locate the boundary of the Cu-Au mineralization for further drilling. By applying this algorithm on the non-training (unlabeled) and limited training (labeled) Dalli exploration data, the study area was classified in two domains of Cu-Au ore and waste. Then, the results were validated by the earlier block models created, using the available borehole and trench data. In addition to TSVM, the support vector machine (SVM) algorithm was also implemented on the study area for comparison. Thirty percent of the labeled exploration data was used to evaluate the performance of these two algorithms. The results revealed 87 percent correct recognition accuracy for the TSVM algorithm and 82 percent for the SVM algorithm. The deepest inclined borehole, recently drilled in the western part of the Dalli deposit, indicated that the boundary of Cu-Au mineralization, as identified by the TSVM algorithm, was only 15 m off from the actual boundary intersected by this borehole. According to the results of the TSVM algorithm, six new boreholes were suggested for further drilling at the Dalli deposit. This study showed that the TSVM algorithm could be a useful tool for enhancing the mineralization zones and consequently, ensuring a more accurate drill hole planning.

  20. Model of Supervision Based on Primary School Teacher Professional Competency in Tematic Learning in Curriculum 2013

    Directory of Open Access Journals (Sweden)

    Meilani Hartono

    2017-08-01

    Full Text Available This study aims to find the Supervision Model Based on Primary Teacher Professional Competence which effective on integrated learning. This study use research and development with qualitative approach which will be carried out in the Palmerah, West Jakarta. The techniques used to collect data are interviews, questionnaires, observation and documentation. Data v alidity is tested with credibility, transferability, dependability, and comfortability. The model developed will be validated using the Delphi technique. The result of this research is the discovery of the model and device-based supervision model of professional competence of primary teachers in integrated learning. The long-term goal of this research is to improve the teachers’ competence and the supervision quality for primary teachers in integrated learning

  1. Automated labelling of cancer textures in colorectal histopathology slides using quasi-supervised learning.

    Science.gov (United States)

    Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge

    2013-04-01

    Quasi-supervised learning is a statistical learning algorithm that contrasts two datasets by computing estimate for the posterior probability of each sample in either dataset. This method has not been applied to histopathological images before. The purpose of this study is to evaluate the performance of the method to identify colorectal tissues with or without adenocarcinoma. Light microscopic digital images from histopathological sections were obtained from 30 colorectal radical surgery materials including adenocarcinoma and non-neoplastic regions. The texture features were extracted by using local histograms and co-occurrence matrices. The quasi-supervised learning algorithm operates on two datasets, one containing samples of normal tissues labelled only indirectly, and the other containing an unlabeled collection of samples of both normal and cancer tissues. As such, the algorithm eliminates the need for manually labelled samples of normal and cancer tissues for conventional supervised learning and significantly reduces the expert intervention. Several texture feature vector datasets corresponding to different extraction parameters were tested within the proposed framework. The Independent Component Analysis dimensionality reduction approach was also identified as the one improving the labelling performance evaluated in this series. In this series, the proposed method was applied to the dataset of 22,080 vectors with reduced dimensionality 119 from 132. Regions containing cancer tissue could be identified accurately having false and true positive rates up to 19% and 88% respectively without using manually labelled ground-truth datasets in a quasi-supervised strategy. The resulting labelling performances were compared to that of a conventional powerful supervised classifier using manually labelled ground-truth data. The supervised classifier results were calculated as 3.5% and 95% for the same case. The results in this series in comparison with the benchmark

  2. Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Nan Zhao

    2014-05-01

    Full Text Available Single nucleotide polymorphisms (SNPs are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs have been found near or inside the protein-protein interaction (PPI interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor. Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1 a 2-class problem (strengthening/weakening PPI mutations, (2 another 2-class problem (mutations that disrupt/preserve a PPI, and (3 a 3-class classification (detrimental/neutral/beneficial mutation effects. In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the

  3. Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

    Science.gov (United States)

    Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of

  4. Semi-supervised learning and domain adaptation in natural language processing

    CERN Document Server

    Søgaard, Anders

    2013-01-01

    This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the marginal distribution of large amounts of unlabeled data. One reason for that is data sparsity, i.e., the limited amounts of data we have available in NLP. However, in most real-world NLP applications our labeled data is also heavily biased. This book introduces extensions of supervised learning algorithms to cope with data sparsity and different kinds of sampling bias.This book is intended to be both

  5. Scaling up spike-and-slab models for unsupervised feature learning.

    Science.gov (United States)

    Goodfellow, Ian J; Courville, Aaron; Bengio, Yoshua

    2013-08-01

    We describe the use of two spike-and-slab models for modeling real-valued data, with an emphasis on their applications to object recognition. The first model, which we call spike-and-slab sparse coding (S3C), is a preexisting model for which we introduce a faster approximate inference algorithm. We introduce a deep variant of S3C, which we call the partially directed deep Boltzmann machine (PD-DBM) and extend our S3C inference algorithm for use on this model. We describe learning procedures for each. We demonstrate that our inference procedure for S3C enables scaling the model to unprecedented large problem sizes, and demonstrate that using S3C as a feature extractor results in very good object recognition performance, particularly when the number of labeled examples is low. We show that the PD-DBM generates better samples than its shallow counterpart, and that unlike DBMs or DBNs, the PD-DBM may be trained successfully without greedy layerwise training.

  6. Generative Adversarial Networks-Based Semi-Supervised Learning for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Zhi He

    2017-10-01

    Full Text Available Classification of hyperspectral image (HSI is an important research topic in the remote sensing community. Significant efforts (e.g., deep learning have been concentrated on this task. However, it is still an open issue to classify the high-dimensional HSI with a limited number of training samples. In this paper, we propose a semi-supervised HSI classification method inspired by the generative adversarial networks (GANs. Unlike the supervised methods, the proposed HSI classification method is semi-supervised, which can make full use of the limited labeled samples as well as the sufficient unlabeled samples. Core ideas of the proposed method are twofold. First, the three-dimensional bilateral filter (3DBF is adopted to extract the spectral-spatial features by naturally treating the HSI as a volumetric dataset. The spatial information is integrated into the extracted features by 3DBF, which is propitious to the subsequent classification step. Second, GANs are trained on the spectral-spatial features for semi-supervised learning. A GAN contains two neural networks (i.e., generator and discriminator trained in opposition to one another. The semi-supervised learning is achieved by adding samples from the generator to the features and increasing the dimension of the classifier output. Experimental results obtained on three benchmark HSI datasets have confirmed the effectiveness of the proposed method , especially with a limited number of labeled samples.

  7. Active learning for semi-supervised clustering based on locally linear propagation reconstruction.

    Science.gov (United States)

    Chang, Chin-Chun; Lin, Po-Yi

    2015-03-01

    The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Unsupervised Language Acquisition

    Science.gov (United States)

    de Marcken, Carl

    1996-11-01

    This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the ``content'' of linguistic parameters from their representation. Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning vocabularies and grammars from unsegmented text and continuous speech, and mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same structure to utterances as humans do. This work has application to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.

  9. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments

    Science.gov (United States)

    Han, Wenjing; Coutinho, Eduardo; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances. PMID:27627768

  10. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments.

    Science.gov (United States)

    Han, Wenjing; Coutinho, Eduardo; Ruan, Huabin; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances.

  11. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    Full Text Available that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value determined from the instances at that node of a decision tree (PDT...

  12. Multiclass semi-supervised learning for animal behavior recognition from accelerometer data

    NARCIS (Netherlands)

    Tanha, J.; van Someren, M.; de Bakker, M.; Bouten, W.; Shamoun-Baranes, J.; Afsarmanesh, H.

    2012-01-01

    In this paper we present a new Multiclass semi-supervised learning algorithm that uses a base classifier in combination with a similarity function applied to all data to find a classifier that maximizes the margin and consistency over all data. A novel multiclass loss function is presented and used

  13. Undergraduate Internship Supervision in Psychology Departments: Use of Experiential Learning Best Practices

    Science.gov (United States)

    Bailey, Sarah F.; Barber, Larissa K.; Nelson, Videl L.

    2017-01-01

    This study examined trends in how psychology internships are supervised compared to current experiential learning best practices in the literature. We sent a brief online survey to relevant contact persons for colleges/universities with psychology departments throughout the United States (n = 149 responded). Overall, the majority of institutions…

  14. Using supervised machine learning to code policy issues: Can classifiers generalize across contexts?

    NARCIS (Netherlands)

    Burscher, B.; Vliegenthart, R.; de Vreese, C.H.

    2015-01-01

    Content analysis of political communication usually covers large amounts of material and makes the study of dynamics in issue salience a costly enterprise. In this article, we present a supervised machine learning approach for the automatic coding of policy issues, which we apply to news articles

  15. Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.

    Science.gov (United States)

    Mostafa, J.; Lam, W.

    2000-01-01

    Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…

  16. Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement.

    Science.gov (United States)

    Moen, Hans; Peltonen, Laura-Maria; Koivumäki, Mikko; Suhonen, Henry; Salakoski, Tapio; Ginter, Filip; Salanterä, Sanna

    2018-01-01

    We report on the development and evaluation of a prototype tool aimed to assist laymen/patients in understanding the content of clinical narratives. The tool relies largely on unsupervised machine learning applied to two large corpora of unlabeled text - a clinical corpus and a general domain corpus. A joint semantic word-space model is created for the purpose of extracting easier to understand alternatives for words considered difficult to understand by laymen. Two domain experts evaluate the tool and inter-rater agreement is calculated. When having the tool suggest ten alternatives to each difficult word, it suggests acceptable lay words for 55.51% of them. This and future manual evaluation will serve to further improve performance, where also supervised machine learning will be used.

  17. Semi-supervised eigenvectors for large-scale locally-biased learning

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen; Mahoney, Michael W.

    2014-01-01

    improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph......In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks nearby that prespecified target region. For example, one might......-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing...

  18. Source localization in an ocean waveguide using supervised machine learning.

    Science.gov (United States)

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  19. Semi-Supervised Multi-View Ensemble Learning Based On Extracting Cross-View Correlation

    Directory of Open Access Journals (Sweden)

    ZALL, R.

    2016-05-01

    Full Text Available Correlated information between different views incorporate useful for learning in multi view data. Canonical correlation analysis (CCA plays important role to extract these information. However, CCA only extracts the correlated information between paired data and cannot preserve correlated information between within-class samples. In this paper, we propose a two-view semi-supervised learning method called semi-supervised random correlation ensemble base on spectral clustering (SS_RCE. SS_RCE uses a multi-view method based on spectral clustering which takes advantage of discriminative information in multiple views to estimate labeling information of unlabeled samples. In order to enhance discriminative power of CCA features, we incorporate the labeling information of both unlabeled and labeled samples into CCA. Then, we use random correlation between within-class samples from cross view to extract diverse correlated features for training component classifiers. Furthermore, we extend a general model namely SSMV_RCE to construct ensemble method to tackle semi-supervised learning in the presence of multiple views. Finally, we compare the proposed methods with existing multi-view feature extraction methods using multi-view semi-supervised ensembles. Experimental results on various multi-view data sets are presented to demonstrate the effectiveness of the proposed methods.

  20. Supervised orthogonal discriminant subspace projects learning for face recognition.

    Science.gov (United States)

    Chen, Yu; Xu, Xiao-Hong

    2014-02-01

    In this paper, a new linear dimension reduction method called supervised orthogonal discriminant subspace projection (SODSP) is proposed, which addresses high-dimensionality of data and the small sample size problem. More specifically, given a set of data points in the ambient space, a novel weight matrix that describes the relationship between the data points is first built. And in order to model the manifold structure, the class information is incorporated into the weight matrix. Based on the novel weight matrix, the local scatter matrix as well as non-local scatter matrix is defined such that the neighborhood structure can be preserved. In order to enhance the recognition ability, we impose an orthogonal constraint into a graph-based maximum margin analysis, seeking to find a projection that maximizes the difference, rather than the ratio between the non-local scatter and the local scatter. In this way, SODSP naturally avoids the singularity problem. Further, we develop an efficient and stable algorithm for implementing SODSP, especially, on high-dimensional data set. Moreover, the theoretical analysis shows that LPP is a special instance of SODSP by imposing some constraints. Experiments on the ORL, Yale, Extended Yale face database B and FERET face database are performed to test and evaluate the proposed algorithm. The results demonstrate the effectiveness of SODSP. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Contaminant source identification using semi-supervised machine learning

    International Nuclear Information System (INIS)

    Vesselinov, Velimir Valentinov; Alexandrov, Boian S.; O’Malley, Dan

    2017-01-01

    Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous different geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that performs decomposition of the observation mixtures based on Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. NMFk is tested on synthetic and real-world site data. Finally, the NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios).

  2. Content-Based High-Resolution Remote Sensing Image Retrieval via Unsupervised Feature Learning and Collaborative Affinity Metric Fusion

    Directory of Open Access Journals (Sweden)

    Yansheng Li

    2016-08-01

    Full Text Available With the urgent demand for automatic management of large numbers of high-resolution remote sensing images, content-based high-resolution remote sensing image retrieval (CB-HRRS-IR has attracted much research interest. Accordingly, this paper proposes a novel high-resolution remote sensing image retrieval approach via multiple feature representation and collaborative affinity metric fusion (IRMFRCAMF. In IRMFRCAMF, we design four unsupervised convolutional neural networks with different layers to generate four types of unsupervised features from the fine level to the coarse level. In addition to these four types of unsupervised features, we also implement four traditional feature descriptors, including local binary pattern (LBP, gray level co-occurrence (GLCM, maximal response 8 (MR8, and scale-invariant feature transform (SIFT. In order to fully incorporate the complementary information among multiple features of one image and the mutual information across auxiliary images in the image dataset, this paper advocates collaborative affinity metric fusion to measure the similarity between images. The performance evaluation of high-resolution remote sensing image retrieval is implemented on two public datasets, the UC Merced (UCM dataset and the Wuhan University (WH dataset. Large numbers of experiments show that our proposed IRMFRCAMF can significantly outperform the state-of-the-art approaches.

  3. Outdoor Learning: Supervision Is More than Watching Children Play

    Science.gov (United States)

    Olsen, Heather; Thompson, Donna; Hudson, Susan

    2011-01-01

    Early childhood programs strive to provide good-quality care and education as young children develop their physical, emotional, social, and intellectual skills. In order to provide children with positive, developmentally appropriate learning opportunities, educators ensure the safety and security of children, indoors and outdoors. The outdoor…

  4. Improving orbit prediction accuracy through supervised machine learning

    Science.gov (United States)

    Peng, Hao; Bai, Xiaoli

    2018-05-01

    Due to the lack of information such as the space environment condition and resident space objects' (RSOs') body characteristics, current orbit predictions that are solely grounded on physics-based models may fail to achieve required accuracy for collision avoidance and have led to satellite collisions already. This paper presents a methodology to predict RSOs' trajectories with higher accuracy than that of the current methods. Inspired by the machine learning (ML) theory through which the models are learned based on large amounts of observed data and the prediction is conducted without explicitly modeling space objects and space environment, the proposed ML approach integrates physics-based orbit prediction algorithms with a learning-based process that focuses on reducing the prediction errors. Using a simulation-based space catalog environment as the test bed, the paper demonstrates three types of generalization capability for the proposed ML approach: (1) the ML model can be used to improve the same RSO's orbit information that is not available during the learning process but shares the same time interval as the training data; (2) the ML model can be used to improve predictions of the same RSO at future epochs; and (3) the ML model based on a RSO can be applied to other RSOs that share some common features.

  5. Generating a Spanish Affective Dictionary with Supervised Learning Techniques

    Science.gov (United States)

    Bermudez-Gonzalez, Daniel; Miranda-Jiménez, Sabino; García-Moreno, Raúl-Ulises; Calderón-Nepamuceno, Dora

    2016-01-01

    Nowadays, machine learning techniques are being used in several Natural Language Processing (NLP) tasks such as Opinion Mining (OM). OM is used to analyse and determine the affective orientation of texts. Usually, OM approaches use affective dictionaries in order to conduct sentiment analysis. These lexicons are labeled manually with affective…

  6. Facilitating the learning process in design-based learning practices: an investigation of teachers' actions in supervising students

    NARCIS (Netherlands)

    Gomez Puente, S.M.; Eijck, van M.W.; Jochems, W.M.G.

    2013-01-01

    Background: In research on design-based learning (DBL), inadequate attention is paid to the role the teacher plays in supervising students in gathering and applying knowledge to design artifacts, systems, and innovative solutions in higher education. Purpose: In this study, we examine whether

  7. Emotional Literacy Support Assistants' Views on Supervision Provided by Educational Psychologists: What EPs Can Learn from Group Supervision

    Science.gov (United States)

    Osborne, Cara; Burton, Sheila

    2014-01-01

    The Educational Psychology Service in this study has responsibility for providing group supervision to Emotional Literacy Support Assistants (ELSAs) working in schools. To date, little research has examined this type of inter-professional supervision arrangement. The current study used a questionnaire to examine ELSAs' views on the supervision…

  8. Extended apprenticeship learning in doctoral training and supervision - moving beyond 'cookbook recipes'

    DEFF Research Database (Denmark)

    Tanggaard, Lene; Wegener, Charlotte

    An apprenticeship perspective on learning in academia sheds light on the potential for mutual learning and production, and also reveals the diverse range of learning resources beyond the formal novice-–expert relationship. Although apprenticeship is a well-known concept in educational research......, in this case apprenticeship offers an innovative perspective on future practice and research in academia allowing more students access to high high-quality research training and giving supervisors a chance to combine their own research with their supervision obligations....

  9. Optimizing area under the ROC curve using semi-supervised learning.

    Science.gov (United States)

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M

    2015-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

  10. Unsupervised grammar induction of clinical report sublanguage.

    Science.gov (United States)

    Kate, Rohit J

    2012-10-05

    Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage for a scientific or a technical domain. Different genres of clinical reports use different sublaguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult as it would require expensive annotation effort to adapt to every type of clinical text. In this paper, we present an unsupervised method which automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains 60.5% F-measure for parse-bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.

  11. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu; Perrot, Matthieu

    2011-01-01

    International audience; Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic ...

  12. Scikit-learn: Machine Learning in Python

    OpenAIRE

    Pedregosa, Fabian; Varoquaux, Gaël; Gramfort, Alexandre; Michel, Vincent; Thirion, Bertrand; Grisel, Olivier; Blondel, Mathieu; Louppe, Gilles; Prettenhofer, Peter; Weiss, Ron; Dubourg, Vincent; Vanderplas, Jake; Passos, Alexandre; Cournapeau, David; Brucher, Matthieu

    2012-01-01

    Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings....

  13. Unsupervised classification of variable stars

    Science.gov (United States)

    Valenzuela, Lucas; Pichara, Karim

    2018-03-01

    During the past 10 years, a considerable amount of effort has been made to develop algorithms for automatic classification of variable stars. That has been primarily achieved by applying machine learning methods to photometric data sets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human efforts. Every time data come from new surveys; the only available training instances are the ones that have a cross-match with previously labelled objects, consequently generating insufficient training sets compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem by proposing an untraditional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific for light curves, based on a novel data structure that allows scaling the search over the entire data set of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.

  14. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning

    OpenAIRE

    Cole-Lewis, Heather; Varghese, Arun; Sanders, Amy; Schwarz, Mary; Pugatch, Jillian; Augustson, Erik

    2015-01-01

    Background Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public?s knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. Objective Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess T...

  15. A functional supervised learning approach to the study of blood pressure data.

    Science.gov (United States)

    Papayiannis, Georgios I; Giakoumakis, Emmanuel A; Manios, Efstathios D; Moulopoulos, Spyros D; Stamatelopoulos, Kimon S; Toumanidis, Savvas T; Zakopoulos, Nikolaos A; Yannacopoulos, Athanasios N

    2018-04-15

    In this work, a functional supervised learning scheme is proposed for the classification of subjects into normotensive and hypertensive groups, using solely the 24-hour blood pressure data, relying on the concepts of Fréchet mean and Fréchet variance for appropriate deformable functional models for the blood pressure data. The schemes are trained on real clinical data, and their performance was assessed and found to be very satisfactory. Copyright © 2017 John Wiley & Sons, Ltd.

  16. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil; Harrou, Fouzi; Houacine, Amrane; Sun, Ying

    2017-01-01

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  17. Supervised learning in spiking neural networks with FORCE training.

    Science.gov (United States)

    Nicola, Wilten; Clopath, Claudia

    2017-12-20

    Populations of neurons display an extraordinary diversity in the behaviors they affect and display. Machine learning techniques have recently emerged that allow us to create networks of model neurons that display behaviors of similar complexity. Here we demonstrate the direct applicability of one such technique, the FORCE method, to spiking neural networks. We train these networks to mimic dynamical systems, classify inputs, and store discrete sequences that correspond to the notes of a song. Finally, we use FORCE training to create two biologically motivated model circuits. One is inspired by the zebra finch and successfully reproduces songbird singing. The second network is motivated by the hippocampus and is trained to store and replay a movie scene. FORCE trained networks reproduce behaviors comparable in complexity to their inspired circuits and yield information not easily obtainable with other techniques, such as behavioral responses to pharmacological manipulations and spike timing statistics.

  18. Indonesian name matching using machine learning supervised approach

    Science.gov (United States)

    Alifikri, Mohamad; Arif Bijaksana, Moch.

    2018-03-01

    Most existing name matching methods are developed for English language and so they cover the characteristics of this language. Up to this moment, there is no specific one has been designed and implemented for Indonesian names. The purpose of this thesis is to develop Indonesian name matching dataset as a contribution to academic research and to propose suitable feature set by utilizing combination of context of name strings and its permute-winkler score. Machine learning classification algorithms is taken as the method for performing name matching. Based on the experiments, by using tuned Random Forest algorithm and proposed features, there is an improvement of matching performance by approximately 1.7% and it is able to reduce until 70% misclassification result of the state of the arts methods. This improving performance makes the matching system more effective and reduces the risk of misclassified matches.

  19. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil

    2017-01-05

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  20. Out-of-Sample Generalizations for Supervised Manifold Learning for Classification.

    Science.gov (United States)

    Vural, Elif; Guillemot, Christine

    2016-03-01

    Supervised manifold learning methods for data classification map high-dimensional data samples to a lower dimensional domain in a structure-preserving way while increasing the separation between different classes. Most manifold learning methods compute the embedding only of the initially available data; however, the generalization of the embedding to novel points, i.e., the out-of-sample extension problem, becomes especially important in classification applications. In this paper, we propose a semi-supervised method for building an interpolation function that provides an out-of-sample extension for general supervised manifold learning algorithms studied in the context of classification. The proposed algorithm computes a radial basis function interpolator that minimizes an objective function consisting of the total embedding error of unlabeled test samples, defined as their distance to the embeddings of the manifolds of their own class, as well as a regularization term that controls the smoothness of the interpolation function in a direction-dependent way. The class labels of test data and the interpolation function parameters are estimated jointly with an iterative process. Experimental results on face and object images demonstrate the potential of the proposed out-of-sample extension algorithm for the classification of manifold-modeled data sets.

  1. Supervised Learning Using Spike-Timing-Dependent Plasticity of Memristive Synapses.

    Science.gov (United States)

    Nishitani, Yu; Kaneko, Yukihiro; Ueda, Michihito

    2015-12-01

    We propose a supervised learning model that enables error backpropagation for spiking neural network hardware. The method is modeled by modifying an existing model to suit the hardware implementation. An example of a network circuit for the model is also presented. In this circuit, a three-terminal ferroelectric memristor (3T-FeMEM), which is a field-effect transistor with a gate insulator composed of ferroelectric materials, is used as an electric synapse device to store the analog synaptic weight. Our model can be implemented by reflecting the network error to the write voltage of the 3T-FeMEMs and introducing a spike-timing-dependent learning function to the device. An XOR problem was successfully demonstrated as a benchmark learning by numerical simulations using the circuit properties to estimate the learning performance. In principle, the learning time per step of this supervised learning model and the circuit is independent of the number of neurons in each layer, promising a high-speed and low-power calculation in large-scale neural networks.

  2. Cerebellar supervised learning revisited: biophysical modeling and degrees-of-freedom control.

    Science.gov (United States)

    Kawato, Mitsuo; Kuroda, Shinya; Schweighofer, Nicolas

    2011-10-01

    The biophysical models of spike-timing-dependent plasticity have explored dynamics with molecular basis for such computational concepts as coincidence detection, synaptic eligibility trace, and Hebbian learning. They overall support different learning algorithms in different brain areas, especially supervised learning in the cerebellum. Because a single spine is physically very small, chemical reactions at it are essentially stochastic, and thus sensitivity-longevity dilemma exists in the synaptic memory. Here, the cascade of excitable and bistable dynamics is proposed to overcome this difficulty. All kinds of learning algorithms in different brain regions confront with difficult generalization problems. For resolution of this issue, the control of the degrees-of-freedom can be realized by changing synchronicity of neural firing. Especially, for cerebellar supervised learning, the triangle closed-loop circuit consisting of Purkinje cells, the inferior olive nucleus, and the cerebellar nucleus is proposed as a circuit to optimally control synchronous firing and degrees-of-freedom in learning. Copyright © 2011 Elsevier Ltd. All rights reserved.

  3. Poster abstract: Water level estimation in urban ultrasonic/passive infrared flash flood sensor networks using supervised learning

    KAUST Repository

    Mousa, Mustafa; Claudel, Christian G.

    2014-01-01

    floods occur very rarely, we use a supervised learning approach to estimate the correction to the ultrasonic rangefinder caused by temperature fluctuations. Preliminary data shows that water level can be estimated with an absolute error of less than 2 cm

  4. Clinical learning environment and supervision of international nursing students: A cross-sectional study.

    Science.gov (United States)

    Mikkonen, Kristina; Elo, Satu; Miettunen, Jouko; Saarikoski, Mikko; Kääriäinen, Maria

    2017-05-01

    Previously, it has been shown that the clinical learning environment causes challenges for international nursing students, but there is a lack of empirical evidence relating to the background factors explaining and influencing the outcomes. To describe international and national students' perceptions of their clinical learning environment and supervision, and explain the related background factors. An explorative cross-sectional design was used in a study conducted in eight universities of applied sciences in Finland during September 2015-May 2016. All nursing students studying English language degree programs were invited to answer a self-administered questionnaire based on both the clinical learning environment, supervision and nurse teacher scale and Cultural and Linguistic Diversity scale with additional background questions. Participants (n=329) included international (n=231) and Finnish (n=98) nursing students. Binary logistic regression was used to identify background factors relating to the clinical learning environment and supervision. International students at a beginner level in Finnish perceived the pedagogical atmosphere as worse than native speakers. In comparison to native speakers, these international students generally needed greater support from the nurse teacher at their university. Students at an intermediate level in Finnish reported two times fewer negative encounters in cultural diversity at their clinical placement than the beginners. To facilitate a successful learning experience, international nursing students require a sufficient level of competence in the native language when conducting clinical placements. Educational interventions in language education are required to test causal effects on students' success in the clinical learning environment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Characterization of the Optical Properties of Turbid Media by Supervised Learning of Scattering Patterns.

    Science.gov (United States)

    Hassaninia, Iman; Bostanabad, Ramin; Chen, Wei; Mohseni, Hooman

    2017-11-10

    Fabricated tissue phantoms are instrumental in optical in-vitro investigations concerning cancer diagnosis, therapeutic applications, and drug efficacy tests. We present a simple non-invasive computational technique that, when coupled with experiments, has the potential for characterization of a wide range of biological tissues. The fundamental idea of our approach is to find a supervised learner that links the scattering pattern of a turbid sample to its thickness and scattering parameters. Once found, this supervised learner is employed in an inverse optimization problem for estimating the scattering parameters of a sample given its thickness and scattering pattern. Multi-response Gaussian processes are used for the supervised learning task and a simple setup is introduced to obtain the scattering pattern of a tissue sample. To increase the predictive power of the supervised learner, the scattering patterns are filtered, enriched by a regressor, and finally characterized with two parameters, namely, transmitted power and scaled Gaussian width. We computationally illustrate that our approach achieves errors of roughly 5% in predicting the scattering properties of many biological tissues. Our method has the potential to facilitate the characterization of tissues and fabrication of phantoms used for diagnostic and therapeutic purposes over a wide range of optical spectrum.

  6. Adding Learning to Knowledge-Based Systems: Taking the "Artificial" Out of AI

    Science.gov (United States)

    Daniel L. Schmoldt

    1997-01-01

    Both, knowledge-based systems (KBS) development and maintenance require time-consuming analysis of domain knowledge. Where example cases exist, KBS can be built, and later updated, by incorporating learning capabilities into their architecture. This applies to both supervised and unsupervised learning scenarios. In this paper, the important issues for learning systems-...

  7. An online supervised learning method based on gradient descent for spiking neurons.

    Science.gov (United States)

    Xu, Yan; Yang, Jing; Zhong, Shuiming

    2017-09-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by precise firing times of spikes. The gradient-descent-based (GDB) learning methods are widely used and verified in the current research. Although the existing GDB multi-spike learning (or spike sequence learning) methods have good performance, they work in an offline manner and still have some limitations. This paper proposes an online GDB spike sequence learning method for spiking neurons that is based on the online adjustment mechanism of real biological neuron synapses. The method constructs error function and calculates the adjustment of synaptic weights as soon as the neurons emit a spike during their running process. We analyze and synthesize desired and actual output spikes to select appropriate input spikes in the calculation of weight adjustment in this paper. The experimental results show that our method obviously improves learning performance compared with the offline learning manner and has certain advantage on learning accuracy compared with other learning methods. Stronger learning ability determines that the method has large pattern storage capacity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Machine Learning for Neuroimaging with Scikit-Learn

    Directory of Open Access Journals (Sweden)

    Alexandre eAbraham

    2014-02-01

    Full Text Available Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  9. Machine learning for neuroimaging with scikit-learn.

    Science.gov (United States)

    Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

    2014-01-01

    Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

  10. Musical Instrument Classification Based on Nonlinear Recurrence Analysis and Supervised Learning

    Directory of Open Access Journals (Sweden)

    R.Rui

    2013-04-01

    Full Text Available In this paper, the phase space reconstruction of time series produced by different instruments is discussed based on the nonlinear dynamic theory. The dense ratio, a novel quantitative recurrence parameter, is proposed to describe the difference of wind instruments, stringed instruments and keyboard instruments in the phase space by analyzing the recursive property of every instrument. Furthermore, a novel supervised learning algorithm for automatic classification of individual musical instrument signals is addressed deriving from the idea of supervised non-negative matrix factorization (NMF algorithm. In our approach, the orthogonal basis matrix could be obtained without updating the matrix iteratively, which NMF is unable to do. The experimental results indicate that the accuracy of the proposed method is improved by 3% comparing with the conventional features in the individual instrument classification.

  11. Exploiting Attribute Correlations: A Novel Trace Lasso-Based Weakly Supervised Dictionary Learning Method.

    Science.gov (United States)

    Wu, Lin; Wang, Yang; Pan, Shirui

    2017-12-01

    It is now well established that sparse representation models are working effectively for many visual recognition tasks, and have pushed forward the success of dictionary learning therein. Recent studies over dictionary learning focus on learning discriminative atoms instead of purely reconstructive ones. However, the existence of intraclass diversities (i.e., data objects within the same category but exhibit large visual dissimilarities), and interclass similarities (i.e., data objects from distinct classes but share much visual similarities), makes it challenging to learn effective recognition models. To this end, a large number of labeled data objects are required to learn models which can effectively characterize these subtle differences. However, labeled data objects are always limited to access, committing it difficult to learn a monolithic dictionary that can be discriminative enough. To address the above limitations, in this paper, we propose a weakly-supervised dictionary learning method to automatically learn a discriminative dictionary by fully exploiting visual attribute correlations rather than label priors. In particular, the intrinsic attribute correlations are deployed as a critical cue to guide the process of object categorization, and then a set of subdictionaries are jointly learned with respect to each category. The resulting dictionary is highly discriminative and leads to intraclass diversity aware sparse representations. Extensive experiments on image classification and object recognition are conducted to show the effectiveness of our approach.

  12. Arrangement and Applying of Movement Patterns in the Cerebellum Based on Semi-supervised Learning.

    Science.gov (United States)

    Solouki, Saeed; Pooyan, Mohammad

    2016-06-01

    Biological control systems have long been studied as a possible inspiration for the construction of robotic controllers. The cerebellum is known to be involved in the production and learning of smooth, coordinated movements. Therefore, highly regular structure of the cerebellum has been in the core of attention in theoretical and computational modeling. However, most of these models reflect some special features of the cerebellum without regarding the whole motor command computational process. In this paper, we try to make a logical relation between the most significant models of the cerebellum and introduce a new learning strategy to arrange the movement patterns: cerebellar modular arrangement and applying of movement patterns based on semi-supervised learning (CMAPS). We assume here the cerebellum like a big archive of patterns that has an efficient organization to classify and recall them. The main idea is to achieve an optimal use of memory locations by more than just a supervised learning and classification algorithm. Surely, more experimental and physiological researches are needed to confirm our hypothesis.

  13. New supervised learning theory applied to cerebellar modeling for suppression of variability of saccade end points.

    Science.gov (United States)

    Fujita, Masahiko

    2013-06-01

    A new supervised learning theory is proposed for a hierarchical neural network with a single hidden layer of threshold units, which can approximate any continuous transformation, and applied to a cerebellar function to suppress the end-point variability of saccades. In motor systems, feedback control can reduce noise effects if the noise is added in a pathway from a motor center to a peripheral effector; however, it cannot reduce noise effects if the noise is generated in the motor center itself: a new control scheme is necessary for such noise. The cerebellar cortex is well known as a supervised learning system, and a novel theory of cerebellar cortical function developed in this study can explain the capability of the cerebellum to feedforwardly reduce noise effects, such as end-point variability of saccades. This theory assumes that a Golgi-granule cell system can encode the strength of a mossy fiber input as the state of neuronal activity of parallel fibers. By combining these parallel fiber signals with appropriate connection weights to produce a Purkinje cell output, an arbitrary continuous input-output relationship can be obtained. By incorporating such flexible computation and learning ability in a process of saccadic gain adaptation, a new control scheme in which the cerebellar cortex feedforwardly suppresses the end-point variability when it detects a variation in saccadic commands can be devised. Computer simulation confirmed the efficiency of such learning and showed a reduction in the variability of saccadic end points, similar to results obtained from experimental data.

  14. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

    Science.gov (United States)

    Stanescu, Ana; Caragea, Doina

    2015-01-01

    Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.

  15. Learning Outlier Ensembles

    DEFF Research Database (Denmark)

    Micenková, Barbora; McWilliams, Brian; Assent, Ira

    into the existing unsupervised algorithms. In this paper, we show how to use powerful machine learning approaches to combine labeled examples together with arbitrary unsupervised outlier scoring algorithms. We aim to get the best out of the two worlds—supervised and unsupervised. Our approach is also a viable......Years of research in unsupervised outlier detection have produced numerous algorithms to score data according to their exceptionality. wever, the nature of outliers heavily depends on the application context and different algorithms are sensitive to outliers of different nature. This makes it very...... difficult to assess suitability of a particular algorithm without a priori knowledge. On the other hand, in many applications, some examples of outliers exist or can be obtain edin addition to the vast amount of unlabeled data. Unfortunately, this extra knowledge cannot be simply incorporated...

  16. Unsupervised Categorization in a Sample of Children with Autism Spectrum Disorders

    Science.gov (United States)

    Edwards, Darren J.; Perlman, Amotz; Reed, Phil

    2012-01-01

    Studies of supervised Categorization have demonstrated limited Categorization performance in participants with autism spectrum disorders (ASD), however little research has been conducted regarding unsupervised Categorization in this population. This study explored unsupervised Categorization using two stimulus sets that differed in their…

  17. Counterbalancing clinical supervision and independent practice: case studies in learning thoracic epidural catheter insertion.

    Science.gov (United States)

    Johnson, T

    2010-12-01

    Thoracic epidural catheter placement is an example of a demanding and high-risk clinical skill that junior anaesthetists need to learn by experience and under the supervision of consultants. This learning is known to present challenges that require further study. Ten consultant and 10 trainee anaesthetists in a teaching hospital were interviewed about teaching and learning this skill in the operating theatre, and a phenomenological analysis of their experience was performed. Trainee participation was limited by time pressure, lack of familiarity with consultants, and consultants' own need for clinical experience. There was a particular tension between safe and effective consultant practice and permitting trainees' independence. Three distinct stages of participation and assistance were identified from reports of ideal practice: early (part-task or basic procedure, consultant always present giving instruction and feedback), middle (independent practice with straightforward cases without further instruction), and late (skill extension and transfer). Learning assistance provided by consultants varied, but it was often not matched to the trainees' stages of learning. Negotiation of participation and assistance was recognized as being useful, but it did not happen routinely. There are many obstacles to trainees' participation in thoracic epidural catheter insertion, and learning assistance is not matched to need. A more explicit understanding of stages of learning is required to benefit the learning of this and other advanced clinical skills.

  18. DL-ReSuMe: A Delay Learning-Based Remote Supervised Method for Spiking Neurons.

    Science.gov (United States)

    Taherkhani, Aboozar; Belatreche, Ammar; Li, Yuhua; Maguire, Liam P

    2015-12-01

    Recent research has shown the potential capability of spiking neural networks (SNNs) to model complex information processing in the brain. There is biological evidence to prove the use of the precise timing of spikes for information coding. However, the exact learning mechanism in which the neuron is trained to fire at precise times remains an open problem. The majority of the existing learning methods for SNNs are based on weight adjustment. However, there is also biological evidence that the synaptic delay is not constant. In this paper, a learning method for spiking neurons, called delay learning remote supervised method (DL-ReSuMe), is proposed to merge the delay shift approach and ReSuMe-based weight adjustment to enhance the learning performance. DL-ReSuMe uses more biologically plausible properties, such as delay learning, and needs less weight adjustment than ReSuMe. Simulation results have shown that the proposed DL-ReSuMe approach achieves learning accuracy and learning speed improvements compared with ReSuMe.

  19. Learning phacoemulsification. Results of different teaching methods.

    Directory of Open Access Journals (Sweden)

    Hennig Albrecht

    2004-01-01

    Full Text Available We report the learning curves of three eye surgeons converting from sutureless extracapsular cataract extraction to phacoemulsification using different teaching methods. Posterior capsule rupture (PCR as a per-operative complication and visual outcome of the first 100 operations were analysed. The PCR rate was 4% and 15% in supervised and unsupervised surgery respectively. Likewise, an uncorrected visual acuity of > or = 6/18 on the first postoperative day was seen in 62 (62% of patients and in 22 (22% in supervised and unsupervised surgery respectively.

  20. Prototype-based Models for the Supervised Learning of Classification Schemes

    Science.gov (United States)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2017-06-01

    An introduction is given to the use of prototype-based models in supervised machine learning. The main concept of the framework is to represent previously observed data in terms of so-called prototypes, which reflect typical properties of the data. Together with a suitable, discriminative distance or dissimilarity measure, prototypes can be used for the classification of complex, possibly high-dimensional data. We illustrate the framework in terms of the popular Learning Vector Quantization (LVQ). Most frequently, standard Euclidean distance is employed as a distance measure. We discuss how LVQ can be equipped with more general dissimilarites. Moreover, we introduce relevance learning as a tool for the data-driven optimization of parameterized distances.

  1. Nonlinear Semi-Supervised Metric Learning Via Multiple Kernels and Local Topology.

    Science.gov (United States)

    Li, Xin; Bai, Yanqin; Peng, Yaxin; Du, Shaoyi; Ying, Shihui

    2018-03-01

    Changing the metric on the data may change the data distribution, hence a good distance metric can promote the performance of learning algorithm. In this paper, we address the semi-supervised distance metric learning (ML) problem to obtain the best nonlinear metric for the data. First, we describe the nonlinear metric by the multiple kernel representation. By this approach, we project the data into a high dimensional space, where the data can be well represented by linear ML. Then, we reformulate the linear ML by a minimization problem on the positive definite matrix group. Finally, we develop a two-step algorithm for solving this model and design an intrinsic steepest descent algorithm to learn the positive definite metric matrix. Experimental results validate that our proposed method is effective and outperforms several state-of-the-art ML methods.

  2. A Supervised Learning Process to Validate Online Disease Reports for Use in Predictive Models.

    Science.gov (United States)

    Patching, Helena M M; Hudson, Laurence M; Cooke, Warrick; Garcia, Andres J; Hay, Simon I; Roberts, Mark; Moyes, Catherine L

    2015-12-01

    Pathogen distribution models that predict spatial variation in disease occurrence require data from a large number of geographic locations to generate disease risk maps. Traditionally, this process has used data from public health reporting systems; however, using online reports of new infections could speed up the process dramatically. Data from both public health systems and online sources must be validated before they can be used, but no mechanisms exist to validate data from online media reports. We have developed a supervised learning process to validate geolocated disease outbreak data in a timely manner. The process uses three input features, the data source and two metrics derived from the location of each disease occurrence. The location of disease occurrence provides information on the probability of disease occurrence at that location based on environmental and socioeconomic factors and the distance within or outside the current known disease extent. The process also uses validation scores, generated by disease experts who review a subset of the data, to build a training data set. The aim of the supervised learning process is to generate validation scores that can be used as weights going into the pathogen distribution model. After analyzing the three input features and testing the performance of alternative processes, we selected a cascade of ensembles comprising logistic regressors. Parameter values for the training data subset size, number of predictors, and number of layers in the cascade were tested before the process was deployed. The final configuration was tested using data for two contrasting diseases (dengue and cholera), and 66%-79% of data points were assigned a validation score. The remaining data points are scored by the experts, and the results inform the training data set for the next set of predictors, as well as going to the pathogen distribution model. The new supervised learning process has been implemented within our live site and is

  3. Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Zhiling Guo

    2016-03-01

    Full Text Available In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE RGB images to perform the classification in order to examine its suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost and convolutional neural networks (CNN. To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer trained networks based on gradient descent algorithms and back propagation, CNN perform the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area at Savannakhet province in Laos show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.

  4. Test-retest reliability of the Clinical Learning Environment, Supervision and Nurse Teacher (CLES + T) scale.

    Science.gov (United States)

    Gustafsson, Margareta; Blomberg, Karin; Holmefur, Marie

    2015-07-01

    The Clinical Learning Environment, Supervision and Nurse Teacher (CLES + T) scale evaluates the student nurses' perception of the learning environment and supervision within the clinical placement. It has never been tested in a replication study. The aim of the present study was to evaluate the test-retest reliability of the CLES + T scale. The CLES + T scale was administered twice to a group of 42 student nurses, with a one-week interval. Test-retest reliability was determined by calculations of Intraclass Correlation Coefficients (ICCs) and weighted Kappa coefficients. Standard Error of Measurements (SEM) and Smallest Detectable Difference (SDD) determined the precision of individual scores. Bland-Altman plots were created for analyses of systematic differences between the test occasions. The results of the study showed that the stability over time was good to excellent (ICC 0.88-0.96) in the sub-dimensions "Supervisory relationship", "Pedagogical atmosphere on the ward" and "Role of the nurse teacher". Measurements of "Premises of nursing on the ward" and "Leadership style of the manager" had lower but still acceptable stability (ICC 0.70-0.75). No systematic differences occurred between the test occasions. This study supports the usefulness of the CLES + T scale as a reliable measure of the student nurses' perception of the learning environment within the clinical placement at a hospital. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Characterization and reconstruction of 3D stochastic microstructures via supervised learning.

    Science.gov (United States)

    Bostanabad, R; Chen, W; Apley, D W

    2016-12-01

    The need for computational characterization and reconstruction of volumetric maps of stochastic microstructures for understanding the role of material structure in the processing-structure-property chain has been highlighted in the literature. Recently, a promising characterization and reconstruction approach has been developed where the essential idea is to convert the digitized microstructure image into an appropriate training dataset to learn the stochastic nature of the morphology by fitting a supervised learning model to the dataset. This compact model can subsequently be used to efficiently reconstruct as many statistically equivalent microstructure samples as desired. The goal of this paper is to build upon the developed approach in three major directions by: (1) extending the approach to characterize 3D stochastic microstructures and efficiently reconstruct 3D samples, (2) improving the performance of the approach by incorporating user-defined predictors into the supervised learning model, and (3) addressing potential computational issues by introducing a reduced model which can perform as effectively as the full model. We test the extended approach on three examples and show that the spatial dependencies, as evaluated via various measures, are well preserved in the reconstructed samples. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.

  6. Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.

    Science.gov (United States)

    Hocking, Toby Dylan; Goerner-Potvin, Patricia; Morin, Andreanne; Shao, Xiaojian; Pastinen, Tomi; Bourque, Guillaume

    2017-02-15

    Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. Labeled histone mark data http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ , R package to compute the label error of predicted peaks https://github.com/tdhock/PeakError. toby.hocking@mail.mcgill.ca or guil.bourque@mcgill.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. Self-supervised learning as an enabling technology for future space exploration robots: ISS experiments on monocular distance learning

    Science.gov (United States)

    van Hecke, Kevin; de Croon, Guido C. H. E.; Hennes, Daniel; Setterfield, Timothy P.; Saenz-Otero, Alvar; Izzo, Dario

    2017-11-01

    Although machine learning holds an enormous promise for autonomous space robots, it is currently not employed because of the inherent uncertain outcome of learning processes. In this article we investigate a learning mechanism, Self-Supervised Learning (SSL), which is very reliable and hence an important candidate for real-world deployment even on safety-critical systems such as space robots. To demonstrate this reliability, we introduce a novel SSL setup that allows a stereo vision equipped robot to cope with the failure of one of its cameras. The setup learns to estimate average depth using a monocular image, by using the stereo vision depths from the past as trusted ground truth. We present preliminary results from an experiment on the International Space Station (ISS) performed with the MIT/NASA SPHERES VERTIGO satellite. The presented experiments were performed on October 8th, 2015 on board the ISS. The main goals were (1) data gathering, and (2) navigation based on stereo vision. First the astronaut Kimiya Yui moved the satellite around the Japanese Experiment Module to gather stereo vision data for learning. Subsequently, the satellite freely explored the space in the module based on its (trusted) stereo vision system and a pre-programmed exploration behavior, while simultaneously performing the self-supervised learning of monocular depth estimation on board. The two main goals were successfully achieved, representing the first online learning robotic experiments in space. These results lay the groundwork for a follow-up experiment in which the satellite will use the learned single-camera depth estimation for autonomous exploration in the ISS, and are an advancement towards future space robots that continuously improve their navigation capabilities over time, even in harsh and completely unknown space environments.

  8. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  9. Ellipsoidal fuzzy learning for smart car platoons

    Science.gov (United States)

    Dickerson, Julie A.; Kosko, Bart

    1993-12-01

    A neural-fuzzy system combined supervised and unsupervised learning to find and tune the fuzzy-rules. An additive fuzzy system approximates a function by covering its graph with fuzzy rules. A fuzzy rule patch can take the form of an ellipsoid in the input-output space. Unsupervised competitive learning found the statistics of data clusters. The covariance matrix of each synaptic quantization vector defined on ellipsoid centered at the centroid of the data cluster. Tightly clustered data gave smaller ellipsoids or more certain rules. Sparse data gave larger ellipsoids or less certain rules. Supervised learning tuned the ellipsoids to improve the approximation. The supervised neural system used gradient descent to find the ellipsoidal fuzzy patches. It locally minimized the mean-squared error of the fuzzy approximation. Hybrid ellipsoidal learning estimated the control surface for a smart car controller.

  10. Optimal robustness of supervised learning from a noniterative point of view

    Science.gov (United States)

    Hu, Chia-Lun J.

    1995-08-01

    In most artificial neural network applications, (e.g. pattern recognition) if the dimension of the input vectors is much larger than the number of patterns to be recognized, generally, a one- layered, hard-limited perceptron is sufficient to do the recognition job. As long as the training input-output mapping set is numerically given, and as long as this given set satisfies a special linear-independency relation, the connection matrix to meet the supervised learning requirements can be solved by a noniterative, one-step, algebra method. The learning of this noniterative scheme is very fast (close to real-time learning) because the learning is one-step and noniterative. The recognition of the untrained patterns is very robust because a universal geometrical optimization process of selecting the solution can be applied to the learning process. This paper reports the theoretical foundation of this noniterative learning scheme and focuses the result at the optimal robustness analysis. A real-time character recognition scheme is then designed along this line. This character recognition scheme will be used (in a movie presentation) to demonstrate the experimental results of some theoretical parts reported in this paper.

  11. Prototype-based models in machine learning

    NARCIS (Netherlands)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2016-01-01

    An overview is given of prototype-based models in machine learning. In this framework, observations, i.e., data, are stored in terms of typical representatives. Together with a suitable measure of similarity, the systems can be employed in the context of unsupervised and supervised analysis of

  12. WLAN Fingerprint Indoor Positioning Strategy Based on Implicit Crowdsourcing and Semi-Supervised Learning

    Directory of Open Access Journals (Sweden)

    Chunjing Song

    2017-11-01

    Full Text Available Wireless local area network (WLAN fingerprint positioning is an indoor localization technique with high accuracy and low hardware requirements. However, collecting received signal strength (RSS samples for the fingerprint database is time-consuming and labor-intensive, hindering the use of this technique. The popular crowdsourcing sampling technique has been introduced to reduce the workload of sample collection, but has two challenges: one is the heterogeneity of devices, which can significantly affect the positioning accuracy; the other is the requirement of users’ intervention in traditional crowdsourcing, which reduces the practicality of the system. In response to these challenges, we have proposed a new WLAN indoor positioning strategy, which incorporates a new preprocessing method for RSS samples, the implicit crowdsourcing sampling technique, and a semi-supervised learning algorithm. First, implicit crowdsourcing does not require users’ intervention. The acquisition program silently collects unlabeled samples, the RSS samples, without information about the position. Secondly, to cope with the heterogeneity of devices, the preprocessing method maps all the RSS values of samples to a uniform range and discretizes them. Finally, by using a large number of unlabeled samples with some labeled samples, Co-Forest, the introduced semi-supervised learning algorithm, creates and repeatedly refines a random forest ensemble classifier that performs well for location estimation. The results of experiments conducted in a real indoor environment show that the proposed strategy reduces the demand for large quantities of labeled samples and achieves good positioning accuracy.

  13. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning

    Directory of Open Access Journals (Sweden)

    Victoria Plaza-Leiva

    2017-03-01

    Full Text Available Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM, Gaussian processes (GP, and Gaussian mixture models (GMM. A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl. Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood.

  14. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.

    Science.gov (United States)

    Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie

    2016-07-01

    Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.

  15. Semi-Supervised Tensor-Based Graph Embedding Learning and Its Application to Visual Discriminant Tracking.

    Science.gov (United States)

    Hu, Weiming; Gao, Jin; Xing, Junliang; Zhang, Chao; Maybank, Stephen

    2017-01-01

    An appearance model adaptable to changes in object appearance is critical in visual object tracking. In this paper, we treat an image patch as a two-order tensor which preserves the original image structure. We design two graphs for characterizing the intrinsic local geometrical structure of the tensor samples of the object and the background. Graph embedding is used to reduce the dimensions of the tensors while preserving the structure of the graphs. Then, a discriminant embedding space is constructed. We prove two propositions for finding the transformation matrices which are used to map the original tensor samples to the tensor-based graph embedding space. In order to encode more discriminant information in the embedding space, we propose a transfer-learning- based semi-supervised strategy to iteratively adjust the embedding space into which discriminative information obtained from earlier times is transferred. We apply the proposed semi-supervised tensor-based graph embedding learning algorithm to visual tracking. The new tracking algorithm captures an object's appearance characteristics during tracking and uses a particle filter to estimate the optimal object state. Experimental results on the CVPR 2013 benchmark dataset demonstrate the effectiveness of the proposed tracking algorithm.

  16. A semi-supervised learning approach for RNA secondary structure prediction.

    Science.gov (United States)

    Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

    2015-08-01

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning.

    Science.gov (United States)

    Plaza-Leiva, Victoria; Gomez-Ruiz, Jose Antonio; Mandow, Anthony; García-Cerezo, Alfonso

    2017-03-15

    Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN) method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM), Gaussian processes (GP), and Gaussian mixture models (GMM). A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl). Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood.

  18. Supervised Variational Relevance Learning, An Analytic Geometric Feature Selection with Applications to Omic Datasets.

    Science.gov (United States)

    Boareto, Marcelo; Cesar, Jonatas; Leite, Vitor B P; Caticha, Nestor

    2015-01-01

    We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors to define distance based similarity in pattern classification, inspired in relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns by doing linear transformations using the metric tensor yields a dataset which can be more efficiently classified. We test our methods using publicly available datasets, for some standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without the use of further preprocessing, our results improve on their performance.

  19. The helpfulness of category labels in semi-supervised learning depends on category structure.

    Science.gov (United States)

    Vong, Wai Keen; Navarro, Daniel J; Perfors, Amy

    2016-02-01

    The study of semi-supervised category learning has generally focused on how additional unlabeled information with given labeled information might benefit category learning. The literature is also somewhat contradictory, sometimes appearing to show a benefit to unlabeled information and sometimes not. In this paper, we frame the problem differently, focusing on when labels might be helpful to a learner who has access to lots of unlabeled information. Using an unconstrained free-sorting categorization experiment, we show that labels are useful to participants only when the category structure is ambiguous and that people's responses are driven by the specific set of labels they see. We present an extension of Anderson's Rational Model of Categorization that captures this effect.

  20. Unsupervised categorization with individuals diagnosed as having moderate traumatic brain injury: Over-selective responding.

    Science.gov (United States)

    Edwards, Darren J; Wood, Rodger

    2016-01-01

    This study explored over-selectivity (executive dysfunction) using a standard unsupervised categorization task. Over-selectivity has been demonstrated using supervised categorization procedures (where training is given); however, little has been done in the way of unsupervised categorization (without training). A standard unsupervised categorization task was used to assess levels of over-selectivity in a traumatic brain injury (TBI) population. Individuals with TBI were selected from the Tertiary Traumatic Brain Injury Clinic at Swansea University and were asked to categorize two-dimensional items (pictures on cards), into groups that they felt were most intuitive, and without any learning (feedback from experimenter). This was compared against categories made by a control group for the same task. The findings of this study demonstrate that individuals with TBI had deficits for both easy and difficult categorization sets, as indicated by a larger amount of one-dimensional sorting compared to control participants. Deficits were significantly greater for the easy condition. The implications of these findings are discussed in the context of over-selectivity, and the processes that underlie this deficit. Also, the implications for using this procedure as a screening measure for over-selectivity in TBI are discussed.

  1. Unsupervised grammar induction of clinical report sublanguage

    Directory of Open Access Journals (Sweden)

    Kate Rohit J

    2012-10-01

    Full Text Available Abstract Background Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage for a scientific or a technical domain. Different genres of clinical reports use different sublaguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult as it would require expensive annotation effort to adapt to every type of clinical text. Methods In this paper, we present an unsupervised method which automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Results Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains 60.5% F-measure for parse-bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.

  2. Supervised learning of tools for content-based search of image databases

    Science.gov (United States)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge- based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically- constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  3. Clinical learning environment and supervision: experiences of Norwegian nursing students - a questionnaire survey.

    Science.gov (United States)

    Skaalvik, Mari Wolff; Normann, Hans Ketil; Henriksen, Nils

    2011-08-01

    To measure nursing students' experiences and satisfaction with their clinical learning environments. The primary interest was to compare the results between students with respect to clinical practice in nursing homes and hospital wards. Clinical learning environments are important for the learning processes of nursing students and for preferences for future workplaces. Working with older people is the least preferred area of practice among nursing students in Norway. A cross-sectional design. A validated questionnaire was distributed to all nursing students from five non-randomly selected university colleges in Norway. A total of 511 nursing students completed a Norwegian version of the questionnaire, Clinical Learning Environment, Supervision and Nurse Teacher (CLES+T) evaluation scale in 2009. Data including descriptive statistics were analysed using the Statistical Program for the Social Sciences. Factor structure was analysed by principal component analysis. Differences across sub-groups were tested with chi-square tests and Mann-Whitney U test for categorical variables and t-tests for continuous variables. Ordinal logistic regression analysis of perceptions of the ward as a good learning environment was performed with supervisory relationships and institutional contexts as independent variables, controlling for age, sex and study year. The participating nursing students with clinical placements in nursing homes assessed their clinical learning environment significantly more negatively than those with hospital placements on nearby all sub-dimensions. The evidence found in this study indicates that measures should be taken to strengthen nursing homes as learning environments for nursing students. To recruit more graduated nurses to work in nursing homes, actions to improve the learning environment are needed. © 2011 Blackwell Publishing Ltd.

  4. Literature mining of protein-residue associations with graph rules learned through distant supervision

    Directory of Open Access Journals (Sweden)

    Ravikumar KE

    2012-10-01

    Full Text Available Abstract Background We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. Results The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved a F-measure of 0.84 on automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. Conclusions The primary contributions of this work are to (1 demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2 show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  5. Literature mining of protein-residue associations with graph rules learned through distant supervision.

    Science.gov (United States)

    Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin

    2012-10-05

    We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high confidence examples found using distant supervision. It achieved a F-measure of 0.84 on automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.

  6. Supervised learning classification models for prediction of plant virus encoded RNA silencing suppressors.

    Directory of Open Access Journals (Sweden)

    Zeenia Jagga

    Full Text Available Viral encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, the viral proteins have several basic science implications and biotechnology applications. However in silico identification of these proteins is limited by their high sequence diversity. In this study we developed supervised learning based classification models for plant viral RNA silencing suppressor proteins in plant viruses. We developed four classifiers based on supervised learning algorithms: J48, Random Forest, LibSVM and Naïve Bayes algorithms, with enriched model learning by correlation based feature selection. Structural and physicochemical features calculated for experimentally verified primary protein sequences were used to train the classifiers. The training features include amino acid composition; auto correlation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of predictive models based on 10 fold cross-validation and independent data testing revealed that the Random Forest based model was the best and achieved 86.11% overall accuracy and 86.22% balanced accuracy with a remarkably high area under the Receivers Operating Characteristic curve of 0.95 to predict viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid identification of novel viral RNA silencing suppressors, which will provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. Also, the key subset of identified optimal features may help in determining compositional patterns in the viral proteins which are important determinants for RNA silencing suppressor activities. The best prediction model developed in the study is available as a

  7. Unsupervised Image Segmentation

    Czech Academy of Sciences Publication Activity Database

    Haindl, Michal; Mikeš, Stanislav

    2014-01-01

    Roč. 36, č. 4 (2014), s. 23-23 R&D Projects: GA ČR(CZ) GA14-10911S Institutional support: RVO:67985556 Keywords : unsupervised image segmentation Subject RIV: BD - Theory of Information http://library.utia.cas.cz/separaty/2014/RO/haindl-0434412.pdf

  8. Student nurses' experiences of the clinical learning environment in relation to the organization of supervision: a questionnaire survey.

    Science.gov (United States)

    Sundler, Annelie J; Björk, Maria; Bisholt, Birgitta; Ohlsson, Ulla; Engström, Agneta Kullén; Gustafsson, Margareta

    2014-04-01

    The aim was to investigate student nurses' experiences of the clinical learning environment in relation to how the supervision was organized. The clinical environment plays an essential part in student nurses' learning. Even though different models for supervision have been previously set forth, it has been stressed that there is a need both of further empirical studies on the role of preceptorship in undergraduate nursing education and of studies comparing different models. A cross-sectional study with comparative design was carried out with a mixed method approach. Data were collected from student nurses in the final term of the nursing programme at three universities in Sweden by means of a questionnaire. In general the students had positive experiences of the clinical learning environment with respect to pedagogical atmosphere, leadership style of the ward manager, premises of nursing, supervisory relationship, and role of the nurse preceptor and nurse teacher. However, there were significant differences in their ratings of the supervisory relationship (ppedagogical atmosphere (p 0.025) depending on how the supervision was organized. Students who had the same preceptor all the time were more satisfied with the supervisory relationship than were those who had different preceptors each day. Students' comments on the supervision confirmed the significance of the preceptor and the supervisory relationship. The organization of the supervision was of significance with regard to the pedagogical atmosphere and the students' relation to preceptors. Students with the same preceptor throughout were more positive concerning the supervisory relationship and the pedagogical atmosphere. © 2013.

  9. Sampling algorithms for validation of supervised learning models for Ising-like systems

    Science.gov (United States)

    Portman, Nataliya; Tamblyn, Isaac

    2017-12-01

    In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called ;ID-MH; that uses the Metropolis-Hastings algorithm creating Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new ;block-ID; sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).

  10. Supervised neural network modeling: an empirical investigation into learning from imbalanced data with labeling errors.

    Science.gov (United States)

    Khoshgoftaar, Taghi M; Van Hulse, Jason; Napolitano, Amri

    2010-05-01

    Neural network algorithms such as multilayer perceptrons (MLPs) and radial basis function networks (RBFNets) have been used to construct learners which exhibit strong predictive performance. Two data related issues that can have a detrimental impact on supervised learning initiatives are class imbalance and labeling errors (or class noise). Imbalanced data can make it more difficult for the neural network learning algorithms to distinguish between examples of the various classes, and class noise can lead to the formulation of incorrect hypotheses. Both class imbalance and labeling errors are pervasive problems encountered in a wide variety of application domains. Many studies have been performed to investigate these problems in isolation, but few have focused on their combined effects. This study presents a comprehensive empirical investigation using neural network algorithms to learn from imbalanced data with labeling errors. In particular, the first component of our study investigates the impact of class noise and class imbalance on two common neural network learning algorithms, while the second component considers the ability of data sampling (which is commonly used to address the issue of class imbalance) to improve their performances. Our results, for which over two million models were trained and evaluated, show that conclusions drawn using the more commonly studied C4.5 classifier may not apply when using neural networks.

  11. Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles.

    Science.gov (United States)

    Zheng, Wu; Blake, Catherine

    2015-10-01

    Databases of curated biomedical knowledge, such as the protein-locations reflected in the UniProtKB database, provide an accurate and useful resource to researchers and decision makers. Our goal is to augment the manual efforts currently used to curate knowledge bases with automated approaches that leverage the increased availability of full-text scientific articles. This paper describes experiments that use distant supervised learning to identify protein subcellular localizations, which are important to understand protein function and to identify candidate drug targets. Experiments consider Swiss-Prot, the manually annotated subset of the UniProtKB protein knowledge base, and 43,000 full-text articles from the Journal of Biological Chemistry that contain just under 11.5 million sentences. The system achieves 0.81 precision and 0.49 recall at sentence level and an accuracy of 57% on held-out instances in a test set. Moreover, the approach identifies 8210 instances that are not in the UniProtKB knowledge base. Manual inspection of the 50 most likely relations showed that 41 (82%) were valid. These results have immediate benefit to researchers interested in protein function, and suggest that distant supervision should be explored to complement other manual data curation efforts. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Porosity estimation by semi-supervised learning with sparsely available labeled samples

    Science.gov (United States)

    Lima, Luiz Alberto; Görnitz, Nico; Varella, Luiz Eduardo; Vellasco, Marley; Müller, Klaus-Robert; Nakajima, Shinichi

    2017-09-01

    This paper addresses the porosity estimation problem from seismic impedance volumes and porosity samples located in a small group of exploratory wells. Regression methods, trained on the impedance as inputs and the porosity as output labels, generally suffer from extremely expensive (and hence sparsely available) porosity samples. To optimally make use of the valuable porosity data, a semi-supervised machine learning method was proposed, Transductive Conditional Random Field Regression (TCRFR), showing good performance (Görnitz et al., 2017). TCRFR, however, still requires more labeled data than those usually available, which creates a gap when applying the method to the porosity estimation problem in realistic situations. In this paper, we aim to fill this gap by introducing two graph-based preprocessing techniques, which adapt the original TCRFR for extremely weakly supervised scenarios. Our new method outperforms the previous automatic estimation methods on synthetic data and provides a comparable result to the manual labored, time-consuming geostatistics approach on real data, proving its potential as a practical industrial tool.

  13. A Novel Semi-Supervised Electronic Nose Learning Technique: M-Training

    Directory of Open Access Journals (Sweden)

    Pengfei Jia

    2016-03-01

    Full Text Available When an electronic nose (E-nose is used to distinguish different kinds of gases, the label information of the target gas could be lost due to some fault of the operators or some other reason, although this is not expected. Another fact is that the cost of getting the labeled samples is usually higher than for unlabeled ones. In most cases, the classification accuracy of an E-nose trained using labeled samples is higher than that of the E-nose trained by unlabeled ones, so gases without label information should not be used to train an E-nose, however, this wastes resources and can even delay the progress of research. In this work a novel multi-class semi-supervised learning technique called M-training is proposed to train E-noses with both labeled and unlabeled samples. We employ M-training to train the E-nose which is used to distinguish three indoor pollutant gases (benzene, toluene and formaldehyde. Data processing results prove that the classification accuracy of E-nose trained by semi-supervised techniques (tri-training and M-training is higher than that of an E-nose trained only with labeled samples, and the performance of M-training is better than that of tri-training because more base classifiers can be employed by M-training.

  14. Assessing Miniaturized Sensor Performance using Supervised Learning, with Application to Drug and Explosive Detection

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne

    of sensors, as the sensors are designed to provide robust and reliable measurements. That means, the sensors are designed to have repeated measurement clusters. Sensor fusion is presented for the sensor based on chemoselective compounds. An array of color changing compounds are handled and in unity they make......This Ph.D. thesis titled “Assessing Miniaturized Sensor Performance using Supervised Learning, with Application to Drug and Explosive Detection” is a part of the strategic research project “Miniaturized sensors for explosives detection in air” funded by the Danish Agency for Science and Technology...... emanated by explosives and drugs, similar to an electronic nose. To evaluate sensor responses a data processing and evaluation pipeline is required. The work presented herein focuses on the feature extraction, feature representation and sensor accuracy. Thus the primary aim of this thesis is twofold...

  15. A Cross-Correlated Delay Shift Supervised Learning Method for Spiking Neurons with Application to Interictal Spike Detection in Epilepsy.

    Science.gov (United States)

    Guo, Lilin; Wang, Zhenzhong; Cabrerizo, Mercedes; Adjouadi, Malek

    2017-05-01

    This study introduces a novel learning algorithm for spiking neurons, called CCDS, which is able to learn and reproduce arbitrary spike patterns in a supervised fashion allowing the processing of spatiotemporal information encoded in the precise timing of spikes. Unlike the Remote Supervised Method (ReSuMe), synapse delays and axonal delays in CCDS are variants which are modulated together with weights during learning. The CCDS rule is both biologically plausible and computationally efficient. The properties of this learning rule are investigated extensively through experimental evaluations in terms of reliability, adaptive learning performance, generality to different neuron models, learning in the presence of noise, effects of its learning parameters and classification performance. Results presented show that the CCDS learning method achieves learning accuracy and learning speed comparable with ReSuMe, but improves classification accuracy when compared to both the Spike Pattern Association Neuron (SPAN) learning rule and the Tempotron learning rule. The merit of CCDS rule is further validated on a practical example involving the automated detection of interictal spikes in EEG records of patients with epilepsy. Results again show that with proper encoding, the CCDS rule achieves good recognition performance.

  16. Response monitoring using quantitative ultrasound methods and supervised dictionary learning in locally advanced breast cancer

    Science.gov (United States)

    Gangeh, Mehrdad J.; Fung, Brandon; Tadayyon, Hadi; Tran, William T.; Czarnota, Gregory J.

    2016-03-01

    A non-invasive computer-aided-theragnosis (CAT) system was developed for the early assessment of responses to neoadjuvant chemotherapy in patients with locally advanced breast cancer. The CAT system was based on quantitative ultrasound spectroscopy methods comprising several modules including feature extraction, a metric to measure the dissimilarity between "pre-" and "mid-treatment" scans, and a supervised learning algorithm for the classification of patients to responders/non-responders. One major requirement for the successful design of a high-performance CAT system is to accurately measure the changes in parametric maps before treatment onset and during the course of treatment. To this end, a unified framework based on Hilbert-Schmidt independence criterion (HSIC) was used for the design of feature extraction from parametric maps and the dissimilarity measure between the "pre-" and "mid-treatment" scans. For the feature extraction, HSIC was used to design a supervised dictionary learning (SDL) method by maximizing the dependency between the scans taken from "pre-" and "mid-treatment" with "dummy labels" given to the scans. For the dissimilarity measure, an HSIC-based metric was employed to effectively measure the changes in parametric maps as an indication of treatment effectiveness. The HSIC-based feature extraction and dissimilarity measure used a kernel function to nonlinearly transform input vectors into a higher dimensional feature space and computed the population means in the new space, where enhanced group separability was ideally obtained. The results of the classification using the developed CAT system indicated an improvement of performance compared to a CAT system with basic features using histogram of intensity.

  17. A semi-supervised learning framework for biomedical event extraction based on hidden topics.

    Science.gov (United States)

    Zhou, Deyu; Zhong, Dayou

    2015-05-01

    Scientists have devoted decades of efforts to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based systems from accessing. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate models' parameters. However, it is usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and attracts more interests. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned with event annotations based on their distances to these sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a golden standard corpus. Experimental results show that more than 2.2% improvement on F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and the similarity between sentences might be precisely

  18. Uncertainty quantification and experimental design based on unsupervised machine learning identification of contaminant sources and groundwater types using hydrogeochemical data

    Science.gov (United States)

    Vesselinov, V. V.

    2017-12-01

    Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical species. Numerous geochemical constituents and processes may need to be simulated in these models which further complicates the analyses. As a result, these types of model analyses are typically extremely challenging. Here, we demonstrate a new contaminant source identification approach that performs decomposition of the observation mixtures based on Nonnegative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the number of groundwater types and (b) the original geochemical concentration of the contaminant sources from measured geochemical mixtures with unknown mixing ratios without any additional site information. We also demonstrate how NMFk can be extended to perform uncertainty quantification and experimental design related to real-world site characterization. The NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard normalized stable isotope ratios). The NMFk algorithm has been extensively tested on synthetic datasets; NMFk analyses have been actively performed on real-world data collected at the Los Alamos National

  19. Just How Much Can School Pupils Learn from School Gardening? A Study of Two Supervised Agricultural Experience Approaches in Uganda

    Science.gov (United States)

    Okiror, John James; Matsiko, Biryabaho Frank; Oonyu, Joseph

    2011-01-01

    School systems in Africa are short of skills that link well with rural communities, yet arguments to vocationalize curricula remain mixed and school agriculture lacks the supervised practical component. This study, conducted in eight primary (elementary) schools in Uganda, sought to compare the learning achievement of pupils taught using…

  20. Spiking neural networks for handwritten digit recognition-Supervised learning and network optimization.

    Science.gov (United States)

    Kulkarni, Shruti R; Rajendran, Bipin

    2018-07-01

    We demonstrate supervised learning in Spiking Neural Networks (SNNs) for the problem of handwritten digit recognition using the spike triggered Normalized Approximate Descent (NormAD) algorithm. Our network that employs neurons operating at sparse biological spike rates below 300Hz achieves a classification accuracy of 98.17% on the MNIST test database with four times fewer parameters compared to the state-of-the-art. We present several insights from extensive numerical experiments regarding optimization of learning parameters and network configuration to improve its accuracy. We also describe a number of strategies to optimize the SNN for implementation in memory and energy constrained hardware, including approximations in computing the neuronal dynamics and reduced precision in storing the synaptic weights. Experiments reveal that even with 3-bit synaptic weights, the classification accuracy of the designed SNN does not degrade beyond 1% as compared to the floating-point baseline. Further, the proposed SNN, which is trained based on the precise spike timing information outperforms an equivalent non-spiking artificial neural network (ANN) trained using back propagation, especially at low bit precision. Thus, our study shows the potential for realizing efficient neuromorphic systems that use spike based information encoding and learning for real-world applications. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. IoT Security Techniques Based on Machine Learning

    OpenAIRE

    Xiao, Liang; Wan, Xiaoyue; Lu, Xiaozhen; Zhang, Yanyong; Wu, Di

    2018-01-01

    Internet of things (IoT) that integrate a variety of devices into networks to provide advanced and intelligent services have to protect user privacy and address attacks such as spoofing attacks, denial of service attacks, jamming and eavesdropping. In this article, we investigate the attack model for IoT systems, and review the IoT security solutions based on machine learning techniques including supervised learning, unsupervised learning and reinforcement learning. We focus on the machine le...

  2. Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique.

    Science.gov (United States)

    Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ma, Zhiqiang

    2015-06-07

    As a widespread type of protein post-translational modifications (PTMs), succinylation plays an important role in regulating protein conformation, function and physicochemical properties. Compared with the labor-intensive and time-consuming experimental approaches, computational predictions of succinylation sites are much desirable due to their convenient and fast speed. Currently, numerous computational models have been developed to identify PTMs sites through various types of two-class machine learning algorithms. These methods require both positive and negative samples for training. However, designation of the negative samples of PTMs was difficult and if it is not properly done can affect the performance of computational models dramatically. So that in this work, we implemented the first application of positive samples only learning (PSoL) algorithm to succinylation sites prediction problem, which was a special class of semi-supervised machine learning that used positive samples and unlabeled samples to train the model. Meanwhile, we proposed a novel succinylation sites computational predictor called SucPred (succinylation site predictor) by using multiple feature encoding schemes. Promising results were obtained by the SucPred predictor with an accuracy of 88.65% using 5-fold cross validation on the training dataset and an accuracy of 84.40% on the independent testing dataset, which demonstrated that the positive samples only learning algorithm presented here was particularly useful for identification of protein succinylation sites. Besides, the positive samples only learning algorithm can be applied to build predictors for other types of PTMs sites with ease. A web server for predicting succinylation sites was developed and was freely accessible at http://59.73.198.144:8088/SucPred/. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. An Unsupervised Approach to Activity Recognition and Segmentation based on Object-Use Fingerprints

    DEFF Research Database (Denmark)

    Gu, Tao; Chen, Shaxun; Tao, Xianping

    2010-01-01

    Human activity recognition is an important task which has many potential applications. In recent years, researchers from pervasive computing are interested in deploying on-body sensors to collect observations and applying machine learning techniques to model and recognize activities. Supervised...... machine learning techniques typically require an appropriate training process in which training data need to be labeled manually. In this paper, we propose an unsupervised approach based on object-use fingerprints to recognize activities without human labeling. We show how to build our activity models...... a trace and detect the boundary of any two adjacent activities. We develop a wearable RFID system and conduct a real-world trace collection done by seven volunteers in a smart home over a period of 2 weeks. We conduct comprehensive experimental evaluations and comparison study. The results show that our...

  4. A semi-supervised classification algorithm using the TAD-derived background as training data

    Science.gov (United States)

    Fan, Lei; Ambeau, Brittany; Messinger, David W.

    2013-05-01

    In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.

  5. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    Science.gov (United States)

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  6. Exploiting the potential of unlabeled endoscopic video data with self-supervised learning.

    Science.gov (United States)

    Ross, Tobias; Zimmerer, David; Vemuri, Anant; Isensee, Fabian; Wiesenfarth, Manuel; Bodenstedt, Sebastian; Both, Fabian; Kessler, Philip; Wagner, Martin; Müller, Beat; Kenngott, Hannes; Speidel, Stefanie; Kopp-Schneider, Annette; Maier-Hein, Klaus; Maier-Hein, Lena

    2018-04-27

    Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue. Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. Core of the method is an auxiliary task based on raw endoscopic video data of the target domain that is used to initialize the convolutional neural network (CNN) for the target task. In this paper, we propose the re-colorization of medical images with a conditional generative adversarial network (cGAN)-based architecture as auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. We validate both variants using medical instrument segmentation as target task. The proposed approach can be used to radically reduce the manual annotation effort involved in training CNNs. Compared to the baseline approach of generating annotated data from scratch, our method decreases exploratively the number of labeled images by up to 75% without sacrificing performance. Our method also outperforms alternative methods for CNN pre-training, such as pre-training on publicly available non-medical (COCO) or medical data (MICCAI EndoVis2017 challenge) using the target task (in this instance: segmentation). As it makes efficient use of available (non-)public and (un-)labeled data, the approach has the potential to become a valuable tool for CNN (pre-)training.

  7. Whither Supervision?

    Directory of Open Access Journals (Sweden)

    Duncan Waite

    2006-11-01

    Full Text Available This paper inquires if the school supervision is in decadence. Dr. Waite responds that the answer will depend on which perspective you look at it. Dr. Waite suggests taking in consideration three elements that are related: the field itself, the expert in the field (the professor, the theorist, the student and the administrator, and the context. When these three elements are revised, it emphasizes that there is not a consensus about the field of supervision, but there are coincidences related to its importance and that it is related to the improvement of the practice of the students in the school for their benefit. Dr. Waite suggests that the practice on this field is not always in harmony with what the theorists affirm. When referring to the supervisor or the skilled person, the author indicates that his or her perspective depends on his or her epistemological believes or in the way he or she conceives the learning; that is why supervision can be understood in different ways. About the context, Waite suggests that there have to be taken in consideration the social or external forces that influent the people and the society, because through them the education is affected. Dr. Waite concludes that the way to understand the supervision depends on the performer’s perspective. He responds to the initial question saying that the supervision authorities, the knowledge on this field, the performers, and its practice, are maybe spread but not extinct because the supervision will always be part of the great enterprise that we called education.

  8. Prediction of lung cancer patient survival via supervised machine learning classification techniques.

    Science.gov (United States)

    Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B

    2017-12-01

    Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques is applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as it had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time

  9. Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees

    Directory of Open Access Journals (Sweden)

    Philip H. Williams

    2012-01-01

    Full Text Available MicroRNAs (miRNAs are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require “read count” to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA∗ duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.

  10. Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

    Science.gov (United States)

    Williams, Philip H; Eyles, Rod; Weiller, Georg

    2012-01-01

    MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.

  11. Automated Detection of Microaneurysms Using Scale-Adapted Blob Analysis and Semi-Supervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Adal, Kedir M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Sidebe, Desire [Univ. of Burgundy, Dijon (France); Ali, Sharib [Univ. of Burgundy, Dijon (France); Chaum, Edward [Univ. of Tennessee, Knoxville, TN (United States); Karnowski, Thomas Paul [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Meriaudeau, Fabrice [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2014-01-07

    Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are then introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier to detect true MAs. The developed system is built using only few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images.

  12. Application of semi-supervised deep learning to lung sound analysis.

    Science.gov (United States)

    Chamberlain, Daniel; Kodgule, Rahul; Ganelin, Daniela; Miglani, Vivek; Fletcher, Richard Ribon

    2016-08-01

    The analysis of lung sounds, collected through auscultation, is a fundamental component of pulmonary disease diagnostics for primary care and general patient monitoring for telemedicine. Despite advances in computation and algorithms, the goal of automated lung sound identification and classification has remained elusive. Over the past 40 years, published work in this field has demonstrated only limited success in identifying lung sounds, with most published studies using only a small numbers of patients (typically Ndeep learning algorithm for automatically classify lung sounds from a relatively large number of patients (N=284). Focusing on the two most common lung sounds, wheeze and crackle, we present results from 11,627 sound files recorded from 11 different auscultation locations on these 284 patients with pulmonary disease. 890 of these sound files were labeled to evaluate the model, which is significantly larger than previously published studies. Data was collected with a custom mobile phone application and a low-cost (US$30) electronic stethoscope. On this data set, our algorithm achieves ROC curves with AUCs of 0.86 for wheeze and 0.74 for crackle. Most importantly, this study demonstrates how semi-supervised deep learning can be used with larger data sets without requiring extensive labeling of data.

  13. Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins

    Directory of Open Access Journals (Sweden)

    Yanqi Hao

    2015-07-01

    Full Text Available Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse, which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS spectra for an overall AU-ROC of 0.85. We predict that around 32% of “exon skipping” alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.

  14. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    Full Text Available High-resolution three-dimensional magnetic resonance imaging (3D-MRI is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM algorithm in the quality assessment of structural brain images, using global and region of interest (ROI automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  15. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

    Science.gov (United States)

    Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

    2016-01-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  16. Supervised deep learning embeddings for the prediction of cervical cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Kelwin Fernandes

    2018-05-01

    Full Text Available Cervical cancer remains a significant cause of mortality all around the world, even if it can be prevented and cured by removing affected tissues in early stages. Providing universal and efficient access to cervical screening programs is a challenge that requires identifying vulnerable individuals in the population, among other steps. In this work, we present a computationally automated strategy for predicting the outcome of the patient biopsy, given risk patterns from individual medical records. We propose a machine learning technique that allows a joint and fully supervised optimization of dimensionality reduction and classification models. We also build a model able to highlight relevant properties in the low dimensional space, to ease the classification of patients. We instantiated the proposed approach with deep learning architectures, and achieved accurate prediction results (top area under the curve AUC = 0.6875 which outperform previously developed methods, such as denoising autoencoders. Additionally, we explored some clinical findings from the embedding spaces, and we validated them through the medical literature, making them reliable for physicians and biomedical researchers.

  17. The effects of supervised learning on event-related potential correlates of music-syntactic processing.

    Science.gov (United States)

    Guo, Shuang; Koelsch, Stefan

    2015-11-11

    Humans process music even without conscious effort according to implicit knowledge about syntactic regularities. Whether such automatic and implicit processing is modulated by veridical knowledge has remained unknown in previous neurophysiological studies. This study investigates this issue by testing whether the acquisition of veridical knowledge of a music-syntactic irregularity (acquired through supervised learning) modulates early, partly automatic, music-syntactic processes (as reflected in the early right anterior negativity, ERAN), and/or late controlled processes (as reflected in the late positive component, LPC). Excerpts of piano sonatas with syntactically regular and less regular chords were presented repeatedly (10 times) to non-musicians and amateur musicians. Participants were informed by a cue as to whether the following excerpt contained a regular or less regular chord. Results showed that the repeated exposure to several presentations of regular and less regular excerpts did not influence the ERAN elicited by less regular chords. By contrast, amplitudes of the LPC (as well as of the P3a evoked by less regular chords) decreased systematically across learning trials. These results reveal that late controlled, but not early (partly automatic), neural mechanisms of music-syntactic processing are modulated by repeated exposure to a musical piece. This article is part of a Special Issue entitled SI: Prediction and Attention. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Automated detection of microaneurysms using scale-adapted blob analysis and semi-supervised learning.

    Science.gov (United States)

    Adal, Kedir M; Sidibé, Désiré; Ali, Sharib; Chaum, Edward; Karnowski, Thomas P; Mériaudeau, Fabrice

    2014-04-01

    Despite several attempts, automated detection of microaneurysm (MA) from digital fundus images still remains to be an open issue. This is due to the subtle nature of MAs against the surrounding tissues. In this paper, the microaneurysm detection problem is modeled as finding interest regions or blobs from an image and an automatic local-scale selection technique is presented. Several scale-adapted region descriptors are introduced to characterize these blob regions. A semi-supervised based learning approach, which requires few manually annotated learning examples, is also proposed to train a classifier which can detect true MAs. The developed system is built using only few manually labeled and a large number of unlabeled retinal color fundus images. The performance of the overall system is evaluated on Retinopathy Online Challenge (ROC) competition database. A competition performance measure (CPM) of 0.364 shows the competitiveness of the proposed system against state-of-the art techniques as well as the applicability of the proposed features to analyze fundus images. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  19. An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions.

    Science.gov (United States)

    Kundu, Kousik; Backofen, Rolf

    2017-01-01

    Src homology 2 (SH2) domain is an important subclass of modular protein domains that plays an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residue of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. There are several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal to noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as linearity problem, high computational complexity, etc. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.

  20. DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages

    Science.gov (United States)

    He, Lifang; Kong, Xiangnan; Yu, Philip S.; Ragin, Ann B.; Hao, Zhifeng; Yang, Xiaowei

    2015-01-01

    With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases (i.e., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes. PMID:25927014

  1. DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages.

    Science.gov (United States)

    He, Lifang; Kong, Xiangnan; Yu, Philip S; Ragin, Ann B; Hao, Zhifeng; Yang, Xiaowei

    With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases ( i.e ., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes.

  2. An unsupervised text mining method for relation extraction from biomedical literature.

    Directory of Open Access Journals (Sweden)

    Changqin Quan

    Full Text Available The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. Pattern clustering algorithm is based on Polynomial Kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1 Protein-protein interactions extraction, and (2 Gene-suicide association extraction. The evaluation of task (1 on the benchmark dataset (AImed corpus showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule based, SVM based, and Kernel based separately. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than co-occurrence based method.

  3. Classification of behavior using unsupervised temporal neural networks

    International Nuclear Information System (INIS)

    Adair, K.L.

    1998-03-01

    Adding recurrent connections to unsupervised neural networks used for clustering creates a temporal neural network which clusters a sequence of inputs as they appear over time. The model presented combines the Jordan architecture with the unsupervised learning technique Adaptive Resonance Theory, Fuzzy ART. The combination yields a neural network capable of quickly clustering sequential pattern sequences as the sequences are generated. The applicability of the architecture is illustrated through a facility monitoring problem

  4. Computerized breast cancer analysis system using three stage semi-supervised learning method.

    Science.gov (United States)

    Sun, Wenqing; Tseng, Tzu-Liang Bill; Zhang, Jianying; Qian, Wei

    2016-10-01

    A large number of labeled medical image data is usually a requirement to train a well-performed computer-aided detection (CAD) system. But the process of data labeling is time consuming, and potential ethical and logistical problems may also present complications. As a result, incorporating unlabeled data into CAD system can be a feasible way to combat these obstacles. In this study we developed a three stage semi-supervised learning (SSL) scheme that combines a small amount of labeled data and larger amount of unlabeled data. The scheme was modified on our existing CAD system using the following three stages: data weighing, feature selection, and newly proposed dividing co-training data labeling algorithm. Global density asymmetry features were incorporated to the feature pool to reduce the false positive rate. Area under the curve (AUC) and accuracy were computed using 10 fold cross validation method to evaluate the performance of our CAD system. The image dataset includes mammograms from 400 women who underwent routine screening examinations, and each pair contains either two cranio-caudal (CC) or two mediolateral-oblique (MLO) view mammograms from the right and the left breasts. From these mammograms 512 regions were extracted and used in this study, and among them 90 regions were treated as labeled while the rest were treated as unlabeled. Using our proposed scheme, the highest AUC observed in our research was 0.841, which included the 90 labeled data and all the unlabeled data. It was 7.4% higher than using labeled data only. With the increasing amount of labeled data, AUC difference between using mixed data and using labeled data only reached its peak when the amount of labeled data was around 60. This study demonstrated that our proposed three stage semi-supervised learning can improve the CAD performance by incorporating unlabeled data. Using unlabeled data is promising in computerized cancer research and may have a significant impact for future CAD system

  5. Varieties of perceptual learning.

    Science.gov (United States)

    Mackintosh, N J

    2009-05-01

    Although most studies of perceptual learning in human participants have concentrated on the changes in perception assumed to be occurring, studies of nonhuman animals necessarily measure discrimination learning and generalization and remain agnostic on the question of whether changes in behavior reflect changes in perception. On the other hand, animal studies do make it easier to draw a distinction between supervised and unsupervised learning. Differential reinforcement will surely teach animals to attend to some features of a stimulus array rather than to others. But it is an open question as to whether such changes in attention underlie the enhanced discrimination seen after unreinforced exposure to such an array. I argue that most instances of unsupervised perceptual learning observed in animals (and at least some in human animals) are better explained by appeal to well-established principles and phenomena of associative learning theory: excitatory and inhibitory associations between stimulus elements, latent inhibition, and habituation.

  6. Master's Thesis Supervision: Relations between Perceptions of the Supervisor-Student Relationship, Final Grade, Perceived Supervisor Contribution to Learning and Student Satisfaction

    Science.gov (United States)

    de Kleijn, Renske A. M.; Mainhard, M. Tim; Meijer, Paulien C.; Pilot, Albert; Brekelmans, Mieke

    2012-01-01

    Master's thesis supervision is a complex task given the two-fold goal of the thesis (learning and assessment). An important aspect of supervision is the supervisor-student relationship. This quantitative study (N = 401) investigates how perceptions of the supervisor-student relationship are related to three dependent variables: final grade,…

  7. Self-Supervised Video Representation Learning With Odd-One-Out Networks : CVPR 2017 : 21-26 July 2016, Honolulu, Hawaii : proceedings

    NARCIS (Netherlands)

    Fernando, B.; Bilen, H.; Gavves, E.; Gould, S.

    2017-01-01

    We propose a new self-supervised CNN pre-training technique based on a novel auxiliary task called odd-one-out learning. In this task, the machine is asked to identify the unrelated or odd element from a set of otherwise related elements. We apply this technique to self-supervised video

  8. An Adaptive Privacy Protection Method for Smart Home Environments Using Supervised Learning

    Directory of Open Access Journals (Sweden)

    Jingsha He

    2017-03-01

    Full Text Available In recent years, smart home technologies have started to be widely used, bringing a great deal of convenience to people’s daily lives. At the same time, privacy issues have become particularly prominent. Traditional encryption methods can no longer meet the needs of privacy protection in smart home applications, since attacks can be launched even without the need for access to the cipher. Rather, attacks can be successfully realized through analyzing the frequency of radio signals, as well as the timestamp series, so that the daily activities of the residents in the smart home can be learnt. Such types of attacks can achieve a very high success rate, making them a great threat to users’ privacy. In this paper, we propose an adaptive method based on sample data analysis and supervised learning (SDASL, to hide the patterns of daily routines of residents that would adapt to dynamically changing network loads. Compared to some existing solutions, our proposed method exhibits advantages such as low energy consumption, low latency, strong adaptability, and effective privacy protection.

  9. A bifurcation identifier for IV-OCT using orthogonal least squares and supervised machine learning.

    Science.gov (United States)

    Macedo, Maysa M G; Guimarães, Welingson V N; Galon, Micheli Z; Takimura, Celso K; Lemos, Pedro A; Gutierrez, Marco Antonio

    2015-12-01

    Intravascular optical coherence tomography (IV-OCT) is an in-vivo imaging modality based on the intravascular introduction of a catheter which provides a view of the inner wall of blood vessels with a spatial resolution of 10-20 μm. Recent studies in IV-OCT have demonstrated the importance of the bifurcation regions. Therefore, the development of an automated tool to classify hundreds of coronary OCT frames as bifurcation or nonbifurcation can be an important step to improve automated methods for atherosclerotic plaques quantification, stent analysis and co-registration between different modalities. This paper describes a fully automated method to identify IV-OCT frames in bifurcation regions. The method is divided into lumen detection; feature extraction; and classification, providing a lumen area quantification, geometrical features of the cross-sectional lumen and labeled slices. This classification method is a combination of supervised machine learning algorithms and feature selection using orthogonal least squares methods. Training and tests were performed in sets with a maximum of 1460 human coronary OCT frames. The lumen segmentation achieved a mean difference of lumen area of 0.11 mm(2) compared with manual segmentation, and the AdaBoost classifier presented the best result reaching a F-measure score of 97.5% using 104 features. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Semi-supervised learning based probabilistic latent semantic analysis for automatic image annotation

    Institute of Scientific and Technical Information of China (English)

    Tian Dongping

    2017-01-01

    In recent years, multimedia annotation problem has been attracting significant research attention in multimedia and computer vision areas, especially for automatic image annotation, whose purpose is to provide an efficient and effective searching environment for users to query their images more easily.In this paper, a semi-supervised learning based probabilistic latent semantic analysis ( PL-SA) model for automatic image annotation is presenred.Since it' s often hard to obtain or create la-beled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine ( TSVM) is exploited to enhance the quality of the training image data.Then, differ-ent image features with different magnitudes will result in different performance for automatic image annotation.To this end, a Gaussian normalization method is utilized to normalize different features extracted from effective image regions segmented by the normalized cuts algorithm so as to reserve the intrinsic content of images as complete as possible.Finally, a PLSA model with asymmetric mo-dalities is constructed based on the expectation maximization( EM) algorithm to predict a candidate set of annotations with confidence scores.Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve performance of traditional PL-SA for the task of automatic image annotation.

  11. Supervised learning for the automated transcription of spacer classification from spoligotype films

    Directory of Open Access Journals (Sweden)

    Abernethy Neil

    2009-08-01

    Full Text Available Abstract Background Molecular genotyping of bacteria has revolutionized the study of tuberculosis epidemiology, yet these established laboratory techniques typically require subjective and laborious interpretation by trained professionals. In the context of a Tuberculosis Case Contact study in The Gambia we used a reverse hybridization laboratory assay called spoligotype analysis. To facilitate processing of spoligotype images we have developed tools and algorithms to automate the classification and transcription of these data directly to a database while allowing for manual editing. Results Features extracted from each of the 1849 spots on a spoligo film were classified using two supervised learning algorithms. A graphical user interface allows manual editing of the classification, before export to a database. The application was tested on ten films of differing quality and the results of the best classifier were compared to expert manual classification, giving a median correct classification rate of 98.1% (inter quartile range: 97.1% to 99.2%, with an automated processing time of less than 1 minute per film. Conclusion The software implementation offers considerable time savings over manual processing whilst allowing expert editing of the automated classification. The automatic upload of the classification to a database reduces the chances of transcription errors.

  12. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates

    Science.gov (United States)

    Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-08-01

    Objective. The paper investigates the presence of autism using the functional brain connectivity measures derived from electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states or synchrostates temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimal and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored the discriminant analysis and support vector machine both with polynomial kernels for the classification task. Main results. The leave one out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance with corresponding sensitivity and specificity values as 85.7% and 100% respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.

  13. Construction of Hamiltonians by supervised learning of energy and entanglement spectra

    Science.gov (United States)

    Fujita, Hiroyuki; Nakagawa, Yuya O.; Sugiura, Sho; Oshikawa, Masaki

    2018-02-01

    Correlated many-body problems ubiquitously appear in various fields of physics such as condensed matter, nuclear, and statistical physics. However, due to the interplay of the large number of degrees of freedom, it is generically impossible to treat these problems from first principles. Thus the construction of a proper model, namely, effective Hamiltonian, is essential. Here, we propose a simple supervised learning algorithm for constructing Hamiltonians from given energy or entanglement spectra. We apply the proposed scheme to the Hubbard model at the half-filling, and compare the obtained effective low-energy spin model with several analytic results based on the high-order perturbation theory, which have been inconsistent with each other. We also show that our approach can be used to construct the entanglement Hamiltonian of a quantum many-body state from its entanglement spectrum as well. We exemplify this using the ground states of the S =1 /2 two-leg Heisenberg ladders. We observe a qualitative difference between the entanglement Hamiltonians of the two phases (the Haldane and the rung singlet phase) of the model due to the different origin of the entanglement. In the Haldane phase, we find that the entanglement Hamiltonian is nonlocal by nature, and the locality can be restored by introducing the anisotropy and turning the ground state into the large-D phase. Possible applications to the model construction from experimental data and to various problems of strongly correlated systems are discussed.

  14. Comparative Analysis of River Flow Modelling by Using Supervised Learning Technique

    Science.gov (United States)

    Ismail, Shuhaida; Mohamad Pandiahi, Siraj; Shabri, Ani; Mustapha, Aida

    2018-04-01

    The goal of this research is to investigate the efficiency of three supervised learning algorithms for forecasting monthly river flow of the Indus River in Pakistan, spread over 550 square miles or 1800 square kilometres. The algorithms include the Least Square Support Vector Machine (LSSVM), Artificial Neural Network (ANN) and Wavelet Regression (WR). The forecasting models predict the monthly river flow obtained from the three models individually for river flow data and the accuracy of the all models were then compared against each other. The monthly river flow of the said river has been forecasted using these three models. The obtained results were compared and statistically analysed. Then, the results of this analytical comparison showed that LSSVM model is more precise in the monthly river flow forecasting. It was found that LSSVM has he higher r with the value of 0.934 compared to other models. This indicate that LSSVM is more accurate and efficient as compared to the ANN and WR model.

  15. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    Science.gov (United States)

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.

  16. Learning through simulated independent practice leads to better future performance in a simulated crisis than learning through simulated supervised practice.

    Science.gov (United States)

    Goldberg, A; Silverman, E; Samuelson, S; Katz, D; Lin, H M; Levine, A; DeMaria, S

    2015-05-01

    Anaesthetists may fail to recognize and manage certain rare intraoperative events. Simulation has been shown to be an effective educational adjunct to typical operating room-based education to train for these events. It is yet unclear, however, why simulation has any benefit. We hypothesize that learners who are allowed to manage a scenario independently and allowed to fail, thus causing simulated morbidity, will consequently perform better when re-exposed to a similar scenario. Using a randomized, controlled, observer-blinded design, 24 first-year residents were exposed to an oxygen pipeline contamination scenario, either where patient harm occurred (independent group, n=12) or where a simulated attending anaesthetist intervened to prevent harm (supervised group, n=12). Residents were brought back 6 months later and exposed to a different scenario (pipeline contamination) with the same end point. Participants' proper treatment, time to diagnosis, and non-technical skills (measured using the Anaesthetists' Non-Technical Skills Checklist, ANTS) were measured. No participants provided proper treatment in the initial exposure. In the repeat encounter 6 months later, 67% in the independent group vs 17% in the supervised group resumed adequate oxygen delivery (P=0.013). The independent group also had better ANTS scores [median (interquartile range): 42.3 (31.5-53.1) vs 31.3 (21.6-41), P=0.015]. There was no difference in time to treatment if proper management was provided [602 (490-820) vs 610 (420-800) s, P=0.79]. Allowing residents to practise independently in the simulation laboratory, and subsequently, allowing them to fail, can be an important part of simulation-based learning. This is not feasible in real clinical practice but appears to have improved resident performance in this study. The purposeful use of independent practice and its potentially negative outcomes thus sets simulation-based learning apart from traditional operating room learning. © The Author

  17. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning.

    Science.gov (United States)

    Cole-Lewis, Heather; Varghese, Arun; Sanders, Amy; Schwarz, Mary; Pugatch, Jillian; Augustson, Erik

    2015-08-25

    Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public's knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess Twitter data for a range of factors related to e-cigarettes. Manual content analysis was conducted for 17,098 tweets. These tweets were coded for five categories: e-cigarette relevance, sentiment, user description, genre, and theme. Machine learning classification models were then built for each of these five categories, and word groupings (n-grams) were used to define the feature space for each classifier. Predictive performance scores for classification models indicated that the models correctly labeled the tweets with the appropriate variables between 68.40% and 99.34% of the time, and the percentage of maximum possible improvement over a random baseline that was achieved by the classification models ranged from 41.59% to 80.62%. Classifiers with the highest performance scores that also achieved the highest percentage of the maximum possible improvement over a random baseline were Policy/Government (performance: 0.94; % improvement: 80.62%), Relevance (performance: 0.94; % improvement: 75.26%), Ad or Promotion (performance: 0.89; % improvement: 72.69%), and Marketing (performance: 0.91; % improvement: 72.56%). The most appropriate word-grouping unit (n-gram) was 1 for the majority of classifiers. Performance continued to marginally increase with the size of the training dataset of manually annotated data, but eventually leveled off. Even at low dataset sizes of 4000 observations, performance characteristics were fairly sound. Social media outlets like Twitter can uncover real-time snapshots of

  18. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning

    Science.gov (United States)

    Cole-Lewis, Heather; Varghese, Arun; Sanders, Amy; Schwarz, Mary; Pugatch, Jillian

    2015-01-01

    Background Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public’s knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. Objective Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess Twitter data for a range of factors related to e-cigarettes. Methods Manual content analysis was conducted for 17,098 tweets. These tweets were coded for five categories: e-cigarette relevance, sentiment, user description, genre, and theme. Machine learning classification models were then built for each of these five categories, and word groupings (n-grams) were used to define the feature space for each classifier. Results Predictive performance scores for classification models indicated that the models correctly labeled the tweets with the appropriate variables between 68.40% and 99.34% of the time, and the percentage of maximum possible improvement over a random baseline that was achieved by the classification models ranged from 41.59% to 80.62%. Classifiers with the highest performance scores that also achieved the highest percentage of the maximum possible improvement over a random baseline were Policy/Government (performance: 0.94; % improvement: 80.62%), Relevance (performance: 0.94; % improvement: 75.26%), Ad or Promotion (performance: 0.89; % improvement: 72.69%), and Marketing (performance: 0.91; % improvement: 72.56%). The most appropriate word-grouping unit (n-gram) was 1 for the majority of classifiers. Performance continued to marginally increase with the size of the training dataset of manually annotated data, but eventually leveled off. Even at low dataset sizes of 4000 observations, performance characteristics were fairly sound. Conclusions Social media outlets

  19. Unsupervised EEG analysis for automated epileptic seizure detection

    Science.gov (United States)

    Birjandtalab, Javad; Pouyan, Maziyar Baran; Nourani, Mehrdad

    2016-07-01

    Epilepsy is a neurological disorder which can, if not controlled, potentially cause unexpected death. It is extremely crucial to have accurate automatic pattern recognition and data mining techniques to detect the onset of seizures and inform care-givers to help the patients. EEG signals are the preferred biosignals for diagnosis of epileptic patients. Most of the existing pattern recognition techniques used in EEG analysis leverage the notion of supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical particularly when the labeled data is not sufficiently available or when disease progression is rapid and the corresponding EEG footprint pattern will not be robust. Furthermore, EEG pattern change is highly individual dependent and requires experienced specialists to annotate the seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate seizures and non-seizures events. We employ power spectral density of EEG signals in different frequency bands that are informative features to accurately cluster seizure and non-seizure events. The experimental results tried so far indicate achieving more than 90% accuracy in clustering seizure and non-seizure events without having any prior knowledge on patient's history.

  20. Evaluation of Four Supervised Learning Methods for Benthic Habitat Mapping Using Backscatter from Multi-Beam Sonar

    Directory of Open Access Journals (Sweden)

    Jacquomo Monk

    2012-11-01

    Full Text Available An understanding of the distribution and extent of marine habitats is essential for the implementation of ecosystem-based management strategies. Historically this had been difficult in marine environments until the advancement of acoustic sensors. This study demonstrates the applicability of supervised learning techniques for benthic habitat characterization using angular backscatter response data. With the advancement of multibeam echo-sounder (MBES technology, full coverage datasets of physical structure over vast regions of the seafloor are now achievable. Supervised learning methods typically applied to terrestrial remote sensing provide a cost-effective approach for habitat characterization in marine systems. However the comparison of the relative performance of different classifiers using acoustic data is limited. Characterization of acoustic backscatter data from MBES using four different supervised learning methods to generate benthic habitat maps is presented. Maximum Likelihood Classifier (MLC, Quick, Unbiased, Efficient Statistical Tree (QUEST, Random Forest (RF and Support Vector Machine (SVM were evaluated to classify angular backscatter response into habitat classes using training data acquired from underwater video observations. Results for biota classifications indicated that SVM and RF produced the highest accuracies, followed by QUEST and MLC, respectively. The most important backscatter data were from the moderate incidence angles between 30° and 50°. This study presents initial results for understanding how acoustic backscatter from MBES can be optimized for the characterization of marine benthic biological habitats.

  1. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  2. A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis.

    Science.gov (United States)

    Chai, Hua; Li, Zi-Na; Meng, De-Yu; Xia, Liang-Yong; Liang, Yong

    2017-10-12

    Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can only use the labeled biological data, while the censored data (weakly labeled data) far more than the labeled data are ignored in model building. Trying to utilize such information in the censored data, a semi-supervised learning framework (Cox-AFT model) combined with Cox proportional hazard (Cox) and accelerated failure time (AFT) model was used in cancer research, which has better performance than the single Cox or AFT model. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with self-paced learning (SPL) method to more effectively employ the information in the censored data in a self-learning way. SPL is a kind of reliable and stable learning mechanism, which is recently proposed for simulating the human learning process to help the AFT model automatically identify and include samples of high confidence into training, minimizing interference from high noise. Utilizing the SPL method produces two direct advantages: (1) The utilization of censored data is further promoted; (2) the noise delivered to the model is greatly decreased. The experimental results demonstrate the effectiveness of the proposed model compared to the traditional Cox-AFT model.

  3. SUPERVISION IMPLEMENTATION IN MANAGEMENT QUALITY: AN ATTEMPT TO IMPROVE THE QUALITY OF LEARNING AT MADRASAH ALIYAH DARUL A’MAL METRO

    Directory of Open Access Journals (Sweden)

    Subandi Subandi

    2016-03-01

    Full Text Available The primary purpose of this qualitative study is to identify and analyze supervision by implementing management quality to improve the quality of learning at Madrasah Aliyah Darul A’mal Metro, Lampung. The quality of management implementation is elaborated in the steps of assurance of learning quality. Two instruments, which consist of observation and questionnaire, were used in this study of which each instrument was analyzed based the deductive framework. The results of this study from each instrument revealed four steps of assurance of learning quality, among others (1 by socializing academic supervision program and its advantages to all stake holders, and (2 by implementing stages of assurance through academic supervision by the principals of Madrasah Aliyah Darul A’mal and supervisor, (3 performing supervision that is equipped with valid instrument to measure learning success, (4 performing the follow-up program by clinical and group discussion to provide appropriate model of performance.

  4. Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification

    Science.gov (United States)

    Iyigun, Cem; Ben-Israel, Adi

    Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, using prior information on the data). These two modes of data analysis are combined in a parameterized model, the parameter θ ∈ [0, 1] is the weight attributed to the prior information, θ = 0 corresponding to clustering, and θ = 1 to classification. The results (cluster centers, classification rule) depend on the parameter θ, an insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is otherwise redundant. This explains why some data sets (such as the Wisconsin breast cancer data, Merz and Murphy, UCI repository of machine learning databases, University of California, Irvine, CA) give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, shown to be an entropic distance related to the Kullback-Leibler divergence.

  5. Facebook Blocket with Unsupervised Learning

    OpenAIRE

    Amin, Khizer; Minhas, Mehmood ul haq

    2014-01-01

    The Internet has become a valuable channel for both business-to- consumer and business-to-business e-commerce. It has changed the way for many companies to manage the business. Every day, more and more companies are making their presence on Internet. Web sites are launched for online shopping as web shops or on-line stores are a popular means of goods distribution. The number of items sold through the internet has sprung up significantly in the past few years. Moreover, it has become a choice...

  6. Supervised detection of exoplanets in high-contrast imaging sequences

    Science.gov (United States)

    Gomez Gonzalez, C. A.; Absil, O.; Van Droogenbroeck, M.

    2018-06-01

    Context. Post-processing algorithms play a key role in pushing the detection limits of high-contrast imaging (HCI) instruments. State-of-the-art image processing approaches for HCI enable the production of science-ready images relying on unsupervised learning techniques, such as low-rank approximations, for generating a model point spread function (PSF) and subtracting the residual starlight and speckle noise. Aims: In order to maximize the detection rate of HCI instruments and survey campaigns, advanced algorithms with higher sensitivities to faint companions are needed, especially for the speckle-dominated innermost region of the images. Methods: We propose a reformulation of the exoplanet detection task (for ADI sequences) that builds on well-established machine learning techniques to take HCI post-processing from an unsupervised to a supervised learning context. In this new framework, we present algorithmic solutions using two different discriminative models: SODIRF (random forests) and SODINN (neural networks). We test these algorithms on real ADI datasets from VLT/NACO and VLT/SPHERE HCI instruments. We then assess their performances by injecting fake companions and using receiver operating characteristic analysis. This is done in comparison with state-of-the-art ADI algorithms, such as ADI principal component analysis (ADI-PCA). Results: This study shows the improved sensitivity versus specificity trade-off of the proposed supervised detection approach. At the diffraction limit, SODINN improves the true positive rate by a factor ranging from 2 to 10 (depending on the dataset and angular separation) with respect to ADI-PCA when working at the same false-positive level. Conclusions: The proposed supervised detection framework outperforms state-of-the-art techniques in the task of discriminating planet signal from speckles. In addition, it offers the possibility of re-processing existing HCI databases to maximize their scientific return and potentially improve

  7. Developing a practice of supervision in university as a collective learning process

    DEFF Research Database (Denmark)

    Lund, Birthe; Jensen, Annie Aarup

    2009-01-01

    of the framework surrounding the supervision process, both as regards the students and the teachers; to de-privatize the problems encountered by the individual teacher during the supervision; to ensure that students would be able to graduate within the timeframe of the education (the institutional economic......The point of departure of the paper is a university pedagogical course established with the purpose of strengthening the university teachers’ competence regarding the supervision of students working on their master’s thesis. The purpose of the course is furthermore to ensure the improvement...

  8. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  9. Computer-aided assessment of breast density: comparison of supervised deep learning and feature-based statistical learning

    Science.gov (United States)

    Li, Songfeng; Wei, Jun; Chan, Heang-Ping; Helvie, Mark A.; Roubidoux, Marilyn A.; Lu, Yao; Zhou, Chuan; Hadjiiski, Lubomir M.; Samala, Ravi K.

    2018-01-01

    Breast density is one of the most significant factors that is associated with cancer risk. In this study, our purpose was to develop a supervised deep learning approach for automated estimation of percentage density (PD) on digital mammograms (DMs). The input ‘for processing’ DMs was first log-transformed, enhanced by a multi-resolution preprocessing scheme, and subsampled to a pixel size of 800 µm  ×  800 µm from 100 µm  ×  100 µm. A deep convolutional neural network (DCNN) was trained to estimate a probability map of breast density (PMD) by using a domain adaptation resampling method. The PD was estimated as the ratio of the dense area to the breast area based on the PMD. The DCNN approach was compared to a feature-based statistical learning approach. Gray level, texture and morphological features were extracted and a least absolute shrinkage and selection operator was used to combine the features into a feature-based PMD. With approval of the Institutional Review Board, we retrospectively collected a training set of 478 DMs and an independent test set of 183 DMs from patient files in our institution. Two experienced mammography quality standards act radiologists interactively segmented PD as the reference standard. Ten-fold cross-validation was used for model selection and evaluation with the training set. With cross-validation, DCNN obtained a Dice’s coefficient (DC) of 0.79  ±  0.13 and Pearson’s correlation (r) of 0.97, whereas feature-based learning obtained DC  =  0.72  ±  0.18 and r  =  0.85. For the independent test set, DCNN achieved DC  =  0.76  ±  0.09 and r  =  0.94, while feature-based learning achieved DC  =  0.62  ±  0.21 and r  =  0.75. Our DCNN approach was significantly better and more robust than the feature-based learning approach for automated PD estimation on DMs, demonstrating its potential use for automated density reporting as

  10. Computer-aided assessment of breast density: comparison of supervised deep learning and feature-based statistical learning.

    Science.gov (United States)

    Li, Songfeng; Wei, Jun; Chan, Heang-Ping; Helvie, Mark A; Roubidoux, Marilyn A; Lu, Yao; Zhou, Chuan; Hadjiiski, Lubomir M; Samala, Ravi K

    2018-01-09

    Breast density is one of the most significant factors that is associated with cancer risk. In this study, our purpose was to develop a supervised deep learning approach for automated estimation of percentage density (PD) on digital mammograms (DMs). The input 'for processing' DMs was first log-transformed, enhanced by a multi-resolution preprocessing scheme, and subsampled to a pixel size of 800 µm  ×  800 µm from 100 µm  ×  100 µm. A deep convolutional neural network (DCNN) was trained to estimate a probability map of breast density (PMD) by using a domain adaptation resampling method. The PD was estimated as the ratio of the dense area to the breast area based on the PMD. The DCNN approach was compared to a feature-based statistical learning approach. Gray level, texture and morphological features were extracted and a least absolute shrinkage and selection operator was used to combine the features into a feature-based PMD. With approval of the Institutional Review Board, we retrospectively collected a training set of 478 DMs and an independent test set of 183 DMs from patient files in our institution. Two experienced mammography quality standards act radiologists interactively segmented PD as the reference standard. Ten-fold cross-validation was used for model selection and evaluation with the training set. With cross-validation, DCNN obtained a Dice's coefficient (DC) of 0.79  ±  0.13 and Pearson's correlation (r) of 0.97, whereas feature-based learning obtained DC  =  0.72  ±  0.18 and r  =  0.85. For the independent test set, DCNN achieved DC  =  0.76  ±  0.09 and r  =  0.94, while feature-based learning achieved DC  =  0.62  ±  0.21 and r  =  0.75. Our DCNN approach was significantly better and more robust than the feature-based learning approach for automated PD estimation on DMs, demonstrating its potential use for automated density reporting as well as

  11. Supervised learning based model for predicting variability-induced timing errors

    NARCIS (Netherlands)

    Jiao, X.; Rahimi, A.; Narayanaswamy, B.; Fatemi, H.; Pineda de Gyvez, J.; Gupta, R.K.

    2015-01-01

    Circuit designers typically combat variations in hardware and workload by increasing conservative guardbanding that leads to operational inefficiency. Reducing this excessive guardband is highly desirable, but causes timing errors in synchronous circuits. We propose a methodology for supervised

  12. Ischemia Detection Using Supervised Learning for Hierarchical Neural Networks Based on Kohonen-Maps

    National Research Council Canada - National Science Library

    Vladutu, L

    2001-01-01

    .... The motivation for developing the Supervising Network - Self Organizing Map (sNet-SOM) model is to design computationally effective solutions for the particular problem of ischemia detection and other similar applications...

  13. SU-E-J-107: Supervised Learning Model of Aligned Collagen for Human Breast Carcinoma Prognosis

    International Nuclear Information System (INIS)

    Bredfeldt, J; Liu, Y; Conklin, M; Keely, P; Eliceiri, K; Mackie, T

    2014-01-01

    Purpose: Our goal is to develop and apply a set of optical and computational tools to enable large-scale investigations of the interaction between collagen and tumor cells. Methods: We have built a novel imaging system for automating the capture of whole-slide second harmonic generation (SHG) images of collagen in registry with bright field (BF) images of hematoxylin and eosin stained tissue. To analyze our images, we have integrated a suite of supervised learning tools that semi-automatically model and score collagen interactions with tumor cells via a variety of metrics, a method we call Electronic Tumor Associated Collagen Signatures (eTACS). This group of tools first segments regions of epithelial cells and collagen fibers from BF and SHG images respectively. We then associate fibers with groups of epithelial cells and finally compute features based on the angle of interaction and density of the collagen surrounding the epithelial cell clusters. These features are then processed with a support vector machine to separate cancer patients into high and low risk groups. Results: We validated our model by showing that eTACS produces classifications that have statistically significant correlation with manual classifications. In addition, our system generated classification scores that accurately predicted breast cancer patient survival in a cohort of 196 patients. Feature rank analysis revealed that TACS positive fibers are more well aligned with each other, generally lower density, and terminate within or near groups of epithelial cells. Conclusion: We are working to apply our model to predict survival in larger cohorts of breast cancer patients with a diversity of breast cancer types, predict response to treatments such as COX2 inhibitors, and to study collagen architecture changes in other cancer types. In the future, our system may be used to provide metastatic potential information to cancer patients to augment existing clinical assays

  14. Graph-Based Semi-Supervised Learning for Indoor Localization Using Crowdsourced Data

    Directory of Open Access Journals (Sweden)

    Liye Zhang

    2017-04-01

    Full Text Available Indoor positioning based on the received signal strength (RSS of the WiFi signal has become the most popular solution for indoor localization. In order to realize the rapid deployment of indoor localization systems, solutions based on crowdsourcing have been proposed. However, compared to conventional methods, lots of different devices are used in crowdsourcing system and less RSS values are collected by each device. Therefore, the crowdsourced RSS values are more erroneous and can result in significant localization errors. In order to eliminate the signal strength variations across diverse devices, the Linear Regression (LR algorithm is proposed to solve the device diversity problem in crowdsourcing system. After obtaining the uniform RSS values, a graph-based semi-supervised learning (G-SSL method is used to exploit the correlation between the RSS values at nearby locations to estimate an optimal RSS value at each location. As a result, the negative effect of the erroneous measurements could be mitigated. Since the AP locations need to be known in G-SSL algorithm, the Compressed Sensing (CS method is applied to precisely estimate the location of the APs. Based on the location of the APs and a simple signal propagation model, the RSS difference between different locations is calculated and used as an additional constraint to improve the performance of G-SSL. Furthermore, to exploit the sparsity of the weights used in the G-SSL, we use the CS method to reconstruct these weights more accurately and make a further improvement on the performance of the G-SSL. Experimental results show improved results in terms of the smoothness of the radio map and the localization accuracy.

  15. Neuroanatomical heterogeneity of schizophrenia revealed by semi-supervised machine learning methods.

    Science.gov (United States)

    Honnorat, Nicolas; Dong, Aoyan; Meisenzahl-Lechner, Eva; Koutsouleris, Nikolaos; Davatzikos, Christos

    2017-12-20

    Schizophrenia is associated with heterogeneous clinical symptoms and neuroanatomical alterations. In this work, we aim to disentangle the patterns of neuroanatomical alterations underlying a heterogeneous population of patients using a semi-supervised clustering method. We apply this strategy to a cohort of patients with schizophrenia of varying extends of disease duration, and we describe the neuroanatomical, demographic and clinical characteristics of the subtypes discovered. We analyze the neuroanatomical heterogeneity of 157 patients diagnosed with Schizophrenia, relative to a control population of 169 subjects, using a machine learning method called CHIMERA. CHIMERA clusters the differences between patients and a demographically-matched population of healthy subjects, rather than clustering patients themselves, thereby specifically assessing disease-related neuroanatomical alterations. Voxel-Based Morphometry was conducted to visualize the neuroanatomical patterns associated with each group. The clinical presentation and the demographics of the groups were then investigated. Three subgroups were identified. The first two differed substantially, in that one involved predominantly temporal-thalamic-peri-Sylvian regions, whereas the other involved predominantly frontal regions and the thalamus. Both subtypes included primarily male patients. The third pattern was a mix of these two and presented milder neuroanatomic alterations and comprised a comparable number of men and women. VBM and statistical analyses suggest that these groups could correspond to different neuroanatomical dimensions of schizophrenia. Our analysis suggests that schizophrenia presents distinct neuroanatomical variants. This variability points to the need for a dimensional neuroanatomical approach using data-driven, mathematically principled multivariate pattern analysis methods, and should be taken into account in clinical studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Supervised learning methods for pathological arterial pulse wave differentiation: A SVM and neural networks approach.

    Science.gov (United States)

    Paiva, Joana S; Cardoso, João; Pereira, Tânia

    2018-01-01

    The main goal of this study was to develop an automatic method based on supervised learning methods, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed by signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and a F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with SVM classifier using a different number of features from the original set available. The comparison between SVM and NN allowed reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Protein complex detection in PPI networks based on data integration and supervised learning method.

    Science.gov (United States)

    Yu, Feng; Yang, Zhi; Hu, Xiao; Sun, Yuan; Lin, Hong; Wang, Jian

    2015-01-01

    Revealing protein complexes are important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large amount of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small amount of known physical interactions may limit protein complex detection. The new PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from biomedical literature, and then the less reliable PPI between two proteins are filtered out based on semantic similarity and topological similarity of the two proteins. Finally, the supervised learning protein complex detection (SLPC), which can make full use of the information of available known complexes, is applied to detect protein complex on the new PPI networks. The experimental results of SLPC on two different categories yeast PPI networks demonstrate effectiveness of the approach: compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in the F-score, accuracy and maximum matching ratio (MMR) are achieved respectively; compared with the denoising PPI networks, the best average improvements of 3.91, 4.61 and 12.10 percentage units in the F-score, accuracy and MMR are achieved respectively; compared with ClusterONE, the start-of the-art complex detection method, on the denoising extended PPI networks, the average improvements of 26.02 and 22.40 percentage units in the F-score and MMR are achieved respectively. The experimental results show that the performances of SLPC have a large improvement through integration of new receivable PPI data from biomedical literature into original PPI networks and denoising PPI networks. In addition, our protein complexes detection method can achieve better performance than ClusterONE.

  18. SU-E-J-107: Supervised Learning Model of Aligned Collagen for Human Breast Carcinoma Prognosis

    Energy Technology Data Exchange (ETDEWEB)

    Bredfeldt, J; Liu, Y; Conklin, M; Keely, P; Eliceiri, K; Mackie, T [University of Wisconsin, Madison, WI (United States)

    2014-06-01

    Purpose: Our goal is to develop and apply a set of optical and computational tools to enable large-scale investigations of the interaction between collagen and tumor cells. Methods: We have built a novel imaging system for automating the capture of whole-slide second harmonic generation (SHG) images of collagen in registry with bright field (BF) images of hematoxylin and eosin stained tissue. To analyze our images, we have integrated a suite of supervised learning tools that semi-automatically model and score collagen interactions with tumor cells via a variety of metrics, a method we call Electronic Tumor Associated Collagen Signatures (eTACS). This group of tools first segments regions of epithelial cells and collagen fibers from BF and SHG images respectively. We then associate fibers with groups of epithelial cells and finally compute features based on the angle of interaction and density of the collagen surrounding the epithelial cell clusters. These features are then processed with a support vector machine to separate cancer patients into high and low risk groups. Results: We validated our model by showing that eTACS produces classifications that have statistically significant correlation with manual classifications. In addition, our system generated classification scores that accurately predicted breast cancer patient survival in a cohort of 196 patients. Feature rank analysis revealed that TACS positive fibers are more well aligned with each other, generally lower density, and terminate within or near groups of epithelial cells. Conclusion: We are working to apply our model to predict survival in larger cohorts of breast cancer patients with a diversity of breast cancer types, predict response to treatments such as COX2 inhibitors, and to study collagen architecture changes in other cancer types. In the future, our system may be used to provide metastatic potential information to cancer patients to augment existing clinical assays.

  19. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

    Directory of Open Access Journals (Sweden)

    Monique A Ladds

    Full Text Available Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which when applied to different data sets produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions. We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM, random forests, support vector machine using four different kernels and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal, testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%. Adding feature statistics improved accuracies across all models tested. Most categories of behaviour -resting, grooming and feeding-were all predicted with reasonable accuracy (52-81% by the SVM while travelling was poorly categorised (31-41%. These results show that model selection is important when classifying behaviour and that by using

  20. TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning

    OpenAIRE

    Tang, Yuan

    2016-01-01

    TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to simplify the process of creating, configuring, training, evaluating, and experimenting a machine learning model. TF.Learn integrates a wide range of state-of-art machine learning algorithms built on top of TensorFlow's low level APIs for small to large-scale supervised and unsupervised problems. This module focuses on bringing machine learning t...

  1. Kollegial supervision

    DEFF Research Database (Denmark)

    Andersen, Ole Dibbern; Petersson, Erling

    Publikationen belyser, hvordan kollegial supervision i en kan organiseres i en uddannelsesinstitution......Publikationen belyser, hvordan kollegial supervision i en kan organiseres i en uddannelsesinstitution...

  2. Learning a Markov Logic network for supervised gene regulatory network inference.

    Science.gov (United States)

    Brouard, Céline; Vrain, Christel; Dubois, Julie; Castel, David; Debily, Marie-Anne; d'Alché-Buc, Florence

    2013-09-12

    Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. We propose to learn a Markov Logic network, e.g. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a

  3. Alzheimer's Disease Early Diagnosis Using Manifold-Based Semi-Supervised Learning.

    Science.gov (United States)

    Khajehnejad, Moein; Saatlou, Forough Habibollahi; Mohammadzade, Hoda

    2017-08-20

    Alzheimer's disease (AD) is currently ranked as the sixth leading cause of death in the United States and recent estimates indicate that the disorder may rank third, just behind heart disease and cancer, as a cause of death for older people. Clearly, predicting this disease in the early stages and preventing it from progressing is of great importance. The diagnosis of Alzheimer's disease (AD) requires a variety of medical tests, which leads to huge amounts of multivariate heterogeneous data. It can be difficult and exhausting to manually compare, visualize, and analyze this data due to the heterogeneous nature of medical tests; therefore, an efficient approach for accurate prediction of the condition of the brain through the classification of magnetic resonance imaging (MRI) images is greatly beneficial and yet very challenging. In this paper, a novel approach is proposed for the diagnosis of very early stages of AD through an efficient classification of brain MRI images, which uses label propagation in a manifold-based semi-supervised learning framework. We first apply voxel morphometry analysis to extract some of the most critical AD-related features of brain images from the original MRI volumes and also gray matter (GM) segmentation volumes. The features must capture the most discriminative properties that vary between a healthy and Alzheimer-affected brain. Next, we perform a principal component analysis (PCA)-based dimension reduction on the extracted features for faster yet sufficiently accurate analysis. To make the best use of the captured features, we present a hybrid manifold learning framework which embeds the feature vectors in a subspace. Next, using a small set of labeled training data, we apply a label propagation method in the created manifold space to predict the labels of the remaining images and classify them in the two groups of mild Alzheimer's and normal condition (MCI/NC). The accuracy of the classification using the proposed method is 93

  4. Value of supervised learning events in predicting doctors in difficulty.

    Science.gov (United States)

    Patel, Mumtaz; Agius, Steven; Wilkinson, Jack; Patel, Leena; Baker, Paul

    2016-07-01

    In the UK, supervised learning events (SLE) replaced traditional workplace-based assessments for foundation-year trainees in 2012. A key element of SLEs was to incorporate trainee reflection and assessor feedback in order to drive learning and identify training issues early. Few studies, however, have investigated the value of SLEs in predicting doctors in difficulty. This study aimed to identify principles that would inform understanding about how and why SLEs work or not in identifying doctors in difficulty (DiD). A retrospective case-control study of North West Foundation School trainees' electronic portfolios was conducted. Cases comprised all known DiD. Controls were randomly selected from the same cohort. Free-text supervisor comments from each SLE were assessed for the four domains defined in the General Medical Council's Good Medical Practice Guidelines and each scored blindly for level of concern using a three-point ordinal scale. Cumulative scores for each SLE were then analysed quantitatively for their predictive value of actual DiD. A qualitative thematic analysis was also conducted. The prevalence of DiD in this sample was 6.5%. Receiver operator characteristic curve analysis showed that Team Assessment of Behaviour (TAB) was the only SLE strongly predictive of actual DiD status. The Educational Supervisor Report (ESR) was also strongly predictive of DiD status. Fisher's test showed significant associations of TAB and ESR for both predicted and actual DiD status and also the health and performance subtypes. None of the other SLEs showed significant associations. Qualitative data analysis revealed inadequate completion and lack of constructive, particularly negative, feedback. This indicated that SLEs were not used to their full potential. TAB and the ESR are strongly predictive of DiD. However, SLEs are not being used to their full potential, and the quality of completion of reports on SLEs and feedback needs to be improved in order to better identify

  5. SUSTAIN: a network model of category learning.

    Science.gov (United States)

    Love, Bradley C; Medin, Douglas L; Gureckis, Todd M

    2004-04-01

    SUSTAIN (Supervised and Unsupervised STratified Adaptive Incremental Network) is a model of how humans learn categories from examples. SUSTAIN initially assumes a simple category structure. If simple solutions prove inadequate and SUSTAIN is confronted with a surprising event (e.g., it is told that a bat is a mammal instead of a bird), SUSTAIN recruits an additional cluster to represent the surprising event. Newly recruited clusters are available to explain future events and can themselves evolve into prototypes-attractors-rules. SUSTAIN's discovery of category substructure is affected not only by the structure of the world but by the nature of the learning task and the learner's goals. SUSTAIN successfully extends category learning models to studies of inference learning, unsupervised learning, category construction, and contexts in which identification learning is faster than classification learning.

  6. Collective academic supervision

    DEFF Research Database (Denmark)

    Nordentoft, Helle Merete; Thomsen, Rie; Wichmann-Hansen, Gitte

    2013-01-01

    Supervision of students is a core activity in higher education. Previous research on student supervision in higher education focus on individual and relational aspects in the supervisory relationship rather than collective, pedagogical and methodical aspects of the planning of the supervision...... process. This article fills these gaps by discussing potentials and challenges in “Collective Academic Supervision”, a model for supervision at the Master of Education in Guidance at Aarhus University in Denmark. The pedagogical rationale behind the model is that students’ participation and learning...

  7. Unsupervised Classification Using Immune Algorithm

    OpenAIRE

    Al-Muallim, M. T.; El-Kouatly, R.

    2012-01-01

    Unsupervised classification algorithm based on clonal selection principle named Unsupervised Clonal Selection Classification (UCSC) is proposed in this paper. The new proposed algorithm is data driven and self-adaptive, it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed U...

  8. SU-F-R-08: Can Normalization of Brain MRI Texture Features Reduce Scanner-Dependent Effects in Unsupervised Machine Learning?

    Energy Technology Data Exchange (ETDEWEB)

    Ogden, K; O’Dwyer, R [SUNY Upstate Medical University, Syracuse, NY (United States); Bradford, T [Syracuse University, Syracuse, NY (United States); Cussen, L [Rochester Institute of Technology, Rochester, NY (United States)

    2016-06-15

    Purpose: To reduce differences in features calculated from MRI brain scans acquired at different field strengths with or without Gadolinium contrast. Methods: Brain scans were processed for 111 epilepsy patients to extract hippocampus and thalamus features. Scans were acquired on 1.5 T scanners with Gadolinium contrast (group A), 1.5T scanners without Gd (group B), and 3.0 T scanners without Gd (group C). A total of 72 features were extracted. Features were extracted from original scans and from scans where the image pixel values were rescaled to the mean of the hippocampi and thalami values. For each data set, cluster analysis was performed on the raw feature set and for feature sets with normalization (conversion to Z scores). Two methods of normalization were used: The first was over all values of a given feature, and the second by normalizing within the patient group membership. The clustering software was configured to produce 3 clusters. Group fractions in each cluster were calculated. Results: For features calculated from both the non-rescaled and rescaled data, cluster membership was identical for both the non-normalized and normalized data sets. Cluster 1 was comprised entirely of Group A data, Cluster 2 contained data from all three groups, and Cluster 3 contained data from only groups 1 and 2. For the categorically normalized data sets there was a more uniform distribution of group data in the three Clusters. A less pronounced effect was seen in the rescaled image data features. Conclusion: Image Rescaling and feature renormalization can have a significant effect on the results of clustering analysis. These effects are also likely to influence the results of supervised machine learning algorithms. It may be possible to partly remove the influence of scanner field strength and the presence of Gadolinium based contrast in feature extraction for radiomics applications.

  9. SU-F-R-08: Can Normalization of Brain MRI Texture Features Reduce Scanner-Dependent Effects in Unsupervised Machine Learning?

    International Nuclear Information System (INIS)

    Ogden, K; O’Dwyer, R; Bradford, T; Cussen, L

    2016-01-01

    Purpose: To reduce differences in features calculated from MRI brain scans acquired at different field strengths with or without Gadolinium contrast. Methods: Brain scans were processed for 111 epilepsy patients to extract hippocampus and thalamus features. Scans were acquired on 1.5 T scanners with Gadolinium contrast (group A), 1.5T scanners without Gd (group B), and 3.0 T scanners without Gd (group C). A total of 72 features were extracted. Features were extracted from original scans and from scans where the image pixel values were rescaled to the mean of the hippocampi and thalami values. For each data set, cluster analysis was performed on the raw feature set and for feature sets with normalization (conversion to Z scores). Two methods of normalization were used: The first was over all values of a given feature, and the second by normalizing within the patient group membership. The clustering software was configured to produce 3 clusters. Group fractions in each cluster were calculated. Results: For features calculated from both the non-rescaled and rescaled data, cluster membership was identical for both the non-normalized and normalized data sets. Cluster 1 was comprised entirely of Group A data, Cluster 2 contained data from all three groups, and Cluster 3 contained data from only groups 1 and 2. For the categorically normalized data sets there was a more uniform distribution of group data in the three Clusters. A less pronounced effect was seen in the rescaled image data features. Conclusion: Image Rescaling and feature renormalization can have a significant effect on the results of clustering analysis. These effects are also likely to influence the results of supervised machine learning algorithms. It may be possible to partly remove the influence of scanner field strength and the presence of Gadolinium based contrast in feature extraction for radiomics applications.

  10. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of 0.8 cm2 and weighs only 180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved a

  11. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice

    2017-06-14

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  12. Prediction of interactions between viral and host proteins using supervised machine learning methods.

    Directory of Open Access Journals (Sweden)

    Ranjan Kumar Barman

    Full Text Available BACKGROUND: Viral-host protein-protein interaction plays a vital role in pathogenesis, since it defines viral infection of the host and regulation of the host proteins. Identification of key viral-host protein-protein interactions (PPIs has great implication for therapeutics. METHODS: In this study, a systematic attempt has been made to predict viral-host PPIs by integrating different features, including domain-domain association, network topology and sequence information using viral-host PPIs from VirusMINT. The three well-known supervised machine learning methods, such as SVM, Naïve Bayes and Random Forest, which are commonly used in the prediction of PPIs, were employed to evaluate the performance measure based on five-fold cross validation techniques. RESULTS: Out of 44 descriptors, best features were found to be domain-domain association and methionine, serine and valine amino acid composition of viral proteins. In this study, SVM-based method achieved better sensitivity of 67% over Naïve Bayes (37.49% and Random Forest (55.66%. However the specificity of Naïve Bayes was the highest (99.52% as compared with SVM (74% and Random Forest (89.08%. Overall, the SVM and Random Forest achieved accuracy of 71% and 72.41%, respectively. The proposed SVM-based method was evaluated on blind dataset and attained a sensitivity of 64%, specificity of 83%, and accuracy of 74%. In addition, unknown potential targets of hepatitis B virus-human and hepatitis E virus-human PPIs have been predicted through proposed SVM model and validated by gene ontology enrichment analysis. Our proposed model shows that, hepatitis B virus "C protein" binds to membrane docking protein, while "X protein" and "P protein" interacts with cell-killing and metabolic process proteins, respectively. CONCLUSION: The proposed method can predict large scale interspecies viral-human PPIs. The nature and function of unknown viral proteins (HBV and HEV, interacting partners of host

  13. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Ceylan Koydemir Hatice

    2017-06-01

    Full Text Available Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond

  14. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-01-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  15. Supervised machine learning techniques to predict binding affinity. A study for cyclin-dependent kinase 2.

    Science.gov (United States)

    de Ávila, Maurício Boff; Xavier, Mariana Morrone; Pintro, Val Oliveira; de Azevedo, Walter Filgueira

    2017-12-09

    Here we report the development of a machine-learning model to predict binding affinity based on the crystallographic structures of protein-ligand complexes. We used an ensemble of crystallographic structures (resolution better than 1.5 Å resolution) for which half-maximal inhibitory concentration (IC 50 ) data is available. Polynomial scoring functions were built using as explanatory variables the energy terms present in the MolDock and PLANTS scoring functions. Prediction performance was tested and the supervised machine learning models showed improvement in the prediction power, when compared with PLANTS and MolDock scoring functions. In addition, the machine-learning model was applied to predict binding affinity of CDK2, which showed a better performance when compared with AutoDock4, AutoDock Vina, MolDock, and PLANTS scores. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Information theoretic learning Renyi's entropy and Kernel perspectives

    CERN Document Server

    Principe, Jose C

    2010-01-01

    This book presents the first cohesive treatment of Information Theoretic Learning (ITL) algorithms to adapt linear or nonlinear learning machines both in supervised or unsupervised paradigms. ITL is a framework where the conventional concepts of second order statistics (covariance, L2 distances, correlation functions) are substituted by scalars and functions with information theoretic underpinnings, respectively entropy, mutual information and correntropy. ITL quantifies the stochastic structure of the data beyond second order statistics for improved performance without using full-blown Bayesi

  17. SSEL-ADE: A semi-supervised ensemble learning framework for extracting adverse drug events from social media.

    Science.gov (United States)

    Liu, Jing; Zhao, Songzheng; Wang, Gang

    2018-01-01

    With the development of Web 2.0 technology, social media websites have become lucrative but under-explored data sources for extracting adverse drug events (ADEs), which is a serious health problem. Besides ADE, other semantic relation types (e.g., drug indication and beneficial effect) could hold between the drug and adverse event mentions, making ADE relation extraction - distinguishing ADE relationship from other relation types - necessary. However, conducting ADE relation extraction in social media environment is not a trivial task because of the expertise-dependent, time-consuming and costly annotation process, and the feature space's high-dimensionality attributed to intrinsic characteristics of social media data. This study aims to develop a framework for ADE relation extraction using patient-generated content in social media with better performance than that delivered by previous efforts. To achieve the objective, a general semi-supervised ensemble learning framework, SSEL-ADE, was developed. The framework exploited various lexical, semantic, and syntactic features, and integrated ensemble learning and semi-supervised learning. A series of experiments were conducted to verify the effectiveness of the proposed framework. Empirical results demonstrate the effectiveness of each component of SSEL-ADE and reveal that our proposed framework outperforms most of existing ADE relation extraction methods The SSEL-ADE can facilitate enhanced ADE relation extraction performance, thereby providing more reliable support for pharmacovigilance. Moreover, the proposed semi-supervised ensemble methods have the potential of being applied to effectively deal with other social media-based problems. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  19. Fieldwork online: a GIS-based electronic learning environment for supervising fieldwork

    NARCIS (Netherlands)

    Alberti, K.; Marra, W.A.; Baarsma, R.J.; Karssenberg, D.J.

    2016-01-01

    Fieldwork comes in many forms: individual research projects in unique places, large groups of students on organized fieldtrips, and everything in between those extremes. Supervising students in often distant places can be a logistical challenge and requires a significant time investment of their

  20. Don't Leave Teaching to Chance: Learning Objectives for Psychodynamic Psychotherapy Supervision

    Science.gov (United States)

    Rojas, Alicia; Arbuckle, Melissa; Cabaniss, Deborah

    2010-01-01

    Objective: The way in which the competencies for psychodynamic psychotherapy specified by the Psychiatry Residency Review Committee of the Accreditation Council for Graduate Medical Education translate into the day-to-day work of individual supervision remains unstudied and unspecified. The authors hypothesized that despite the existence of…

  1. Constrained parameter estimation for semi-supervised learning : The case of the nearest mean classifier

    NARCIS (Netherlands)

    Loog, M.

    2011-01-01

    A rather simple semi-supervised version of the equally simple nearest mean classifier is presented. However simple, the proposed approach is of practical interest as the nearest mean classifier remains a relevant tool in biomedical applications or other areas dealing with relatively high-dimensional

  2. Linear time relational prototype based learning.

    Science.gov (United States)

    Gisbrecht, Andrej; Mokbel, Bassam; Schleif, Frank-Michael; Zhu, Xibin; Hammer, Barbara

    2012-10-01

    Prototype based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike the Euclidean counterparts, the techniques have quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, they are infeasible already for medium sized data sets. The contribution of this article is twofold: On the one hand we propose a novel supervised prototype based classification technique for dissimilarity data based on popular learning vector quantization (LVQ), on the other hand we transfer a linear time approximation technique, the Nyström approximation, to this algorithm and an unsupervised counterpart, the relational generative topographic mapping (GTM). This way, linear time and space methods result. We evaluate the techniques on three examples from the biomedical domain.

  3. An Early Historical Examination of the Educational Intent of Supervised Agricultural Experiences (SAEs) and Project-Based Learning in Agricultural Education

    Science.gov (United States)

    Smith, Kasee L.; Rayfield, John

    2016-01-01

    Project-based learning has been a component of agricultural education since its inception. In light of the current call for additional emphasis of the Supervised Agricultural Experience (SAE) component of agricultural education, there is a need to revisit the roots of project-based learning. This early historical research study was conducted to…

  4. Per-service supervised learning for identifying desired WoT apps from user requests in natural language.

    Directory of Open Access Journals (Sweden)

    Young Yoon

    Full Text Available Web of Things (WoT platforms are growing fast so as the needs for composing WoT apps more easily and efficiently. We have recently commenced the campaign to develop an interface where users can issue requests for WoT apps entirely in natural language. This requires an effort to build a system that can learn to identify relevant WoT functions that fulfill user's requests. In our preceding work, we trained a supervised learning system with thousands of publicly-available IFTTT app recipes based on conditional random fields (CRF. However, the sub-par accuracy and excessive training time motivated us to devise a better approach. In this paper, we present a novel solution that creates a separate learning engine for each trigger service. With this approach, parallel and incremental learning becomes possible. For inference, our system first identifies the most relevant trigger service for a given user request by using an information retrieval technique. Then, the learning engine associated with the trigger service predicts the most likely pair of trigger and action functions. We expect that such two-phase inference method given parallel learning engines would improve the accuracy of identifying related WoT functions. We verify our new solution through the empirical evaluation with training and test sets sampled from a pool of refined IFTTT app recipes. We also meticulously analyze the characteristics of the recipes to find future research directions.

  5. Per-service supervised learning for identifying desired WoT apps from user requests in natural language.

    Science.gov (United States)

    Yoon, Young

    2017-01-01

    Web of Things (WoT) platforms are growing fast so as the needs for composing WoT apps more easily and efficiently. We have recently commenced the campaign to develop an interface where users can issue requests for WoT apps entirely in natural language. This requires an effort to build a system that can learn to identify relevant WoT functions that fulfill user's requests. In our preceding work, we trained a supervised learning system with thousands of publicly-available IFTTT app recipes based on conditional random fields (CRF). However, the sub-par accuracy and excessive training time motivated us to devise a better approach. In this paper, we present a novel solution that creates a separate learning engine for each trigger service. With this approach, parallel and incremental learning becomes possible. For inference, our system first identifies the most relevant trigger service for a given user request by using an information retrieval technique. Then, the learning engine associated with the trigger service predicts the most likely pair of trigger and action functions. We expect that such two-phase inference method given parallel learning engines would improve the accuracy of identifying related WoT functions. We verify our new solution through the empirical evaluation with training and test sets sampled from a pool of refined IFTTT app recipes. We also meticulously analyze the characteristics of the recipes to find future research directions.

  6. Nepalese undergraduate nursing students' perceptions of the clinical learning environment, supervision and nurse teachers: A questionnaire survey.

    Science.gov (United States)

    Nepal, Bijeta; Taketomi, Kikuko; Ito, Yoichi M; Kohanawa, Masashi; Kawabata, Hidenobu; Tanaka, Michiko; Otaki, Junji

    2016-04-01

    Clinical practice enables nursing students to acquire essential professional skills, but little is known about nursing students' perceptions of the clinical learning environment (CLE) in Nepal. To examine Nepalese nursing students' perceptions regarding the CLE and supervision. A cross-sectional questionnaire design was used. Government and private hospitals in Nepal where the undergraduate nursing college students undertook their clinical practice. Students with clinical practice experience were recruited from years 2-4 of the B.Sc. nursing program in Nepal (n=350). The final sample comprised 263 students. A self-administered questionnaire including demographic characteristics, latest clinical practice site, and general satisfaction was administered February-March 2014. The previously validated Clinical Learning Environment, Supervision and Nurse Teacher evaluation scale was used in the questionnaire. The analytical approach used exploratory factor analysis, assessments of the scale and sub-dimension reliability, correlations of factors between scale sub-dimensions, and multiple regression analysis. Students' practicum satisfaction level at government hospitals was significantly higher than those at private hospitals (prelationship between satisfaction and pedagogical atmosphere (ppedagogical atmosphere. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Development of a Late-Life Dementia Prediction Index with Supervised Machine Learning in the Population-Based CAIDE Study.

    Science.gov (United States)

    Pekkala, Timo; Hall, Anette; Lötjönen, Jyrki; Mattila, Jussi; Soininen, Hilkka; Ngandu, Tiia; Laatikainen, Tiina; Kivipelto, Miia; Solomon, Alina

    2017-01-01

    This study aimed to develop a late-life dementia prediction model using a novel validated supervised machine learning method, the Disease State Index (DSI), in the Finnish population-based CAIDE study. The CAIDE study was based on previous population-based midlife surveys. CAIDE participants were re-examined twice in late-life, and the first late-life re-examination was used as baseline for the present study. The main study population included 709 cognitively normal subjects at first re-examination who returned to the second re-examination up to 10 years later (incident dementia n = 39). An extended population (n = 1009, incident dementia 151) included non-participants/non-survivors (national registers data). DSI was used to develop a dementia index based on first re-examination assessments. Performance in predicting dementia was assessed as area under the ROC curve (AUC). AUCs for DSI were 0.79 and 0.75 for main and extended populations. Included predictors were cognition, vascular factors, age, subjective memory complaints, and APOE genotype. The supervised machine learning method performed well in identifying comprehensive profiles for predicting dementia development up to 10 years later. DSI could thus be useful for identifying individuals who are most at risk and may benefit from dementia prevention interventions.

  8. Cavity contour segmentation in chest radiographs using supervised learning and dynamic programming

    Energy Technology Data Exchange (ETDEWEB)

    Maduskar, Pragnya, E-mail: pragnya.maduskar@radboudumc.nl; Hogeweg, Laurens; Sánchez, Clara I.; Ginneken, Bram van [Diagnostic Image Analysis Group, Radboud University Medical Center, Nijmegen, 6525 GA (Netherlands); Jong, Pim A. de [Department of Radiology, University Medical Center Utrecht, 3584 CX (Netherlands); Peters-Bax, Liesbeth [Department of Radiology, Radboud University Medical Center, Nijmegen, 6525 GA (Netherlands); Dawson, Rodney [University of Cape Town Lung Institute, Cape Town 7700 (South Africa); Ayles, Helen [Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London WC1E 7HT (United Kingdom)

    2014-07-15

    Purpose: Efficacy of tuberculosis (TB) treatment is often monitored using chest radiography. Monitoring size of cavities in pulmonary tuberculosis is important as the size predicts severity of the disease and its persistence under therapy predicts relapse. The authors present a method for automatic cavity segmentation in chest radiographs. Methods: A two stage method is proposed to segment the cavity borders, given a user defined seed point close to the center of the cavity. First, a supervised learning approach is employed to train a pixel classifier using texture and radial features to identify the border pixels of the cavity. A likelihood value of belonging to the cavity border is assigned to each pixel by the classifier. The authors experimented with four different classifiers:k-nearest neighbor (kNN), linear discriminant analysis (LDA), GentleBoost (GB), and random forest (RF). Next, the constructed likelihood map was used as an input cost image in the polar transformed image space for dynamic programming to trace the optimal maximum cost path. This constructed path corresponds to the segmented cavity contour in image space. Results: The method was evaluated on 100 chest radiographs (CXRs) containing 126 cavities. The reference segmentation was manually delineated by an experienced chest radiologist. An independent observer (a chest radiologist) also delineated all cavities to estimate interobserver variability. Jaccard overlap measure Ω was computed between the reference segmentation and the automatic segmentation; and between the reference segmentation and the independent observer's segmentation for all cavities. A median overlap Ω of 0.81 (0.76 ± 0.16), and 0.85 (0.82 ± 0.11) was achieved between the reference segmentation and the automatic segmentation, and between the segmentations by the two radiologists, respectively. The best reported mean contour distance and Hausdorff distance between the reference and the automatic segmentation were

  9. Cavity contour segmentation in chest radiographs using supervised learning and dynamic programming

    International Nuclear Information System (INIS)

    Maduskar, Pragnya; Hogeweg, Laurens; Sánchez, Clara I.; Ginneken, Bram van; Jong, Pim A. de; Peters-Bax, Liesbeth; Dawson, Rodney; Ayles, Helen

    2014-01-01

    Purpose: Efficacy of tuberculosis (TB) treatment is often monitored using chest radiography. Monitoring size of cavities in pulmonary tuberculosis is important as the size predicts severity of the disease and its persistence under therapy predicts relapse. The authors present a method for automatic cavity segmentation in chest radiographs. Methods: A two stage method is proposed to segment the cavity borders, given a user defined seed point close to the center of the cavity. First, a supervised learning approach is employed to train a pixel classifier using texture and radial features to identify the border pixels of the cavity. A likelihood value of belonging to the cavity border is assigned to each pixel by the classifier. The authors experimented with four different classifiers:k-nearest neighbor (kNN), linear discriminant analysis (LDA), GentleBoost (GB), and random forest (RF). Next, the constructed likelihood map was used as an input cost image in the polar transformed image space for dynamic programming to trace the optimal maximum cost path. This constructed path corresponds to the segmented cavity contour in image space. Results: The method was evaluated on 100 chest radiographs (CXRs) containing 126 cavities. The reference segmentation was manually delineated by an experienced chest radiologist. An independent observer (a chest radiologist) also delineated all cavities to estimate interobserver variability. Jaccard overlap measure Ω was computed between the reference segmentation and the automatic segmentation; and between the reference segmentation and the independent observer's segmentation for all cavities. A median overlap Ω of 0.81 (0.76 ± 0.16), and 0.85 (0.82 ± 0.11) was achieved between the reference segmentation and the automatic segmentation, and between the segmentations by the two radiologists, respectively. The best reported mean contour distance and Hausdorff distance between the reference and the automatic segmentation were

  10. Poster abstract: Water level estimation in urban ultrasonic/passive infrared flash flood sensor networks using supervised learning

    KAUST Repository

    Mousa, Mustafa

    2014-04-01

    This article describes a machine learning approach to water level estimation in a dual ultrasonic/passive infrared urban flood sensor system. We first show that an ultrasonic rangefinder alone is unable to accurately measure the level of water on a road due to thermal effects. Using additional passive infrared sensors, we show that ground temperature and local sensor temperature measurements are sufficient to correct the rangefinder readings and improve the flood detection performance. Since floods occur very rarely, we use a supervised learning approach to estimate the correction to the ultrasonic rangefinder caused by temperature fluctuations. Preliminary data shows that water level can be estimated with an absolute error of less than 2 cm. © 2014 IEEE.

  11. Bilingual Lexical Interactions in an Unsupervised Neural Network Model

    Science.gov (United States)

    Zhao, Xiaowei; Li, Ping

    2010-01-01

    In this paper we present an unsupervised neural network model of bilingual lexical development and interaction. We focus on how the representational structures of the bilingual lexicons can emerge, develop, and interact with each other as a function of the learning history. The results show that: (1) distinct representations for the two lexicons…

  12. Optimistic semi-supervised least squares classification

    DEFF Research Database (Denmark)

    Krijthe, Jesse H.; Loog, Marco

    2017-01-01

    The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant ...

  13. Deep Learning @15 Petaflops/second: Semi-supervised pattern detection for 15 Terabytes of climate data

    Science.gov (United States)

    Collins, W. D.; Wehner, M. F.; Prabhat, M.; Kurth, T.; Satish, N.; Mitliagkas, I.; Zhang, J.; Racah, E.; Patwary, M.; Sundaram, N.; Dubey, P.

    2017-12-01

    Anthropogenically-forced climate changes in the number and character of extreme storms have the potential to significantly impact human and natural systems. Current high-performance computing enables multidecadal simulations with global climate models at resolutions of 25km or finer. Such high-resolution simulations are demonstrably superior in simulating extreme storms such as tropical cyclones than the coarser simulations available in the Coupled Model Intercomparison Project (CMIP5) and provide the capability to more credibly project future changes in extreme storm statistics and properties. The identification and tracking of storms in the voluminous model output is very challenging as it is impractical to manually identify storms due to the enormous size of the datasets, and therefore automated procedures are used. Traditionally, these procedures are based on a multi-variate set of physical conditions based on known properties of the class of storms in question. In recent years, we have successfully demonstrated that Deep Learning produces state of the art results for pattern detection in climate data. We have developed supervised and semi-supervised convolutional architectures for detecting and localizing tropical cyclones, extra-tropical cyclones and atmospheric rivers in simulation data. One of the primary challenges in the applicability of Deep Learning to climate data is in the expensive training phase. Typical networks may take days to converge on 10GB-sized datasets, while the climate science community has ready access to O(10 TB)-O(PB) sized datasets. In this work, we present the most scalable implementation of Deep Learning to date. We successfully scale a unified, semi-supervised convolutional architecture on all of the Cori Phase II supercomputer at NERSC. We use IntelCaffe, MKL and MLSL libraries. We have optimized single node MKL libraries to obtain 1-4 TF on single KNL nodes. We have developed a novel hybrid parameter update strategy to improve

  14. Quality assurance of the clinical learning environment in Austria: Construct validity of the Clinical Learning Environment, Supervision and Nurse Teacher Scale (CLES+T scale).

    Science.gov (United States)

    Mueller, Gerhard; Mylonas, Demetrius; Schumacher, Petra

    2018-04-21

    Within nursing education, the clinical learning environment is of a high importance in regards to the development of competencies and abilities. The organization, atmosphere, and supervision in the clinical learning environment are only a few factors that influence this development. In Austria there is currently no valid instrument available for the evaluation of influencing factors. The aim of the study was to test the construct validity with principal component analysis as well as the internal consistency of the German Clinical Learning Environment, Supervision and Teacher Scale (CLES+T scale) in Austria. The present validation study has a descriptive-quantitative cross-sectional design. The sample consisted of 385 nursing students from thirteen training institutions in Austria. The data collection was carried out online between March and April 2016. Starting with a polychoric correlation matrix, a parallel analysis with principal component extraction and promax rotation was carried out due to the ordinal data. The exploratory ordinal factor analysis supported a four-component solution and explained 73% of the total variance. The internal consistency of all 25 items reached a Cronbach's α of 0.95 and the four components ranged between 0.83 and 0.95. The German version of the CLES+T scale seems to be a useful instrument for identifying potential areas of improvement in clinical practice in order to derive specific quality measures for the practical learning environment. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Adaptation and validation of the instrument Clinical Learning Environment and Supervision for medical students in primary health care

    Directory of Open Access Journals (Sweden)

    Eva Öhman

    2016-12-01

    Full Text Available Abstract Background Clinical learning takes place in complex socio-cultural environments that are workplaces for the staff and learning places for the students. In the clinical context, the students learn by active participation and in interaction with the rest of the community at the workplace. Clinical learning occurs outside the university, therefore is it important for both the university and the student that the student is given opportunities to evaluate the clinical placements with an instrument that allows evaluation from many perspectives. The instrument Clinical Learning Environment and Supervision (CLES was originally developed for evaluation of nursing students’ clinical learning environment. The aim of this study was to adapt and validate the CLES instrument to measure medical students’ perceptions of their learning environment in primary health care. Methods In the adaptation process the face validity was tested by an expert panel of primary care physicians, who were also active clinical supervisors. The adapted CLES instrument with 25 items and six background questions was sent electronically to 1,256 medical students from one university. Answers from 394 students were eligible for inclusion. Exploratory factor analysis based on principal component methods followed by oblique rotation was used to confirm the adequate number of factors in the data. Construct validity was assessed by factor analysis. Confirmatory factor analysis was used to confirm the dimensions of CLES instrument. Results The construct validity showed a clearly indicated four-factor model. The cumulative variance explanation was 0.65, and the overall Cronbach’s alpha was 0.95. All items loaded similarly with the dimensions in the non-adapted CLES except for one item that loaded to another dimension. The CLES instrument in its adapted form had high construct validity and high reliability and internal consistency. Conclusion CLES, in its adapted form, appears

  16. Fieldwork online: a GIS-based electronic learning environment for supervising fieldwork

    Science.gov (United States)

    Alberti, Koko; Marra, Wouter; Baarsma, Rein; Karssenberg, Derek

    2016-04-01

    Fieldwork comes in many forms: individual research projects in unique places, large groups of students on organized fieldtrips, and everything in between those extremes. Supervising students in often distant places can be a logistical challenge and requires a significant time investment of their supervisors. We developed an online application for remote supervision of students on fieldwork. In our fieldworkonline webapp, which is accessible through a web browser, students can upload their field data in the form of a spreadsheet with coordinates (in a system of choice) and data-fields. Field data can be any combination of quantitative or qualitative data, and can contain references to photos or other documents uploaded to the app. The student's data is converted to a map with data-points that contain all the data-fields and links to photos and documents associated with that location. Supervisors can review the data of their students and provide feedback on observations, or geo-referenced feedback on the map. Similarly, students can ask geo-referenced questions to their supervisors. Furthermore, supervisors can choose different basemaps or upload their own. Fieldwork online is a useful tool for supervising students at a distant location in the field and is most suitable for first-order feedback on students' observations, can be used to guide students to interesting locations, and allows for short discussions on phenomena observed in the field. We seek user that like to use this system, we are able to provide support and add new features if needed. The website is built and controlled using Flask, an open-source Python Framework. The maps are generated and controlled using MapServer and OpenLayers, and the database is built in PostgreSQL with PostGIS support. Fieldworkonline and all tools used to create it are open-source. Experience fieldworkonline at our demo during this session, or online at fieldworkonline.geo.uu.nl (username: EGU2016, password: Vienna).

  17. Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis

    NARCIS (Netherlands)

    Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.

    2006-01-01

    Inductive learning systems were successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data presented by a large

  18. Dimensionality reduction with unsupervised nearest neighbors

    CERN Document Server

    Kramer, Oliver

    2013-01-01

    This book is devoted to a novel approach for dimensionality reduction based on the famous nearest neighbor method that is a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, reaching from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies taking into account artificial test data sets and also real-world data demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustr...

  19. Deep learning in neural networks: an overview.

    Science.gov (United States)

    Schmidhuber, Jürgen

    2015-01-01

    In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

  20. Surface mapping via unsupervised classification of remote sensing: application to MESSENGER/MASCS and DAWN/VIRS data.

    Science.gov (United States)

    D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.

    2017-12-01

    Machine-learning achieved unprecedented results in high-dimensional data processing tasks with wide applications in various fields. Due to the growing number of complex nonlinear systems that have to be investigated in science and the bare raw size of data nowadays available, ML offers the unique ability to extract knowledge, regardless the specific application field. Examples are image segmentation, supervised/unsupervised/ semi-supervised classification, feature extraction, data dimensionality analysis/reduction.The MASCS instrument has mapped Mercury surface in the 400-1145 nm wavelength range during orbital observations by the MESSENGER spacecraft. We have conducted k-means unsupervised hierarchical clustering to identify and characterize spectral units from MASCS observations. The results display a dichotomy: a polar and equatorial units, possibly linked to compositional differences or weathering due to irradiation. To explore possible relations between composition and spectral behavior, we have compared the spectral provinces with elemental abundance maps derived from MESSENGER's X-Ray Spectrometer (XRS).For the Vesta application on DAWN Visible and infrared spectrometer (VIR) data, we explored several Machine Learning techniques: image segmentation method, stream algorithm and hierarchical clustering.The algorithm successfully separates the Olivine outcrops around two craters on Vesta's surface [1]. New maps summarizing the spectral and chemical signature of the surface could be automatically produced.We conclude that instead of hand digging in data, scientist could choose a subset of algorithms with well known feature (i.e. efficacy on the particular problem, speed, accuracy) and focus their effort in understanding what important characteristic of the groups found in the data mean. [1] E Ammannito et al. "Olivine in an unexpected location on Vesta's surface". In: Nature 504.7478 (2013), pp. 122-125.