methodology unsupervised clustering: Topics by WorldWideScience.org

Sample records for methodology unsupervised clustering

Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering.

Science.gov (United States)

Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina

2015-03-01

Proteins are considered to be the most important individual components of biological systems, and they combine to form physical protein complexes that are responsible for specific molecular functions. Despite the wide availability of protein-protein interaction (PPI) information, little information is available about protein complexes. Experimental methods are limited by time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology, called evolutionary enhanced Markov clustering (EE-MC), is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies on datasets from human and from the yeast Saccharomyces cerevisiae. On publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric increased by 10-20%). Moreover, when applied to new human datasets, it performed encouragingly in predicting protein complexes composed of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted, and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies, such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster, and the difficulties they face with parameter tuning. This fact was experimentally validated and moreover, new
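
EE-MC builds on enhanced Markov clustering, whose specific enhancements are not described in this abstract. For orientation only, here is a minimal sketch of plain Markov clustering (MCL) on a small weighted PPI-style graph: it alternates expansion (matrix power) and inflation (element-wise power followed by column normalization) until the matrix stops changing. It is not EE-MC. The self-loops, the defaults `expansion=2` and `inflation=2.0`, and reading clusters off as connected components of the surviving entries are assumptions, and the evolutionary parameter search is omitted.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adj, expansion=2, inflation=2.0, max_iter=100,
                   tol=1e-9, eps=1e-4):
    """Plain MCL on a symmetric weighted adjacency matrix (list of lists)."""
    n = len(adj)
    # Self-loops are a common MCL preprocessing step (assumed here).
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: raise the column-stochastic matrix to a power.
        e = m
        for _ in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: element-wise power, then re-normalize the columns.
        new = normalize_columns([[v ** inflation for v in row] for row in e])
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Clusters = connected components of the non-negligible entries.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two triangles of strong interactions joined by one weak edge (0.1).
    w = [[0.0] * 6 for _ in range(6)]
    for a, b, wt in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                     (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
        w[a][b] = w[b][a] = wt
    print(markov_cluster(w))
```

Raising `inflation` generally yields more, smaller clusters, which is the kind of parameter sensitivity the abstract says EE-MC is designed to avoid.
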
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

Science.gov (United States)

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method that builds upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis.
Application of cluster analysis and unsupervised learning to multivariate tissue characterization

International Nuclear Information System (INIS)

Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

1987-01-01

This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with that of supervised methods, such as a linear Bayes classifier. These comparisons address two questions: (a) How well does the clustering method group the data? (b) Do the clusters correspond to known tissue classes? The first question is investigated with a measure of cluster similarity and dispersion. The second involves a comparison with a supervised technique using labeled data.
An Improved Unsupervised Modeling Methodology For Detecting Fraud In Vendor Payment Transactions

National Research Council Canada - National Science Library

Rouillard, Gregory

2003-01-01

...) vendor payment transactions through Unsupervised Modeling (cluster analysis). Clementine Data Mining software is used to construct unsupervised models of vendor payment data using the K-Means, Two Step, and Kohonen algorithms...
Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

Directory of Open Access Journals (Sweden)

Tushar H Jaware

2013-10-01

Medical image processing is one of the most challenging and emerging fields of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumors from brain MR images. MATLAB is used to design a software tool for locating brain tumors based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of unsupervised clustering methods is presented.
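
The abstract gives no implementation details. As an illustration only, here is a minimal sketch of K-means (Lloyd's algorithm) on grey-level pixel intensities, with the brightest cluster flagged as a candidate tumor region. The original tool was written in MATLAB; this sketch uses Python, and both the evenly spaced initialization and the "brightest cluster" rule are assumptions rather than the paper's method.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's K-means on scalar pixel intensities."""
    lo, hi = min(values), max(values)
    # Assumed deterministic initialization: centers spread over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest center for every pixel.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: each center moves to the mean of its pixels
        # (an empty cluster keeps its previous center).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def brightest_cluster_mask(values, k=3):
    """Hypothetical rule: flag pixels that fall in the brightest cluster."""
    centers, labels = kmeans_1d(values, k)
    target = max(range(k), key=lambda c: centers[c])
    return [lab == target for lab in labels]


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 205, 198]
    print(brightest_cluster_mask(pixels))  # → [False, False, False, True, True, True]
```
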
Factored Translation with Unsupervised Word Clusters

DEFF Research Database (Denmark)

Rishøj, Christian; Søgaard, Anders

2011-01-01

Unsupervised word clustering algorithms, which form word clusters based on a measure of distributional similarity, have proven useful in providing beneficial features for various natural language processing tasks involving supervised learning. This work explores the utility of such word ... clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some ... proportion of the sentences in the test set, but have a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated...
Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

Directory of Open Access Journals (Sweden)

Jiayi Wu

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization in a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

Science.gov (United States)

Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

2017-01-01

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization in a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Clustervision: Visual Supervision of Unsupervised Clustering.

Science.gov (United States)

Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

2018-01-01

Clustering, the process of grouping similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice it is quite difficult for data scientists to choose and parameterize algorithms to obtain clustering results relevant to their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks the clustering results using five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high-quality clustering results, explore the clusters using several coordinated visualization techniques, and select the clustering result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and show that our system empowers users to choose an effective representation of their complex data.
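
The abstract does not name Clustervision's five quality metrics. To illustrate ranking candidate clusterings by an internal quality metric, here is a sketch using the silhouette coefficient, a widely used choice; treating it as one of Clustervision's metrics is an assumption.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    n = len(points)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    total = 0.0
    for i in range(n):
        own = [j for j in groups[labels[i]] if j != i]
        if not own:
            continue  # convention: a singleton contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for lab, members in groups.items() if lab != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / n


def rank_clusterings(points, candidates):
    """Order named candidate labelings from best to worst silhouette."""
    return sorted(candidates,
                  key=lambda name: mean_silhouette(points, candidates[name]),
                  reverse=True)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    candidates = {"by_x": [0, 0, 1, 1], "by_y": [0, 1, 0, 1]}
    print(rank_clusterings(pts, candidates))  # → ['by_x', 'by_y']
```

Silhouette values range from -1 to 1; here the split along x scores about 0.9, while the split along y is negative because each point sits closer to the other cluster than to its own.
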
Misty Mountain clustering: application to fast unsupervised flow cytometry gating

Directory of Open Access Journals (Sweden)

Sealfon Stuart C

2010-10-01

Background: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima, or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval, and the final cluster number is then selected by various criteria. These supervised serial clustering methods are time-consuming, and different criteria frequently yield different optimal cluster numbers. Unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to apply to datasets on the order of 10^6 points, which high-throughput experiments often generate.

Results: To circumvent these limitations, we developed a new unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map, identifying clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. Clustering 10^6 data points in a 2D data space takes about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and accuracy of cluster assignment.

Conclusions
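
The "receding clouds over a relief map" picture can be made concrete with a small sketch: bin the data into a 2D histogram, lower a threshold ("cloud level") from the highest count downward, and count the 4-connected islands of bins that rise above it. This captures only the geometric idea. It omits the percolation-theory statistics Misty Mountain uses to decide which peaks and ridges are statistically distinct, and it is not linear-time; the bin count, 4-neighbour connectivity, and the "keep the level with the most islands" rule are all assumptions.

```python
def histogram_2d(points, bins, lo, hi):
    """Count points per cell of a bins x bins grid over [lo, hi)^2."""
    width = (hi - lo) / bins
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        i = min(max(int((x - lo) / width), 0), bins - 1)
        j = min(max(int((y - lo) / width), 0), bins - 1)
        grid[i][j] += 1
    return grid


def islands_above(grid, level):
    """4-connected groups of cells whose count is strictly above `level`."""
    n = len(grid)
    seen, islands = set(), []
    for i in range(n):
        for j in range(n):
            if grid[i][j] <= level or (i, j) in seen:
                continue
            stack, island = [(i, j)], []
            seen.add((i, j))
            while stack:
                a, b = stack.pop()
                island.append((a, b))
                for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= na < n and 0 <= nb < n
                            and grid[na][nb] > level and (na, nb) not in seen):
                        seen.add((na, nb))
                        stack.append((na, nb))
            islands.append(sorted(island))
    return islands


def cluster_cores(grid):
    """Lower the cloud level from the top; keep the level with most islands."""
    best = []
    for level in sorted({c for row in grid for c in row}, reverse=True):
        islands = islands_above(grid, level)
        if len(islands) > len(best):
            best = islands
    return best


if __name__ == "__main__":
    # Two peaks (3 and 4) separated by a saddle of height 1.
    relief = [[3, 1, 4],
              [0, 0, 0],
              [0, 0, 0]]
    print(cluster_cores(relief))  # → [[(0, 0)], [(0, 2)]]
```

At cloud level 1 the two peaks appear as separate islands; at level 0 the saddle is exposed and they merge into one ridge.
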
Unsupervised clustering with spiking neurons by sparse temporal coding and multi-layer RBF networks

NARCIS (Netherlands)

S.M. Bohte (Sander); J.A. La Poutré (Han); J.N. Kok (Joost)

2000-01-01

We demonstrate that spiking neural networks encoding information in spike times are capable of computing and learning clusters from realistic data. We show how a spiking neural network based on spike-time coding and Hebbian learning can successfully perform unsupervised clustering on
Artificial immune kernel clustering network for unsupervised image segmentation

Institute of Scientific and Technical Information of China (English)

Wenlong Huang; Licheng Jiao

2008-01-01

An immune kernel clustering network (IKCN), based on the combination of the artificial immune network and support vector domain description (SVDD), is proposed for unsupervised image segmentation. In the network, a new antibody neighborhood and an adaptive learning coefficient, inspired by long-term memory in the cerebral cortex, are presented. Starting from the IKCN algorithm, we divide the image feature sets into subsets by the antibodies and then map each subset into a high-dimensional feature space by a Mercer kernel, where each antibody neighborhood is represented as a support vector hypersphere. The clustering results of the local support vector hyperspheres are combined into a global clustering solution by the minimal spanning tree (MST), so a predefined number of clusters is not needed. We compare the proposed methods with two common clustering algorithms on an artificial synthetic data set and several image data sets, including synthetic texture images and SAR images, and obtain encouraging experimental results.
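
In IKCN the MST joins local support vector hyperspheres, and the abstract does not say how the tree is cut. A minimal sketch of MST-based merging, assuming each local cluster is summarized by a center point and that MST edges longer than a threshold `max_edge` (a hypothetical parameter) are removed, so the number of clusters follows from the data rather than being fixed in advance:

```python
import math
from itertools import combinations


class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def mst_clusters(centers, max_edge):
    """Kruskal MST over `centers`, then drop MST edges longer than max_edge."""
    n = len(centers)
    edges = sorted((math.dist(centers[i], centers[j]), i, j)
                   for i, j in combinations(range(n), 2))
    forest = DisjointSet(n)
    mst = []
    for w, i, j in edges:
        if forest.union(i, j):  # edge joins two trees: keep it in the MST
            mst.append((w, i, j))
    clusters = DisjointSet(n)
    for w, i, j in mst:
        if w <= max_edge:  # short MST edges stay; long ones are cut
            clusters.union(i, j)
    groups = {}
    for v in range(n):
        groups.setdefault(clusters.find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    local_centers = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(mst_clusters(local_centers, max_edge=2.0))  # → [[0, 1], [2, 3]]
```
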
Data mining with unsupervised clustering using photonic micro-ring resonators

Science.gov (United States)

McAulay, Alastair D.

2013-09-01

Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining to enhance the performance of future data centers. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database be converted from analog values to frequencies, as in the brain's neurons, so that similar data have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics, we implement the frequencies with micro-ring resonators. Due to weak coupling, a group of resonators forms clusters of similar frequencies, which indicate the parameters that are closely related. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used, for instance, for targeted advertising and for planning.
Unsupervised Two-Way Clustering of Metagenomic Sequences

Directory of Open Access Journals (Sweden)

Shruthi Prabhakara

2012-01-01

A major challenge facing metagenomics is the development of tools for characterizing the functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length, and the relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer, rare words. We employ either a mixture of Gaussians or a mixture of Poissons to model reads within each bin. Further, we handle the high dimensionality and sparsity of the data by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bp and is robust to varying abundances, divergences and read lengths.
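
The Poisson branch of the model can be illustrated with expectation-maximization (EM) for a naive Bayes mixture of Poissons, where each read is a vector of word counts and words are treated as independent given the species. This is a minimal sketch, not the paper's model: it leaves out the Gaussian branch and the word grouping of the two-way mixture, and the seeding rule is an assumption.

```python
import math


def log_poisson(k, lam):
    """log P(K = k) for a Poisson distribution with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_mixture_em(reads, n_components, iters=200, tol=1e-9):
    """EM for a naive Bayes mixture of Poissons over word-count vectors."""
    n, d = len(reads), len(reads[0])
    # Assumed deterministic seeding: component c starts from read c*n//K.
    rates = [[x + 0.5 for x in reads[c * n // n_components]]
             for c in range(n_components)]
    weights = [1.0 / n_components] * n_components
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        resp, ll = [], 0.0
        for x in reads:
            logs = [math.log(w) + sum(log_poisson(k, lam)
                                      for k, lam in zip(x, rate))
                    for w, rate in zip(weights, rates)]
            top = max(logs)
            total = sum(math.exp(v - top) for v in logs)
            ll += top + math.log(total)
            resp.append([math.exp(v - top) / total for v in logs])
        # M-step: mixing weights and one Poisson rate per word per component.
        for c in range(n_components):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = max(nc / n, 1e-12)
            rates[c] = [max(sum(r[c] * x[j] for r, x in zip(resp, reads)) / nc,
                            1e-6)
                        for j in range(d)]
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    labels = [max(range(n_components), key=lambda c: r[c]) for r in resp]
    return weights, rates, labels


if __name__ == "__main__":
    # Hypothetical counts of two words in six reads from two "species".
    reads = [[8, 0], [9, 1], [7, 0], [0, 9], [1, 8], [0, 7]]
    weights, rates, labels = poisson_mixture_em(reads, 2)
    print(labels)
```

The mixing weights double as abundance estimates for each species, which is the second use of the model described in the abstract.
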
Unsupervised active learning based on hierarchical graph-theoretic clustering.

Science.gov (United States)

Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

2009-10-01

Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.
Unsupervised color image segmentation using a lattice algebra clustering technique

Science.gov (United States)

Urcid, Gonzalo; Ritter, Gerhard X.

2011-08-01

In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebychev distance between color pixels; clustering is then performed by assigning each image pixel to its corresponding maximal box. Both steps of the proposed method are completely unsupervised, or autonomous. Illustrative examples demonstrate the color segmentation results, including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
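
The Chebyshev (L-infinity) distance is why the second step produces boxes: the set of colors within a fixed Chebyshev distance of a point is an axis-aligned cube in RGB space. A simplified sketch of that assignment idea, assuming the extreme colors have already been found (the lattice auto-associative memory step is not reproduced) and replacing the maximal-box construction with nearest-extreme assignment under the Chebyshev distance:

```python
def chebyshev(p, q):
    """L-infinity distance between two color vectors."""
    return max(abs(a - b) for a, b in zip(p, q))


def assign_to_extremes(pixels, extremes):
    """Label each RGB pixel with the index of its Chebyshev-nearest extreme."""
    return [min(range(len(extremes)), key=lambda k: chebyshev(px, extremes[k]))
            for px in pixels]


if __name__ == "__main__":
    extremes = [(255, 0, 0), (0, 0, 255), (255, 255, 255)]  # assumed given
    pixels = [(250, 10, 5), (10, 20, 240), (240, 250, 245)]
    print(assign_to_extremes(pixels, extremes))  # → [0, 1, 2]
```
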
Unsupervised learning algorithms

CERN Document Server

Aydin, Kemal

2016-01-01

This book summarizes the state of the art in unsupervised learning. The contributors discuss how, with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications, including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They explain how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation has resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...
Classification of behavior using unsupervised temporal neural networks

International Nuclear Information System (INIS)

Adair, K.L.

1998-03-01

Adding recurrent connections to unsupervised neural networks used for clustering creates a temporal neural network that clusters a sequence of inputs as they appear over time. The model presented combines the Jordan architecture with Fuzzy ART, an unsupervised learning technique from Adaptive Resonance Theory. The combination yields a neural network capable of quickly clustering pattern sequences as they are generated. The applicability of the architecture is illustrated through a facility monitoring problem.
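
The clustering core of such a network can be sketched with the standard Fuzzy ART steps: complement coding, the choice function T_j = |I ∧ w_j| / (alpha + |w_j|), a vigilance (match) test, and the learning update. This sketch covers only that core; the Jordan-style recurrent connections that make the paper's network temporal are not reproduced, and the parameter defaults are assumptions. Inputs are assumed to lie in [0, 1].

```python
class FuzzyART:
    """Minimal Fuzzy ART (complement coding; fast learning when beta=1)."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance  # match threshold for resonance
        self.alpha = alpha    # choice parameter
        self.beta = beta      # learning rate
        self.weights = []     # one weight vector per committed category

    def _choice(self, i, w):
        # T_j = |I ^ w_j| / (alpha + |w_j|), where ^ is element-wise min.
        return sum(map(min, i, w)) / (self.alpha + sum(w))

    def learn(self, a):
        """Present one input with components in [0, 1]; return its category."""
        i = list(a) + [1.0 - x for x in a]  # complement coding
        norm_i = sum(i)
        # Search categories in order of decreasing choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda k: -self._choice(i, self.weights[k]))
        for j in order:
            w = self.weights[j]
            if sum(map(min, i, w)) / norm_i >= self.rho:  # vigilance passed
                self.weights[j] = [self.beta * min(x, y) + (1 - self.beta) * y
                                   for x, y in zip(i, w)]
                return j
        # No category resonates: commit a new one initialized to the input.
        self.weights.append(i)
        return len(self.weights) - 1


if __name__ == "__main__":
    net = FuzzyART(vigilance=0.8)
    stream = [(0.1, 0.1), (0.12, 0.1), (0.9, 0.9), (0.88, 0.92)]
    print([net.learn(x) for x in stream])  # → [0, 0, 1, 1]
```

Because each input is presented once and assigned immediately, the network clusters a stream as it arrives, which is the property the temporal architecture relies on.
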
Unsupervised Feature Subset Selection

DEFF Research Database (Denmark)

Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

2003-01-01

This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily.
Unsupervised Performance Evaluation Strategy for Bridge Superstructure Based on Fuzzy Clustering and Field Data

Directory of Open Access Journals (Sweden)

Yubo Jiao

2013-01-01

Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. This paper proposes an unsupervised method for assessing the state of a bridge superstructure based on fuzzy clustering and field-measured bridge data. First, the evaluation index system of the bridge is constructed. Second, a set of bridge health monitoring data is selected as clustering samples to obtain the fuzzy similarity matrix and the fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps, and the best classification is determined by statistical analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is used as the engineering case to verify the effectiveness of the proposed method.
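
The steps above follow the classic fuzzy equivalence clustering pipeline: build a reflexive, symmetric fuzzy similarity matrix, take its transitive closure under max-min composition to get a fuzzy equivalent matrix, and cut it at a threshold lambda to obtain crisp classes (varying lambda gives the dynamic clustering map). A sketch of that pipeline; the normalized-L1 similarity rule is an assumption, since the abstract does not say how the similarity matrix is built.

```python
def similarity_matrix(samples):
    """Assumed rule: r_ij = 1 - L1(x_i, x_j) / (max pairwise L1 distance)."""
    n = len(samples)
    d = [[sum(abs(a - b) for a, b in zip(samples[i], samples[j]))
          for j in range(n)] for i in range(n)]
    dmax = max(max(row) for row in d) or 1.0
    return [[1.0 - d[i][j] / dmax for j in range(n)] for i in range(n)]


def max_min_compose(r, s):
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def fuzzy_equivalent(r):
    """Transitive closure by repeated max-min squaring until it is stable."""
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut(eq, lam):
    """Crisp classes of a fuzzy equivalence matrix at threshold lam."""
    classes, seen = [], set()
    for i in range(len(eq)):
        if i not in seen:
            cls = [j for j in range(len(eq)) if eq[i][j] >= lam]
            seen.update(cls)
            classes.append(cls)
    return classes


if __name__ == "__main__":
    # Hypothetical one-index monitoring samples for four bridge states.
    samples = [[0.0], [0.1], [0.9], [1.0]]
    eq = fuzzy_equivalent(similarity_matrix(samples))
    for lam in (0.95, 0.5, 0.1):  # one level of the dynamic clustering map each
        print(lam, lambda_cut(eq, lam))
```

At lambda = 0.5 the four samples split into [[0, 1], [2, 3]]; at lambda = 0.1 they form a single class.
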

Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

Science.gov (United States)

Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

2014-09-30

This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. The paper describes how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training, and how it autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared with k-means (using 10 iterations and provided with the number of spike classes). Its robustness across different feature extraction methods has also been demonstrated, with classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method highly attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
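
HAM's exact update rules and cluster-connectivity hierarchy are not given in the abstract. The sketch below shows only the generic ingredients it names: centroid-based assignment of each incoming spike, running-mean updates that need no stored history or iteration, and a new class opened when a spike is far from every centroid. It is a leader-style stand-in, not HAM itself; `radius` is an assumed parameter, and the hierarchical grouping of clusters is not reproduced.

```python
import math


class OnlineMeans:
    """Leader-style online clustering with running-mean centroids."""

    def __init__(self, radius):
        self.radius = radius  # assumed spike-class radius in feature space
        self.centroids = []
        self.counts = []

    def update(self, x):
        """Assign one spike feature vector and return its class index."""
        if self.centroids:
            k = min(range(len(self.centroids)),
                    key=lambda c: math.dist(x, self.centroids[c]))
            if math.dist(x, self.centroids[k]) <= self.radius:
                self.counts[k] += 1
                # Incremental mean: past spikes are never stored or revisited.
                self.centroids[k] = [c + (xi - c) / self.counts[k]
                                     for c, xi in zip(self.centroids[k], x)]
                return k
        # Too far from every centroid: open a new spike class.
        self.centroids.append(list(x))
        self.counts.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    sorter = OnlineMeans(radius=1.0)
    spikes = [(0, 0), (0.2, 0), (5, 5), (0.1, 0.1), (5.2, 4.8)]
    print([sorter.update(s) for s in spikes])  # → [0, 0, 1, 0, 1]
```

Each update touches only one centroid, so memory and computation per spike stay constant, which is the property that makes this family of methods attractive for hardware.
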
Class imbalance in unsupervised change detection - A diagnostic analysis from urban remote sensing

Science.gov (United States)

Leichtle, Tobias; Geiß, Christian; Lakes, Tobia; Taubenböck, Hannes

2017-08-01

Automatic monitoring of changes on the Earth's surface is an intrinsic capability and simultaneously a persistent methodological challenge in remote sensing, especially regarding imagery with very high spatial resolution (VHR) and complex urban environments. To enable a high level of automation, the change detection problem is solved in an unsupervised way, which reduces the effort of collecting properly encoded prior knowledge. In this context, this paper systematically investigates the nature and effects of class distribution and class imbalance in an unsupervised binary change detection application based on VHR imagery over urban areas. For this purpose, a diagnostic framework for sensitivity analysis over a large range of possible degrees of class imbalance is presented; this is of particular importance for unsupervised approaches, where the content of images, and thus the occurrence and distribution of classes, is generally unknown a priori. Furthermore, this framework can serve as a general technique to evaluate model transferability in any two-class classification problem. The applied change detection approach is based on object-based difference features calculated from VHR imagery and subsequent unsupervised two-class clustering using k-means, genetic k-means and self-organizing map (SOM) clustering. The results from two test sites with different structural characteristics of the built environment demonstrated that classification performance is generally worse in imbalanced class distribution settings, while the best results were reached in balanced or close-to-balanced situations. Regarding suitable accuracy measures for evaluating model performance in imbalanced settings, this study revealed that the Kappa statistic shows a significant response to class distribution, while the true skill statistic was widely insensitive to imbalanced classes. In general, the genetic k-means clustering algorithm achieved the most robust results.
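
The contrast between Kappa and the true skill statistic (TSS) follows from their standard definitions. TSS is sensitivity + specificity - 1, so it ignores how common each class is. Kappa's chance-agreement term, by contrast, depends on the class marginals. A sketch computing both from a binary change/no-change confusion matrix; the counts in the usage example are hypothetical, chosen so that sensitivity (0.8) and specificity (0.9) stay fixed while the share of changed objects drops from 50% to 5%.

```python
def kappa_and_tss(tp, fn, fp, tn):
    """Cohen's Kappa and the true skill statistic for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the predicted and reference class marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    # TSS = sensitivity + specificity - 1 (Hanssen-Kuipers discriminant).
    tss = tp / (tp + fn) + tn / (tn + fp) - 1
    return kappa, tss


if __name__ == "__main__":
    for name, counts in [("balanced", (80, 20, 10, 90)),
                         ("imbalanced", (8, 2, 19, 171))]:
        kappa, tss = kappa_and_tss(*counts)
        print(f"{name}: kappa={kappa:.3f}, TSS={tss:.3f}")
    # → balanced: kappa=0.700, TSS=0.700
    # → imbalanced: kappa=0.388, TSS=0.700
```
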
Decomposition methods for unsupervised learning

DEFF Research Database (Denmark)

Mørup, Morten

2008-01-01

This thesis presents the application and development of decomposition methods for Unsupervised Learning. It covers topics from classical factor analysis based decomposition and its variants, such as Independent Component Analysis, Non-negative Matrix Factorization and Sparse Coding ... methods and clustering problems is derived both in terms of classical point clustering and in terms of community detection in complex networks. A guiding principle throughout this thesis is the principle of parsimony. Hence, the goal of Unsupervised Learning is here posed as striving for simplicity ... in the decompositions. Thus, it is demonstrated how a wide range of decomposition methods explicitly or implicitly strive to attain this goal. Applications of the derived decompositions are given, ranging from multi-media analysis of image and sound data to analysis of biomedical data such as electroencephalography...
Unsupervised Learning and Generalization

DEFF Research Database (Denmark)

Hansen, Lars Kai; Larsen, Jan

1996-01-01

The concept of generalization is defined for a general class of unsupervised learning machines. The generalization error is a straightforward extension of the corresponding concept for supervised learning, and may be estimated empirically using a test set or by statistical means, in close analogy with supervised learning. The empirical and analytical estimates are compared for principal component analysis and for K-means clustering based density estimation.
A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

Directory of Open Access Journals (Sweden)

Xite Wang

2017-01-01

Outlier detection is an important data mining task whose goal is to find abnormal or atypical objects in a given dataset. Techniques for detecting outliers have many applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB) outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for CB outlier detection (DACB). On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that uses the threshold to speed up the computation efficiently. Furthermore, we propose a ranking method to optimize the order of cluster scanning. Finally, the effectiveness and efficiency of the proposed approaches are verified through extensive simulation experiments.
Unsupervised classification of variable stars

Science.gov (United States)

Valenzuela, Lucas; Pichara, Karim

2018-03-01

During the past 10 years, considerable effort has been made to develop algorithms for the automatic classification of variable stars. This has primarily been achieved by applying machine learning methods to photometric data sets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human effort. Every time data come from new surveys, the only available training instances are the ones that have a cross-match with previously labelled objects, which yields training sets that are insufficient compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem with an unconventional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific to light curves, based on a novel data structure that allows the search to scale over the entire data set of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

Directory of Open Access Journals (Sweden)

Parvaneh Shabanzadeh

2015-01-01

Unsupervised data classification (or cluster analysis) is one of the most useful descriptive tasks in data mining; it seeks to group objects into homogeneous classes based on similarity and is used in many medical disciplines and other applications. In general, no single algorithm is suitable for all types of data, conditions, and applications; each algorithm has its own advantages, limitations, and deficiencies. Hence, research on novel and effective approaches to unsupervised data classification is still active. In this paper, a heuristic algorithm, Biogeography-Based Optimization (BBO), which is inspired by the natural biogeographic distribution of species, was adapted to data clustering problems by modifying its main operators. Like other population-based algorithms, BBO starts with an initial population of candidate solutions to an optimization problem and an objective function evaluated on them. The proposed algorithm was assessed on six medical and real-life datasets and compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
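The BBO loop described above can be sketched for clustering as follows. This is a generic textbook BBO with a linear migration model, not the modified operators proposed in the paper: each habitat encodes k centroids as a flat vector of suitability index variables (SIVs), its cost is the within-cluster sum of squared errors, and the population size, mutation probability, and elite count are assumed values.

```python
import numpy as np

def sse(centroids, X):
    # Clustering objective: sum of squared distances from each point to its nearest centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def bbo_cluster(X, k, pop_size=20, generations=100, p_mutate=0.02, n_elite=2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.tile(X.min(axis=0), k), np.tile(X.max(axis=0), k)
    dim = len(lo)
    # Each habitat is a candidate solution: k centroids flattened into one vector of SIVs.
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    rank = np.arange(pop_size)
    mu = (pop_size - rank) / pop_size   # emigration rate: highest for the best habitat
    lam = 1.0 - mu                      # immigration rate: highest for the worst habitat
    probs = mu / mu.sum()
    for _ in range(generations):
        costs = np.array([sse(h.reshape(k, -1), X) for h in pop])
        pop = pop[np.argsort(costs)]    # sort so that index 0 is the best habitat
        elites = pop[:n_elite].copy()
        new_pop = pop.copy()
        for i in range(pop_size):
            for j in range(dim):
                if rng.random() < lam[i]:
                    # Migration: copy this SIV from a habitat chosen by emigration rate.
                    src = rng.choice(pop_size, p=probs)
                    new_pop[i, j] = pop[src, j]
                if rng.random() < p_mutate:
                    # Mutation: replace the SIV with a random value inside the data bounds.
                    new_pop[i, j] = rng.uniform(lo[j], hi[j])
        new_pop[-n_elite:] = elites     # elitism: the best solutions are never lost
        pop = new_pop
    costs = np.array([sse(h.reshape(k, -1), X) for h in pop])
    best = int(np.argmin(costs))
    return pop[best].reshape(k, -1), costs[best]
```

Because of elitism, the best cost never increases from one generation to the next.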
An unsupervised text mining method for relation extraction from biomedical literature.

Directory of Open Access Journals (Sweden)

Changqin Quan

The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing for biomedical relation extraction. The pattern clustering algorithm is based on a polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing, and phrase structure parsing rules. We evaluated the approaches on two tasks: (1) protein-protein interaction extraction and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our unsupervised approach outperformed three supervised methods: a rule-based, an SVM-based, and a kernel-based method. The proposed semi-supervised approach is superior to existing semi-supervised methods. The evaluation of gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
Unsupervised classification of major depression using functional connectivity MRI.

Science.gov (United States)

Zeng, Ling-Li; Shen, Hui; Liu, Li; Hu, Dewen

2014-04-01

The current diagnosis of psychiatric disorders, including major depressive disorder, is based largely on self-reported symptoms and clinical signs and may therefore be affected by patients' behaviors and psychiatrists' biases. This study aims to develop an unsupervised machine learning approach for the accurate identification of major depression from single resting-state functional magnetic resonance imaging scans in the absence of clinical information. Twenty-four medication-naive patients with major depression and 29 demographically similar healthy individuals underwent resting-state functional magnetic resonance imaging. We first clustered the voxels within the perigenual cingulate cortex into two subregions, a subgenual region and a pregenual region, according to their distinct resting-state functional connectivity patterns. We then showed that a maximum margin clustering-based unsupervised machine learning approach extracted sufficient information from the subgenual cingulate functional connectivity map to differentiate depressed patients from healthy controls, with a group-level clustering consistency of 92.5% and an individual-level classification consistency of 92.5%. The subgenual cingulate functional connectivity network with the highest discriminative power primarily included the ventrolateral and ventromedial prefrontal cortex, superior temporal gyri, and limbic areas, indicating that these connections may play critical roles in the pathophysiology of major depression. The current study suggests that subgenual cingulate functional connectivity network signatures may provide promising objective biomarkers for the diagnosis of major depression and that maximum margin clustering-based unsupervised machine learning approaches may have the potential to inform clinical practice and aid research on psychiatric disorders. Copyright © 2013 Wiley Periodicals, Inc.
Data clustering theory, algorithms, and applications

CERN Document Server

Gan, Guojun; Wu, Jianhong

2007-01-01

Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms grouped by their baseline methodologies, such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.
Semi-supervised and unsupervised extreme learning machines.

Science.gov (United States)

Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

2014-12-01

Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
Content Discovery from Composite Audio : An unsupervised approach

NARCIS (Netherlands)

Lu, L.

2009-01-01

In this thesis, we developed and assessed a novel robust and unsupervised framework for semantic inference from composite audio signals. We focused on the problem of detecting audio scenes and grouping them into meaningful clusters. Our approach addressed all major steps in a general process of
flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding.

Science.gov (United States)

Ge, Yongchao; Sealfon, Stuart C

2012-08-01

For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty identifying clusters of irregular shapes. The latter cannot be applied directly to high-dimensional data, as the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful. In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and can identify irregularly shaped clusters. The algorithm first uses the K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are exhaustively searched by exploring the density function, and the cells are clustered by their associated local peak. flowPeaks is automatic, fast, reliable, and robust to cluster shape and outliers. The algorithm has been applied to flow cytometry data and compared with state-of-the-art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge, and FLAME. The R package flowPeaks is available at https://github.com/yongchao/flowPeaks. Contact: yongchao.ge@mssm.edu. Supplementary data are available at Bioinformatics online.
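The three steps above (over-partition with a large K, build a smoothed mixture density, climb to its local peaks) can be sketched in NumPy. This is an illustration under assumptions, not the flowPeaks package: for simplicity it places one isotropic Gaussian of a single hypothetical bandwidth `h` on each K-means centre, weighted by cluster size, which is likely cruder than the smoothing used by the authors, and it finds peaks with mean-shift iterations.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return C, labels

def climb(x, C, w, h, tol=1e-6, max_iter=500):
    # Mean-shift iterations on f(x) = sum_k w_k N(x; C_k, h^2 I):
    # each step moves x uphill and the iteration stops at a local peak.
    for _ in range(max_iter):
        g = w * np.exp(-((C - x) ** 2).sum(axis=1) / (2 * h * h))
        x_new = (g[:, None] * C).sum(axis=0) / g.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def flowpeaks_sketch(X, k=20, h=1.0):
    # Step 1: over-partition the cells with K-means using a deliberately large K.
    C, labels = kmeans(X, k)
    w = np.bincount(labels, minlength=k) / len(X)   # mixture weights = cluster sizes
    peaks, peak_of = [], np.full(k, -1)
    for j in np.flatnonzero(w):
        # Step 2: climb from each small cluster's centre to its local density peak.
        p = climb(C[j], C, w, h)
        # Step 3: centres that reach the same peak are merged into one final cluster.
        match = [i for i, q in enumerate(peaks) if np.linalg.norm(p - q) < h / 2]
        if match:
            peak_of[j] = match[0]
        else:
            peaks.append(p)
            peak_of[j] = len(peaks) - 1
    # Every cell inherits the peak of its small K-means cluster.
    return peak_of[labels], np.array(peaks)
```

Because peaks are followed rather than fixed cluster shapes fitted, elongated or curved populations can still end up in a single cluster.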
Segmentation of fluorescence microscopy cell images using unsupervised mining.

Science.gov (United States)

Du, Xian; Dua, Sumeet

2010-05-28

Accurate measurement of cell and nucleus contours is critical for the sensitive and specific detection of changes in normal cells in several medical informatics disciplines. In microscopy, this task is facilitated by fluorescent cell stains, and segmentation is often the first step in such approaches. Because of the complex nature of cell tissues and problems inherent to microscopy, unsupervised clustering-based mining approaches can be incorporated into cell segmentation. In this study, we developed and evaluated the performance of multiple unsupervised data mining techniques for cell image segmentation. We adapt four distinctive yet complementary methods for unsupervised learning, based on k-means clustering, EM, Otsu's threshold, and GMAC. Validation measures are defined, and the performance of the techniques is evaluated both quantitatively and qualitatively using synthetic and recently published real data. Experimental results demonstrate that k-means, Otsu's threshold, and GMAC perform similarly and give more precise segmentation results than EM. We report that EM has higher recall and lower precision, resulting from under-segmentation due to its Gaussian model assumption. We also demonstrate that these methods need spatial information to segment complex real cell images with a high degree of efficacy, as expected in many medical informatics applications.
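Otsu's threshold, one of the four compared methods, picks the intensity split that maximizes the between-class variance of the image histogram. A minimal NumPy sketch for a grayscale image follows; the number of histogram bins is an assumed parameter, and this is not the authors' implementation.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    hist, edges = np.histogram(image.ravel(), bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                # weight of the lower ("background") class per split
    w1 = 1.0 - w0                    # weight of the upper ("foreground") class
    mu0_num = np.cumsum(p * centers)
    mu_total = mu0_num[-1]
    # Between-class variance w0 * w1 * (mu0 - mu1)^2 for every candidate split.
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = mu0_num / w0
        mu1 = (mu_total - mu0_num) / w1
        sigma_b = w0 * w1 * (mu0 - mu1) ** 2
    sigma_b = np.nan_to_num(sigma_b)  # empty classes contribute nothing
    k = int(np.argmax(sigma_b))
    return edges[k + 1]               # pixels at or above this value form the foreground
```

Applying `image >= otsu_threshold(image)` gives a binary mask; note that it uses intensities only, which is consistent with the abstract's remark that spatial information is still needed for complex real images.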
A Hierarchical Clustering Methodology for the Estimation of Toxicity

Science.gov (United States)

A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...
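Ward's method, which this QSAR methodology uses to divide the training set, can be illustrated with a naive agglomerative sketch. Each row of `X` stands for a hypothetical structural descriptor vector of one compound; at every step the pair of clusters whose merge least increases the total within-cluster sum of squares is joined. This O(n³) loop is only for illustration; for real data sets an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` would be the usual choice.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    # Start with every compound (row of X) in its own cluster.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best, best_cost = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = X[clusters[a]], X[clusters[b]]
                na, nb = len(A), len(B)
                # Ward's criterion: increase in within-cluster sum of squares if A and B merge.
                cost = na * nb / (na + nb) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)
                if cost < best_cost:
                    best, best_cost = (a, b), cost
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```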
GibbsCluster: unsupervised clustering and alignment of peptide sequences

DEFF Research Database (Denmark)

Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

2017-01-01

motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large......-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0....
Unsupervised land cover change detection: meaningful sequential time series analysis

CSIR Research Space (South Africa)

Salmon, BP

2011-06-01

An automated land cover change detection method is proposed that uses coarse spatial resolution hyper-temporal earth observation satellite time series data. The study compared three different unsupervised clustering approaches that operate on short...
The Typology of Methodological Approaches to Development of Innovative Clusters

Directory of Open Access Journals (Sweden)

Farat Olexandra V.

2017-06-01

The aim of the article is to study existing methodological approaches to assessing the development of enterprises, in order to substantiate whether they can be used by cluster associations. Based on an analysis of the scientific literature, the most applicable methodological approaches to assessing the development of enterprises are characterized. Eight methodical approaches to assessing the level of development of enterprises and four methodological approaches to assessing the level of development of clusters are singled out. Each approach has certain advantages and disadvantages, but none of them provides a systematic assessment of all areas of cluster functioning, identifies possible reserves for growth in cluster competitiveness, or characterizes possible strategies for future development. Taking into account the peculiarities of the functioning and development of cluster associations of enterprises, we propose our own methodological approach for assessing the development of innovative cluster structures.
Supervised versus unsupervised categorization: two sides of the same coin?

Science.gov (United States)

Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

2011-09-01

Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. (2011; Pothos et al., 2008) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.
ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

Science.gov (United States)

Oluwadare, Oluwatosin; Cheng, Jianlin

2017-11-14

With the development of chromosome conformation capture techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs of a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate TAD identification as an unsupervised machine learning (clustering) problem and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary to and consistent with existing methods on the real Hi-C datasets of two mouse cells. Validation with chromatin immunoprecipitation sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue for applying the large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
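The clustering formulation of TAD calling can be sketched as follows. ClusterTAD's own feature representation is described only as novel in the abstract, so this sketch assumes a hypothetical one: each genomic bin is represented by its log-scaled row of the contact matrix, the bins are clustered with a plain k-means (deterministically initialized at evenly spaced bins), and a domain boundary is placed wherever adjacent bins change cluster.

```python
import numpy as np

def kmeans_rows(F, k, iters=100):
    # Deterministic initialisation: evenly spaced bins along the chromosome.
    C = F[np.linspace(0, len(F) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = ((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_C = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels

def call_domains(contacts, k):
    # HYPOTHETICAL features: each genomic bin is described by its log-scaled contact row.
    F = np.log1p(contacts)
    labels = kmeans_rows(F, k)
    # A boundary is placed wherever adjacent bins fall into different clusters.
    cuts = [0] + [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]] + [len(labels)]
    return [(cuts[i], cuts[i + 1] - 1) for i in range(len(cuts) - 1)]  # inclusive bin ranges
```

Clusters are not forced to be contiguous along the genome, so one cluster can yield several domains; the returned list simply reports every run of identical labels.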

Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering: Application to Medical Image MRI

Directory of Open Access Journals (Sweden)

Nour-Eddine El Harchaoui

2013-01-01

The analysis and processing of large data sets are a challenge for researchers. Several approaches have been used to model such complex data, based on mathematical theories such as fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines fuzzy and possibilistic theories, with the aim of overcoming the problems of uncertain data in complex systems. We use the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters generated by PCM and also to overcome the sensitivity of FCM to noise. To validate our approach, we used several validity indexes and compared it with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were carried out on different synthetic data sets and real brain MR images.
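The FCM-then-PCM initialization can be sketched as below. It uses the textbook update rules for both algorithms with fuzzifier m = 2 and takes each PCM scale parameter η_i as the FCM-weighted mean squared distance to centre i; that choice of η is a common one and is assumed here, so the paper's exact initialization may differ.

```python
import numpy as np

def sq_dists(X, V):
    # Squared distances, shape (c, n); the small constant avoids division by zero.
    return ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12

def fcm(X, c, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # memberships of each point sum to 1
    for _ in range(iters):
        W = U ** m
        V = W @ X / W.sum(axis=1, keepdims=True)
        D = sq_dists(X, V)
        # FCM update: u_ij = 1 / sum_k (d_ij^2 / d_kj^2)^(1/(m-1)).
        U = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1))).sum(axis=1)
    return U, V

def pcm_from_fcm(X, c, m=2.0, iters=100, seed=0):
    U, V = fcm(X, c, m, seed=seed)
    # eta_i: FCM-weighted mean squared distance to centre i (the scale of cluster i).
    W = U ** m
    eta = (W * sq_dists(X, V)).sum(axis=1) / W.sum(axis=1)
    for _ in range(iters):
        D = sq_dists(X, V)
        # PCM typicality depends only on the distance to the cluster's own centre.
        T = 1.0 / (1.0 + (D / eta[:, None]) ** (1.0 / (m - 1)))
        Tm = T ** m
        V = Tm @ X / Tm.sum(axis=1, keepdims=True)
    return T, V
```

Starting PCM from the FCM centres and scales keeps the possibilistic prototypes apart, while the typicality values `T` stay small for noise points far from every centre.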
Regional health care planning: a methodology to cluster facilities using community utilization patterns.

Science.gov (United States)

Delamater, Paul L; Shortridge, Ashton M; Messina, Joseph P

2013-08-22

Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state's Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. The clustering methodology employs a 2-step K-means + Ward's clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Although the methodology was developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units.
Clustering and visualizing similarity networks of membrane proteins.

Science.gov (United States)

Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

2015-08-01

We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
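The unsupervised sparsification step can be sketched as thresholding the distance matrix and reading off connected components. The threshold is passed in as a parameter here; in the paper it is the characteristic distance found from the edge distribution of the SSN, and the MSC algorithm itself is not reproduced.

```python
import numpy as np

def threshold_components(dist, threshold):
    # Keep only edges shorter than the threshold, then return connected-component labels.
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the union-find trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < threshold:
                parent[find(i)] = find(j)   # union the two components
    roots = [find(i) for i in range(n)]
    # Relabel components 0, 1, 2, ... in order of first appearance.
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Raising the threshold merges more chains into the same component, so in practice the result is sensitive to where the boundary distance is placed.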
Tune Your Brown Clustering, Please

DEFF Research Database (Denmark)

Derczynski, Leon; Chester, Sean; Bøgh, Kenneth Sejdenfaden

2015-01-01

Brown clustering, an unsupervised hierarchical clustering technique based on ngram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly...
A novel unsupervised spike sorting algorithm for intracranial EEG.

Science.gov (United States)

Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R

2011-01-01

This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.
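The dynamic, patient-specific codebook can be sketched with correlation-based template matching. In this sketch, which is an assumption rather than the published algorithm, each detected spike is compared with the existing templates by Pearson correlation; if the best correlation reaches a hypothetical threshold (0.9), the spike joins that template and the template is updated as a running mean, and otherwise it starts a new template. The PCA step and the hierarchical merging of similar classes are omitted.

```python
import numpy as np

def build_codebook(spikes, corr_threshold=0.9):
    # Each template is the running mean of the spikes assigned to it.
    templates, counts, labels = [], [], []
    for s in spikes:
        z = (s - s.mean()) / (s.std() + 1e-12)
        # Pearson correlation between this spike and every existing template.
        corrs = [np.dot(z, (t - t.mean()) / (t.std() + 1e-12)) / len(z) for t in templates]
        if corrs and max(corrs) >= corr_threshold:
            k = int(np.argmax(corrs))
            counts[k] += 1
            templates[k] += (s - templates[k]) / counts[k]   # incremental mean update
        else:
            templates.append(s.astype(float))                # no match: start a new class
            counts.append(1)
            k = len(templates) - 1
        labels.append(k)
    return templates, labels
```

No waveform shapes are specified in advance; the codebook grows only when a spike matches none of the existing templates.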
Unsupervised learning of mixture models based on swarm intelligence and neural networks with optimal completion using incomplete data

Directory of Open Access Journals (Sweden)

Ahmed R. Abas

2012-07-01

In this paper, a new algorithm is presented for the unsupervised learning of finite mixture models (FMMs) from data sets with missing values. This algorithm overcomes the local optima problem of the Expectation-Maximization (EM) algorithm by integrating EM with Particle Swarm Optimization (PSO). In addition, it overcomes the problem of biased estimation caused by overlapping clusters when estimating missing values in the input data set, by integrating locally tuned general regression neural networks with the Optimal Completion Strategy (OCS). A comparison study shows that the proposed algorithm outperforms other algorithms commonly used in the literature for the unsupervised learning of FMM parameters, yielding the lowest misclassification errors when clustering incomplete data sets generated from overlapping clusters of widely differing sizes.
Rough-fuzzy clustering and unsupervised feature selection for wavelet based MR image segmentation.

Directory of Open Access Journals (Sweden)

Pradipta Maji

Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time-consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images that judiciously integrates the merits of rough-fuzzy computing and multiresolution image analysis. The proposed method assumes that the major brain tissues in MR images, namely gray matter, white matter, and cerebrospinal fluid, have different textural properties. Dyadic wavelet analysis is used to extract a scale-space feature vector for each pixel, while rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method based on the maximum relevance-maximum significance criterion is introduced to select relevant and significant textural features for the segmentation problem, while a skull-stripping preprocessing step based on mathematical morphology is proposed to remove non-cerebral tissues such as the skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices.
Segmentation methodology for automated classification and differentiation of soft tissues in multiband images of high-resolution ultrasonic transmission tomography.

Science.gov (United States)

Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z

2006-08-01

This paper presents a novel segmentation methodology for the automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical κ-means approach for unsupervised clustering (UC). To prevent the iterative AC minimization algorithm from becoming trapped in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm, which seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.
Methodologies for Improved Tag Cloud Generation with Clustering

DEFF Research Database (Denmark)

Leginus, Martin; Dolog, Peter; Lage, Ricardo Gomes

2012-01-01

Tag clouds are useful means for navigation in the social web systems. Usually the systems implement the tag cloud generation based on tag popularity which is not always the best method. In this paper we propose methodologies on how to combine clustering into the tag cloud generation to improve...... coverage and overlap. We study several clustering algorithms to generate tag clouds. We show that by extending cloud generation based on tag popularity with clustering we slightly improve coverage. We also show that if the cloud is generated by clustering independently of the tag popularity baseline we...
Unsupervised spike sorting based on discriminative subspace learning.

Science.gov (United States)

Keshtkaran, Mohammad Reza; Yang, Zhi

2014-01-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
Integrating the Supervised Information into Unsupervised Learning

Directory of Open Access Journals (Sweden)

Ping Ling

2013-01-01

This paper presents an assembled unsupervised learning framework that adopts information from a supervised learning process, together with a corresponding implementation algorithm. The algorithm consists of two phases: first, data representatives (DRs) are extracted and clustered to obtain labeled training data; then, non-DRs are classified based on the labeled DRs. The implementation algorithm is called SDSN because it employs a tuning-scaled Support vector domain description to collect DRs, uses a spectrum-based method to cluster DRs, and adopts the nearest neighbor classifier to label non-DRs. The validity of the first-phase clustering procedure is analyzed theoretically. In the second phase, a new data-dependent metric is defined so that the nearest neighbor classifier can exploit the information gained in the first phase. A fast training approach for DR extraction is provided for greater efficiency. Experimental results on synthetic and real datasets verify the correctness and performance of the proposed idea and show that SDSN is more practical than the traditional pure clustering procedure.
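The second SDSN phase, labeling non-DRs by their nearest labeled DR, reduces to a 1-nearest-neighbour rule. The sketch below uses plain Euclidean distance in place of the paper's data-dependent metric and assumes the DRs and their cluster labels are already available from the first phase.

```python
import numpy as np

def label_by_nearest_representative(X, rep_points, rep_labels):
    # Squared Euclidean distance from every non-DR point (rows of X) to every DR.
    d = ((X[:, None, :] - rep_points[None, :, :]) ** 2).sum(axis=2)
    # Each point takes the cluster label of its nearest labelled representative.
    return np.asarray(rep_labels)[d.argmin(axis=1)]
```

Only the comparatively few DRs go through the expensive clustering step; all remaining points are labelled by this cheap lookup.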
Classification of Cluster Area for Satellite Image

Directory of Open Access Journals (Sweden)

Thwe Zin Phyo

2015-06-01

This paper describes area classification for a Landsat 7 satellite image. The main purpose of this system is to classify the area of each cluster contained in a satellite image. To classify the image, it must first be clustered into different land cover types. Clustering is an unsupervised learning method that aims to partition an image into homogeneous regions. This system is implemented based on color features with the unsupervised K-means clustering algorithm, which does not require any training before clustering. The satellite image is grouped into a set of three clusters. For this work, the combined band 432 from Landsat 7 is used as input. A satellite image of the Mandalay area from 2001 is chosen to test the segmentation method. After clustering, a specific range must be defined for the three clustered images in order to obtain green land, water, and urban balance. This system is implemented in the MATLAB programming language.
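K-means clustering of pixel colors, as used in this system, can be sketched in NumPy (the original system was implemented in MATLAB). The image is assumed to be an array of shape (height, width, bands), for example a three-band composite, and centers are initialized by farthest-point selection. That initialization is an assumed choice, not taken from the paper, and it avoids two starting centers falling into the same land cover region.

```python
import numpy as np

def kmeans_pixels(image, k=3, iters=50, seed=0):
    # Treat every pixel's band values as one feature vector.
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start anywhere, then add the pixel farthest from all centers.
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(1, k):
        d = np.min([((pixels - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(pixels[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each pixel to its nearest center, then move centers to their cluster means.
        labels = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([pixels[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(image.shape[:-1]), centers
```

The returned label map can then be interpreted class by class (for example as green land, water, or urban) by inspecting each cluster's mean band values.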
Classification and unsupervised clustering of LIGO data with Deep Transfer Learning

Science.gov (United States)

George, Daniel; Shen, Hongyu; Huerta, E. A.

2018-05-01

Gravitational wave detection requires a detailed understanding of the response of the LIGO and Virgo detectors to true signals in the presence of environmental and instrumental noise. Of particular interest is the study of anomalous non-Gaussian transients, such as glitches, since their occurrence rate in LIGO and Virgo data can obscure or even mimic true gravitational wave signals. Therefore, successfully identifying and excising these anomalies from gravitational wave data is of utmost importance for the detection and characterization of true signals and for the accurate computation of their significance. To facilitate this work, we present the first application of deep learning combined with transfer learning to show that knowledge from pretrained models for real-world object recognition can be transferred for classifying spectrograms of glitches. To showcase this new method, we use a data set of twenty-two classes of glitches, curated and labeled by the Gravity Spy project using data collected during LIGO's first discovery campaign. We demonstrate that our Deep Transfer Learning method enables an optimal use of very deep convolutional neural networks for glitch classification given small and unbalanced training data sets, significantly reduces the training time, and achieves state-of-the-art accuracy above 98.8%, lowering the previous error rate by over 60%. More importantly, once trained via transfer learning on the known classes, we show that our neural networks can be truncated and used as feature extractors for unsupervised clustering to automatically group together new unknown classes of glitches and anomalous signals. This novel capability is of paramount importance to identify and remove new types of glitches which will occur as the LIGO/Virgo detectors gradually attain design sensitivity.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

Science.gov (United States)

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace that is not necessarily the most discriminative one. Therefore, clusters that appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters that are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.
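The Gaussian mixture clustering step mentioned above can be illustrated on already-projected one-dimensional features with a plain expectation-maximization loop. This sketch assumes a fixed number of components given by the initial guesses. It omits the paper's LDA subspace selection, outlier handling and statistical test for the number of clusters, and the feature values are made up.

```python
import math


def gmm_em_1d(xs, mu, var, weights, iters=50):
    """EM for a one-dimensional Gaussian mixture with a fixed number of components.

    mu, var and weights are initial guesses (one entry per component); copies are updated.
    """
    mu, var, weights = list(mu), list(var), list(weights)
    k = len(mu)
    resp = []
    for _ in range(iters):
        # E-step: responsibility of component j for point x (posterior probability).
        resp = []
        for x in xs:
            dens = [
                weights[j] * math.exp(-((x - mu[j]) ** 2) / (2 * var[j])) / math.sqrt(2 * math.pi * var[j])
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            # Variance floor keeps a component from collapsing onto a single point.
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    # Hard assignment: the component with the highest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, mu, var, weights


# Hypothetical 1-D spike features from two well-separated units.
xs = [-2.1, -1.9, -2.0, -2.2, 3.0, 3.1, 2.9, 3.2]
labels, mu, var, weights = gmm_em_1d(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```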
Enhancement of Tropical Land Cover Mapping with Wavelet-Based Fusion and Unsupervised Clustering of SAR and Landsat Image Data

Science.gov (United States)

LeMoigne, Jacqueline; Laporte, Nadine; Netanyahu, Nathan S.; Zukor, Dorothy (Technical Monitor)

2001-01-01

The characterization and the mapping of land cover/land use of forest areas, such as the Central African rainforest, is a very complex task. This complexity is mainly due to the extent of such areas and, as a consequence, to the lack of full and continuous cloud-free coverage of those large regions by one single remote sensing instrument. In order to provide improved vegetation maps of Central Africa and to develop forest monitoring techniques for applications at the local and regional scales, we propose to utilize multi-sensor remote sensing observations coupled with in-situ data. Fusion and clustering of multi-sensor data are the first steps towards the development of such a forest monitoring system. In this paper, we describe some preliminary experiments involving the fusion of SAR and Landsat image data of the Lope Reserve in Gabon. As in previous fusion studies, our fusion method is wavelet-based. The fusion provides a new image data set that contains more detailed texture features and preserves the large homogeneous regions observed by the Thematic Mapper sensor. The fusion step is followed by unsupervised clustering, which provides a vegetation map of the area.
Bayesian feature weighting for unsupervised learning, with application to object recognition

OpenAIRE

Carbonetto, Peter; De Freitas, Nando; Gustafson, Paul; Thompson, Natalie

2003-01-01

We present a method for variable selection/weighting in an unsupervised learning context using Bayesian shrinkage. The model parameters and cluster assignments can be computed simultaneously using an efficient EM algorithm. Applying our Bayesian shrinkage model to a complex problem in object recognition (Duygulu, Barnard, de Freitas and Forsyth 2002), our experiments yielded good results.
Hadoop Cluster Deployment: A Methodological Approach

Directory of Open Access Journals (Sweden)

Ronaldo Celso Messias Correia

2018-05-01

For a long time, data has been treated as a general problem because it merely represents fractions of an event without any relevant purpose. However, the last decade has been all about information and how to obtain it. In the search for meaning in data and for solutions to scalability problems, many frameworks have been developed to improve data storage and analysis. Hadoop was presented as a powerful framework for dealing with large amounts of data. However, questions remain about how to deploy it and whether there is any reliable method for comparing the performance of distinct Hadoop clusters. This paper presents a methodology based on benchmark analysis to guide Hadoop cluster deployment. The experiments employed Apache Hadoop and the Hadoop distributions of Cloudera, Hortonworks, and MapR, analyzing the architectures both locally and in the cloud, using centralized and geographically distributed servers. The results show that the methodology can be applied dynamically to reliably compare different architectures. Additionally, the study suggests that the knowledge acquired can be used to improve the data analysis process through an understanding of the Hadoop architecture.
ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets.

Directory of Open Access Journals (Sweden)

Halfdan Rydbeck

Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. Here we introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

Science.gov (United States)

Steinley, Douglas; Brusco, Michael J.

2011-01-01

This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

The composite sequential clustering technique for analysis of multispectral scanner data

Science.gov (United States)

Su, M. Y.

1972-01-01

The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
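The two-part structure (a single sequential pass that produces initial clusters, then iterative K-means refinement) can be sketched as follows. As an assumption, a simple leader algorithm with a distance threshold stands in for the paper's sequential variance analysis, and ordinary K-means stands in for its generalized K-means; the threshold and points are hypothetical.

```python
import math


def leader_clustering(points, threshold):
    """Sequential pass: a point becomes a new leader unless an existing leader lies within `threshold`."""
    leaders = []
    for p in points:
        if not any(math.dist(p, q) <= threshold for q in leaders):
            leaders.append(p)
    return leaders


def refine_kmeans(points, centres, iters=20):
    """Iteratively refine the initial centres with standard K-means updates."""
    centres = list(centres)
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; an empty group keeps its old centre.
        new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return centres


points = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9), (0.9, 1.1)]
initial = leader_clustering(points, threshold=2.0)
print(initial)  # [(1.0, 1.0), (8.0, 8.0)]
final = refine_kmeans(points, initial)
```

The sequential pass fixes the number of clusters and gives K-means a data-driven starting point, which is the role the first stage plays in the composite technique.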
Unsupervised consensus cluster analysis of [18F]-fluoroethyl-L-tyrosine positron emission tomography identified textural features for the diagnosis of pseudoprogression in high-grade glioma.

Science.gov (United States)

Kebir, Sied; Khurshid, Zain; Gaertner, Florian C; Essler, Markus; Hattingen, Elke; Fimmers, Rolf; Scheffler, Björn; Herrlinger, Ulrich; Bundschuh, Ralph A; Glas, Martin

2017-01-31

Timely detection of pseudoprogression (PSP) is crucial for the management of patients with high-grade glioma (HGG) but remains difficult. Textural features of O-(2-[18F]fluoroethyl)-L-tyrosine positron emission tomography (FET-PET) mirror tumor uptake heterogeneity; some of them may be associated with tumor progression. Fourteen patients with HGG and suspected of PSP underwent FET-PET imaging. A set of 19 conventional and textural FET-PET features were evaluated and subjected to unsupervised consensus clustering. The final diagnosis of true progression vs. PSP was based on follow-up MRI using RANO criteria. Three robust clusters have been identified based on 10 predominantly textural FET-PET features. None of the patients with PSP fell into cluster 2, which was associated with high values for textural FET-PET markers of uptake heterogeneity. Three out of 4 patients with PSP were assigned to cluster 3 that was largely associated with low values of textural FET-PET features. By comparison, tumor-to-normal brain ratio (TNRmax) at the optimal cutoff 2.1 was less predictive of PSP (negative predictive value 57% for detecting true progression, p=0.07 vs. 75% with cluster 3, p=0.04). Clustering based on textural O-(2-[18F]fluoroethyl)-L-tyrosine PET features may provide valuable information in assessing the elusive phenomenon of pseudoprogression.
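Consensus clustering generally repeats a clustering many times and records how often each pair of samples ends up in the same cluster; the final clusters are then read off this consensus matrix. A minimal sketch of the matrix computation, assuming every run labels all samples and that the label vectors are already available (the three runs below are hypothetical, not the study's data):

```python
from itertools import combinations


def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        m[i][j] = m[j][i] = together / len(labelings)
    return m


# Three hypothetical runs over four samples; label IDs need not match across runs.
runs = [[0, 0, 1, 1], [0, 0, 1, 2], [1, 1, 0, 0]]
cm = consensus_matrix(runs)
print(cm[0][1], cm[2][3])  # 1.0 0.6666666666666666
```

Comparing pairs within each run, rather than label values across runs, makes the matrix insensitive to arbitrary cluster numbering.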
Statistical mechanics of semi-supervised clustering in sparse graphs

International Nuclear Information System (INIS)

Ver Steeg, Greg; Galstyan, Aram; Allahverdyan, Armen E

2011-01-01

We theoretically study semi-supervised clustering in sparse graphs in the presence of pair-wise constraints on the cluster assignments of nodes. We focus on bi-cluster graphs and study the impact of semi-supervision for varying constraint density and overlap between the clusters. Recent results for unsupervised clustering in sparse graphs indicate that there is a critical ratio of within-cluster and between-cluster connectivities below which clusters cannot be recovered with better than random accuracy. The goal of this paper is to examine the impact of pair-wise constraints on the clustering accuracy. Our results suggest that the addition of constraints does not provide automatic improvement over the unsupervised case. When the density of the constraints is sufficiently small, their only impact is to shift the detection threshold while preserving the criticality. Conversely, if the density of (hard) constraints is above the percolation threshold, the criticality is suppressed and the detection threshold disappears.
Single pass kernel k-means clustering method

Indian Academy of Sciences (India)

In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in ... 518501, India; Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur College of Engineering, Anantapur 515002, India ...
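The kernel k-means idea referred to above replaces Euclidean distances with distances in a kernel-induced feature space, computed entirely from kernel values. The sketch below is the standard batch version with an RBF kernel; the single-pass variant of the paper is not reproduced, and the kernel choice, gamma value, initial labeling and points are assumptions.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * math.dist(a, b) ** 2)


def kernel_kmeans(points, labels, k, gamma=1.0, iters=20):
    """Batch kernel k-means starting from an initial labeling.

    Squared feature-space distance from phi(x_i) to the mean of cluster C:
    K_ii - (2/|C|) * sum_{j in C} K_ij + (1/|C|^2) * sum_{j,l in C} K_jl,
    so the feature map phi is never computed explicitly.
    """
    n = len(points)
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = list(labels)
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Third term of the distance: mean kernel value over all pairs inside each cluster.
        within = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0 for m in members]

        def dist(i, c):
            m = members[c]
            if not m:
                return math.inf  # an empty cluster attracts no points
            return K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]

        new_labels = [min(range(k), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
print(kernel_kmeans(points, labels=[0, 0, 1, 1, 1, 1], k=2))  # [0, 0, 0, 1, 1, 1]
```

Each iteration costs O(n^2) kernel lookups, which is the kind of overhead a single-pass variant would aim to reduce.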
Humanitarian Logistics: a Clustering Methodology for Assisting Humanitarian Operations

Directory of Open Access Journals (Sweden)

Fabiana Santos Lima

2014-06-01

In this paper, we propose a methodology to identify and classify regions by the type and frequency of disasters. The cluster data make it possible to extract information that can be used in the preparedness phase, as well as to identify the relief items needed by each cluster. In this approach, the clusters are formed by a computing tool whose input is the historical disaster data of the Brazilian state of Santa Catarina, with a specific focus on windstorms, hail, floods, droughts, landslides, and flash floods. The results show that the knowledge provided by the clustering analysis contributes to the decision-making process in the response phase of Humanitarian Logistics (HL).
Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma.

Science.gov (United States)

Young, Jonathan D; Cai, Chunhui; Lu, Xinghua

2017-10-03

One approach to improving the personalized treatment of cancer is to understand the cellular signal transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene expression data. Deep learning is a group of machine learning algorithms that use multiple layers of hidden units to capture hierarchically related, alternative representations of the input data. We hypothesize that this hierarchical structure learned by deep learning will be related to the cellular signaling system. Robust deep learning model selection identified a network architecture that is biologically plausible. Our model selection results indicated that the 1st hidden layer of our deep learning model should contain about 1300 hidden units to most effectively capture the covariance structure of the input data. This agrees with the estimated number of human transcription factors, which is approximately 1400. This result lends support to our hypothesis that the 1st hidden layer of a deep learning model trained on gene expression data may represent signals related to transcription factor activation. Using the 3rd hidden layer representation of each tumor as learned by our unsupervised deep learning model, we performed consensus clustering on all tumor samples, leading to the discovery of clusters of glioblastoma multiforme with differential survival. One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input. We also found differentially expressed genes and well-known mutations (NF1, IDH1, EGFR) that were uniquely correlated with each of these clusters. Exploring these unique genes and mutations will allow us to
The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

Science.gov (United States)

Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

2014-05-01

Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting; it performs well in prediction and can provide explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) that forecast well for this small proportion of failed enterprises while also performing well in terms of total accuracy. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples in imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both the minority and the majority in CBR. In CBCBR, various case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the information of the cases in the same clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking the similarities between the target case and each clustered case class centre. Then, the nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, the labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the
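The CBCBR prediction steps (nearest class centre, then nearest neighbours inside that class, then a decision from their labels) can be sketched as below. Assumptions: the clustered case classes are passed in already formed (the hierarchical clustering step is not shown), Euclidean distance serves as the similarity ranking, a simple majority vote combines the neighbours' labels, and k = 3 and the toy cases are hypothetical.

```python
import math
from collections import Counter


def predict_cbcbr(target, clusters, k=3):
    """Predict a label for `target` from clustered cases.

    `clusters` is a list of case classes; each case is a (features, label) pair.
    """
    # Class centre: mean feature vector of the cases in each clustered class.
    centres = [tuple(sum(v) / len(c) for v in zip(*(f for f, _ in c))) for c in clusters]
    # Step 1: retrieve the clustered case class whose centre is closest to the target.
    best = min(range(len(clusters)), key=lambda i: math.dist(target, centres[i]))
    # Step 2: retrieve the k nearest cases inside that class.
    neighbours = sorted(clusters[best], key=lambda case: math.dist(target, case[0]))[:k]
    # Step 3: majority vote over the neighbours' labels.
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


clusters = [
    [((0.1, 0.2), "failed"), ((0.2, 0.1), "failed"), ((0.3, 0.3), "non-failed")],
    [((5.0, 5.0), "non-failed"), ((5.2, 4.9), "non-failed"), ((4.8, 5.1), "non-failed")],
]
print(predict_cbcbr((0.15, 0.15), clusters))  # failed
```

Restricting the neighbour search to one clustered class is what lets minority cases that sit together in a cluster dominate the vote instead of being outnumbered by the majority.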
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

Directory of Open Access Journals (Sweden)

Ujjwal Maulik

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles, and the Markov clustering algorithms, are currently the most popular methods for protein family recognition. The deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, in those methods calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that deemphasizes outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance for unsupervised remote homolog detection, even in multi-domain and promiscuous domain proteins from Genolevures database families, with better biological relevance. Source code is available upon request (sarkar@labri.fr).
Unsupervised daily routine and activity discovery in smart homes.

Science.gov (United States)

Yin, Jie; Zhang, Qing; Karunanithi, Mohan

2015-08-01

The ability to accurately recognize daily activities of residents is a core premise of smart homes to assist with remote health monitoring. Most of the existing methods rely on a supervised model trained from a preselected and manually labeled set of activities, which are often time-consuming and costly to obtain in practice. In contrast, this paper presents an unsupervised method for discovering daily routines and activities for smart home residents. Our proposed method first uses a Markov chain to model a resident's locomotion patterns at different times of day and discover clusters of daily routines at the macro level. For each routine cluster, it then drills down to further discover room-level activities at the micro level. The automatic identification of daily routines and activities is useful for understanding indicators of functional decline of elderly people and suggesting timely interventions.
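The first stage, modeling a resident's movement between rooms as a Markov chain, amounts to estimating transition probabilities from a sequence of locations. A minimal sketch under assumptions: one ordered sequence of room labels for a single time-of-day segment, maximum-likelihood counts, and hypothetical room names. Clustering the resulting chains into daily routines is not shown.

```python
from collections import defaultdict


def transition_matrix(room_sequence):
    """Maximum-likelihood estimate of P(next room | current room) from one ordered sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(room_sequence, room_sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for room, row in counts.items():
        total = sum(row.values())
        probs[room] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Hypothetical morning sequence of sensor-detected rooms.
morning = ["bedroom", "bathroom", "kitchen", "lounge", "kitchen", "bathroom"]
P = transition_matrix(morning)
print(P["kitchen"])  # {'lounge': 0.5, 'bathroom': 0.5}
```

Routines recorded at different times of day would yield different matrices, and comparing those matrices is one way such chains could be grouped into routine clusters.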
Partitional clustering algorithms

CERN Document Server

2015-01-01

This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...
Rational Variety Mapping for Contrast-Enhanced Nonlinear Unsupervised Segmentation of Multispectral Images of Unstained Specimen

Science.gov (United States)

Kopriva, Ivica; Hadžija, Mirko; Popović Hadžija, Marijana; Korolija, Marina; Cichocki, Andrzej

2011-01-01

A methodology is proposed for nonlinear contrast-enhanced unsupervised segmentation of multispectral (color) microscopy images of principally unstained specimens. The methodology exploits spectral diversity and spatial sparseness to find anatomical differences between materials (cells, nuclei, and background) present in the image. It consists of rth-order rational variety mapping (RVM) followed by matrix/tensor factorization. The sparseness constraint implies duality between nonlinear unsupervised segmentation and multiclass pattern assignment problems. Classes that are not linearly separable in the original input space become separable with high probability in the higher-dimensional mapped space. Hence, RVM mapping has two advantages: it implicitly takes into account nonlinearities present in the image (i.e., they are not required to be known), and it increases spectral diversity (i.e., contrast) between materials, due to the increased dimensionality of the mapped space. This is expected to improve the performance of systems for automated classification and analysis of microscopic histopathological images. The methodology was validated using second- and third-order RVM on experimental multispectral microscopy images of unstained sciatic nerve fibers (nervus ischiadicus) and of unstained white pulp in the spleen tissue, and compared with a manually defined ground truth labeled by two trained pathophysiologists. The methodology can also be useful for additional contrast enhancement of images of stained specimens. PMID:21708116
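To see how much the mapped space grows, one can enumerate the monomials of a pixel's spectral values. This sketch assumes the rth-order mapping consists of all monomials of total degree 1 to r with no constant term; the paper's exact definition may differ, and the matrix/tensor factorization step is not shown.

```python
from itertools import combinations_with_replacement
from math import prod


def monomial_map(x, r):
    """Map a pixel spectrum x to all monomials of total degree 1..r (assumed form of the mapping)."""
    features = []
    for degree in range(1, r + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            features.append(prod(x[i] for i in idx))
    return features


rgb = (0.2, 0.5, 0.3)
print(len(monomial_map(rgb, 2)))  # 9
print(len(monomial_map(rgb, 3)))  # 19
```

Under this assumption a three-channel color pixel becomes a 9-dimensional vector at second order and a 19-dimensional vector at third order, which is the dimensionality increase the abstract credits with improving separability and contrast.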
KMEANS CLUSTERING FOR HIDDEN MARKOV MODEL

NARCIS (Netherlands)

Perrone, M.P.; Connell, S.D.

2004-01-01

An unsupervised kmeans clustering algorithm for hidden Markov models is described and applied to the task of generating subclass models for individual handwritten character classes. The algorithm is compared to a related clustering method and shown to give a relative change in the error rate of as
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

Directory of Open Access Journals (Sweden)

Guoqi Qian

2016-01-01

Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining methods, with a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
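The alternation that regression clustering relies on (assign each point to the hyperplane that fits it best, then refit each hyperplane) can be sketched for a single predictor with ordinary least squares. This is a minimal illustration under assumptions: the number of lines and their initial coefficients are supplied by the caller, and neither the robust estimators nor the model-selection step from the paper is included.

```python
def fit_line(pts):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx if sxx else 0.0  # degenerate case: all x equal, fit a flat line
    return my - b * mx, b


def regression_clustering(pts, lines, iters=20):
    """Alternate between assigning points to the best-fitting line and refitting each line."""
    lines = list(lines)
    labels = None
    for _ in range(iters):
        # Partition step (unsupervised): the smallest squared residual wins.
        new_labels = [
            min(range(len(lines)), key=lambda j: (y - lines[j][0] - lines[j][1] * x) ** 2)
            for x, y in pts
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Regression step (supervised): refit each non-empty cluster by least squares.
        lines = [
            fit_line([p for p, lab in zip(pts, labels) if lab == j]) if j in labels else lines[j]
            for j in range(len(lines))
        ]
    return labels, lines


# Points drawn from y = 2x and y = 10 - x; the initial lines are hypothetical guesses.
pts = [(0, 0), (1, 2), (2, 4), (3, 6), (0, 10), (1, 9), (2, 8), (3, 7)]
labels, lines = regression_clustering(pts, lines=[(0.0, 1.5), (9.0, 0.0)])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```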
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

Directory of Open Access Journals (Sweden)

Marco Borri

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters that significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4, determined with cluster validation) produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters that were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
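The abstract does not name the cluster validation index used to pick k; as one common choice, the mean silhouette width can compare candidate partitions. The sketch below swaps in the silhouette purely for illustration, uses hypothetical one-dimensional points, and requires at least two clusters.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all points; needs at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for a point alone in its cluster
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0.0,), (0.2,), (5.0,), (5.2,)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated clusters and negative values indicate points closer to another cluster than to their own, so the k with the highest mean silhouette would be preferred under this criterion.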
Unsupervised Classification of Surface Defects in Wire Rod Production Obtained by Eddy Current Sensors

Directory of Open Access Journals (Sweden)

Sergio Saludes-Rodil

2015-04-01

An unsupervised approach to classify surface defects in wire rod manufacturing is developed in this paper. The defects are extracted from an eddy current signal and classified using a clustering technique that uses the dynamic time warping distance as the dissimilarity measure. The new approach has been successfully tested using industrial data. It is shown that it outperforms other classification alternatives, such as the modified Fourier descriptors.
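Dynamic time warping, the dissimilarity measure named above, aligns two signals that may be shifted or stretched in time. The textbook recurrence is sketched below in Python; the absolute-difference cost, the lack of a warping-window constraint and the example sequences are assumptions, not details taken from the paper.

```python
import math


def dtw_distance(s, t):
    """Classic dynamic time warping distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(s), len(t)
    # D[i][j] = cheapest alignment cost of s[:i] with t[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # Extend the best of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # 0.0
```

Because the second sequence is the first with an extra leading zero, DTW reports zero distance, whereas a point-by-point comparison would not even be defined for sequences of different lengths.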
Unsupervised Classification Using Immune Algorithm

OpenAIRE

Al-Muallim, M. T.; El-Kouatly, R.

2012-01-01

An unsupervised classification algorithm based on the clonal selection principle, named Unsupervised Clonal Selection Classification (UCSC), is proposed in this paper. The proposed algorithm is data-driven and self-adaptive: it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well-known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed U...
An Unsupervised Online Spike-Sorting Framework.

Science.gov (United States)

Knieling, Simeon; Sridharan, Kousik S; Belardinelli, Paolo; Naros, Georgios; Weiss, Daniel; Mormann, Florian; Gharabaghi, Alireza

2016-08-01

Extracellular neuronal microelectrode recordings can include action potentials from multiple neurons. To separate spikes from different neurons, they can be sorted according to their shape, a procedure referred to as spike-sorting. Several algorithms have been reported to solve this task. However, when clustering outcomes are unsatisfactory, most of them are difficult to adjust to achieve the desired results. We present an online spike-sorting framework that uses feature normalization and weighting to maximize the distinctiveness between different spike shapes. Furthermore, multiple criteria are applied to either facilitate or prevent cluster fusion, thereby enabling experimenters to fine-tune the sorting process. We compare our method to established unsupervised offline (Wave_Clus (WC)) and online (OSort (OS)) algorithms by examining their performance in sorting various test datasets using two different scoring systems (AMI and the Adamos metric). Furthermore, we evaluate sorting capabilities on intra-operative recordings using established quality metrics. Compared to WC and OS, our algorithm achieved comparable or higher scores on average and produced more convincing sorting results for intra-operative datasets. Thus, the presented framework is suitable for both online and offline analysis and could substantially improve the quality of microelectrode-based data evaluation for research and clinical application.
Semi-Supervised Clustering for High-Dimensional and Sparse Features

Science.gov (United States)

Yan, Su

2010-01-01

Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…
Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

Science.gov (United States)

Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

2017-07-01

The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that the culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted, and a total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions, highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.
Modelling unsupervised online-learning of artificial grammars: linking implicit and statistical learning.

Science.gov (United States)

Rohrmeier, Martin A; Cross, Ian

2014-07-01

Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.
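The evaluation framework alternates a forced binary grammaticality judgment with learning of the same stimulus. The sketch below illustrates that judge-then-learn loop with a simple bigram familiarity model; it is not the authors' n-gram, chunking or recurrent-network model, and the class name, boundary markers and decision threshold are all assumptions.

```python
from collections import Counter


class OnlineBigramLearner:
    """Hypothetical online bigram learner for artificial-grammar strings."""

    def __init__(self, threshold=0.5):
        self.counts = Counter()     # how often each bigram has been seen
        self.threshold = threshold  # assumed decision threshold on familiarity

    @staticmethod
    def bigrams(s):
        padded = "^" + s + "$"      # boundary markers so first/last letters count
        return [padded[i:i + 2] for i in range(len(padded) - 1)]

    def familiarity(self, s):
        """Fraction of the string's bigrams that have been seen before."""
        grams = self.bigrams(s)
        return sum(1 for g in grams if self.counts[g] > 0) / len(grams)

    def judge_then_learn(self, s):
        """Forced binary judgment first, then unsupervised learning of the same item."""
        judged_grammatical = self.familiarity(s) >= self.threshold
        self.counts.update(self.bigrams(s))  # learn regardless of correctness
        return judged_grammatical


learner = OnlineBigramLearner(threshold=0.8)
for item in ["MTV", "MTTV", "VXM", "MTV"]:
    print(item, learner.judge_then_learn(item))
```

Because learning happens after every judgment, even incorrect items add chunks to memory, which is the kind of online effect the study set out to model.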

Consensus clustering approach to group brain connectivity matrices

Directory of Open Access Journals (Sweden)

Javier Rasero

2017-10-01

A novel approach rooted in the notion of consensus clustering, a strategy developed for community detection in complex networks, is proposed to cope with the heterogeneity that characterizes connectivity matrices in health and disease. The method can be summarized as follows: (a) define, for each node, a distance matrix for the set of subjects by comparing the connectivity pattern of that node in all pairs of subjects; (b) cluster the distance matrix for each node; (c) build the consensus network from the corresponding partitions; and (d) extract groups of subjects by finding the communities of the consensus network thus obtained. Unlike previous implementations of consensus clustering, we thus propose to use the consensus strategy to combine the information arising from the connectivity patterns of each node. The proposed approach may be seen either as an exploratory technique or as an unsupervised pretraining step to help the subsequent construction of a supervised classifier. Applications on a toy model and two real datasets show the effectiveness of the proposed methodology, which represents heterogeneity of a set of subjects in terms of a weighted network, the consensus matrix.
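Steps (c) and (d) can be sketched as follows: each node contributes one partition of the subjects, the consensus matrix records how often two subjects are placed together, and groups are read off that weighted network. As a simplifying assumption, the sketch replaces proper community detection with thresholding the consensus matrix and taking connected components; the threshold `tau` and the function names are hypothetical.

```python
def consensus_matrix(partitions):
    """partitions: one label list per node, all over the same subjects.
    Entry [i][j] = fraction of partitions that put subjects i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [
        [sum(1 for labels in partitions if labels[i] == labels[j]) / m for j in range(n)]
        for i in range(n)
    ]


def groups_from_consensus(cons, tau=0.5):
    """Stand-in for community detection: keep edges with weight > tau and
    return the connected components as sorted lists of subject indices."""
    n = len(cons)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and cons[i][j] > tau:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(comp))
    return groups


# Hypothetical per-node partitions of five subjects (label names are arbitrary).
partitions = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
cons = consensus_matrix(partitions)
print(groups_from_consensus(cons))
```

Only co-membership is counted, so the relabelled third partition agrees with the first; in practice a modularity-based community algorithm would replace the threshold step.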
A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

Science.gov (United States)

Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

2018-01-01

This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely LASSO, is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performance even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using data from December 1988 to June 1996 as the training set and data from August 1996 to December 2010 as the test set. To validate the method, we computed several skill scores typically used in flare prediction and compared the values provided by the hybrid approach with those provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to that of clustering methods; in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.
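The second step partitions the scalar regression outcome with Fuzzy C-Means. The sketch below is the standard FCM alternation (memberships from distances, then centres as membership-weighted means) on 1-D values, assuming two clusters, fuzzifier m = 2 and evenly spread initial centres; the LASSO step is not reproduced, and the example outputs are invented for illustration.

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=100, tol=1e-9):
    """Fuzzy C-means on scalar values; m > 1 is the fuzzifier (m = 2 assumed)."""
    lo, hi = min(xs), max(xs)
    # Assumed initialisation: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    for _ in range(iters):
        # Membership step: u[k][i] = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        u = []
        for x in xs:
            d = [abs(x - v) for v in centres]
            hits = d.count(0.0)
            if hits:  # value sits exactly on a centre: crisp membership
                u.append([1.0 / hits if di == 0.0 else 0.0 for di in d])
            else:
                u.append([1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d) for di in d])
        # Centre step: weighted mean with weights u ** m (minimises the FCM cost).
        new_centres = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            new_centres.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centres, centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u


# Hypothetical regression outputs (e.g. from a LASSO model), one per sample.
outputs = [0.05, 0.1, 0.12, 0.08, 0.9, 0.85, 0.95]
centres, u = fuzzy_c_means(outputs)
flare = centres.index(max(centres))  # cluster with the larger centre = "flare"
predicted = [row.index(max(row)) == flare for row in u]
print(predicted)
```

No decision threshold is tuned against a skill score here: the class boundary falls out of the cost minimisation, which is the property the abstract emphasises.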
Unsupervised Learning (Clustering) of Odontocete Echolocation Clicks

Science.gov (United States)

2015-09-30

develop methods for clustering of marine mammal echolocation clicks to learn about species assemblages where little or no prior knowledge exists about... Mexico or the Atlantic. Approach: Acoustic encounters with odontocetes are detected automatically and noise-corrected cepstral features... Estimation of Marine Mammals Using Passive Acoustic Monitoring (DCLDE). KL divergence maps were created for all known species, but the sperm whale
Methodology for Clustering High-Resolution Spatiotemporal Solar Resource Data

Energy Technology Data Exchange (ETDEWEB)

Getman, Dan [National Renewable Energy Lab. (NREL), Golden, CO (United States); Lopez, Anthony [National Renewable Energy Lab. (NREL), Golden, CO (United States); Mai, Trieu [National Renewable Energy Lab. (NREL), Golden, CO (United States); Dyson, Mark [National Renewable Energy Lab. (NREL), Golden, CO (United States)

2015-09-01

In this report, we introduce a methodology to achieve multiple levels of spatial resolution reduction of solar resource data, with minimal impact on data variability, for use in energy systems modeling. The selection of an appropriate clustering algorithm, parameter selection including cluster size, methods of temporal data segmentation, and methods of cluster evaluation are explored in the context of a repeatable process. In describing this process, we illustrate the steps in creating a reduced resolution, but still viable, dataset to support energy systems modeling, e.g. capacity expansion or production cost modeling. This process is demonstrated through the use of a solar resource dataset; however, the methods are applicable to other resource data represented through spatiotemporal grids, including wind data. In addition to energy modeling, the techniques demonstrated in this paper can be used in a novel top-down approach to assess renewable resources within many other contexts that leverage variability in resource data but require reduction in spatial resolution to accommodate modeling or computing constraints.
Unsupervised Image Segmentation

Czech Academy of Sciences Publication Activity Database

Haindl, Michal; Mikeš, Stanislav

2014-01-01

Vol. 36, No. 4 (2014), p. 23. R&D Projects: GA ČR(CZ) GA14-10911S. Institutional support: RVO:67985556. Keywords: unsupervised image segmentation. Subject RIV: BD - Theory of Information. http://library.utia.cas.cz/separaty/2014/RO/haindl-0434412.pdf
Semi-supervised clustering methods.

Science.gov (United States)

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. Such methods are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
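As one example of a k-means modification that uses known labels, the sketch below implements a constrained "seeded" variant: labelled observations initialise the centroids and keep their labels, while unlabelled ones are assigned as usual. This is only one of several semi-supervised schemes such a review may cover; the function name and the requirement that every cluster has at least one seed are assumptions of this sketch.

```python
import math


def seeded_kmeans(points, seeds, k, iters=100):
    """Semi-supervised k-means sketch.
    seeds: dict {point_index: known_cluster}; every cluster 0..k-1 needs a seed.
    Seeds initialise the centroids and keep their known labels throughout."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    centroids = [mean([points[i] for i, lab in seeds.items() if lab == c]) for c in range(k)]
    labels = None
    for _ in range(iters):
        # Labelled points are fixed; unlabelled points join the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds else min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


points = [(0.0,), (0.5,), (1.0,), (9.0,), (9.5,), (10.0,)]
print(seeded_kmeans(points, {0: 0, 5: 1}, k=2))
```

Dropping the `seeds[i] if i in seeds` branch after initialisation gives the looser "seeds only initialise" variant.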
A new hybrid imperialist competitive algorithm on data clustering

Indian Academy of Sciences (India)

Modified imperialist competitive algorithm; simulated annealing; ... Clustering is one of the unsupervised learning branches where a set of patterns, usually vectors ... machine classification is based on design, operation, and/or purpose.
On the Multi-Modal Object Tracking and Image Fusion Using Unsupervised Deep Learning Methodologies

Science.gov (United States)

LaHaye, N.; Ott, J.; Garay, M. J.; El-Askary, H. M.; Linstead, E.

2017-12-01

The number of different modalities of remote sensors has been on the rise, resulting in large datasets with different complexity levels. Such complex datasets can provide valuable information separately, yet there is greater value in having a comprehensive view of them combined. As such, hidden information can be deduced by applying data mining techniques to the fused data. The curse of dimensionality of such fused data, due to the potentially vast dimension space, hinders our ability to understand them deeply. This is because each dataset requires a user to have instrument-specific and dataset-specific knowledge for optimum and meaningful usage. Once a user decides to use multiple datasets together, a deeper understanding of how to translate and combine these datasets correctly and effectively is needed. Although data-centric techniques exist, generic automated methodologies that can potentially solve this problem completely do not. Here we are developing a system that aims to gain a detailed understanding of different data modalities. Such a system will provide an analysis environment that gives the user useful feedback and can aid in research tasks. In our current work, we show the initial outputs of our system implementation, which leverages unsupervised deep learning techniques so as not to burden the user with the task of labeling input data, while still allowing for a detailed machine understanding of the data. Our goal is to be able to track objects, such as cloud systems or aerosols, across different image-like data modalities. The proposed system is flexible, scalable and robust enough to understand complex likenesses within multi-modal data in a similar spatio-temporal range, and also to co-register and fuse these images when needed.
Unsupervised classification of multivariate geostatistical data: Two algorithms

Science.gov (United States)

Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

2015-12-01

With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on, e.g., Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
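The first algorithm's key idea, agglomerative clustering in which two clusters may merge only if the neighbourhood graph connects them, can be sketched as follows. The graph is taken as given, and the choice of merging the admissible pair with the closest mean variable vectors is an assumed linkage, not necessarily the paper's; the function name and the transect data are hypothetical.

```python
import math


def spatial_agglomerative(values, edges, n_clusters):
    """values: per-sample variable vectors; edges: (i, j) pairs of a neighbourhood
    graph built in coordinate space. Two clusters may merge only if some graph
    edge connects them (the proximity condition); among admissible pairs, the
    one with the closest mean variable vectors is merged first (assumed linkage)."""
    clusters = {i: [i] for i in range(len(values))}  # cluster id -> sample indices
    owner = list(range(len(values)))                 # sample index -> cluster id

    def centre(ids):
        return [sum(values[i][d] for i in ids) / len(ids) for d in range(len(values[0]))]

    while len(clusters) > n_clusters:
        candidates = {tuple(sorted((owner[i], owner[j]))) for i, j in edges if owner[i] != owner[j]}
        if not candidates:
            break  # graph is disconnected: no admissible merge left
        a, b = min(candidates, key=lambda ab: math.dist(centre(clusters[ab[0]]), centre(clusters[ab[1]])))
        for i in clusters[b]:
            owner[i] = a
        clusters[a].extend(clusters.pop(b))
    return owner


# Hypothetical samples along a transect: one variable each, neighbours chained.
values = [[1.0], [1.1], [5.0], [5.1], [1.0], [1.05]]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
# Samples 0-1 and 4-5 have similar values but stay apart: no edge joins them directly.
print(spatial_agglomerative(values, edges, n_clusters=3))
```

The proximity condition is what keeps the resulting classes spatially coherent; without it, samples 0-1 and 4-5 would be merged on value similarity alone.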
Unsupervised action classification using space-time link analysis

DEFF Research Database (Denmark)

Liu, Haowei; Feris, Rogerio; Krüger, Volker

2010-01-01

In this paper we address the problem of unsupervised discovery of action classes in video data. Unlike all methods proposed thus far for this task, we present a space-time link analysis approach which matches the performance of traditional unsupervised action categorization metho...
Dimensionality reduction with unsupervised nearest neighbors

CERN Document Server

Kramer, Oliver

2013-01-01

This book is devoted to a novel approach for dimensionality reduction based on the famous nearest neighbor method, a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as an efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, ranging from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies, using artificial test data sets and real-world data, demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustr...
An improved clustering algorithm based on reverse learning in intelligent transportation

Science.gov (United States)

Qiu, Guoqing; Kou, Qianqian; Niu, Ting

2017-05-01

With the development of artificial intelligence and data mining technology, big data has gradually entered the public eye. Clustering is an important method for processing large data sets. This paper introduces a reverse learning method into the clustering process of the PAM (Partitioning Around Medoids) algorithm to overcome the limitations of one-time clustering in unsupervised clustering learning and to increase the diversity of the clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.
Unsupervised EEG analysis for automated epileptic seizure detection

Science.gov (United States)

Birjandtalab, Javad; Pouyan, Maziyar Baran; Nourani, Mehrdad

2016-07-01

Epilepsy is a neurological disorder which can, if not controlled, potentially cause unexpected death. Accurate automatic pattern recognition and data mining techniques are crucial for detecting the onset of seizures and informing caregivers so that they can help patients. EEG signals are the preferred biosignals for diagnosing epileptic patients. Most existing pattern recognition techniques used in EEG analysis rely on supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical, particularly when labeled data are scarce or when disease progression is rapid and the corresponding EEG footprint pattern is not robust. Furthermore, EEG pattern changes are highly individual-dependent and require experienced specialists to annotate seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate seizure and non-seizure events. We use the power spectral density of EEG signals in different frequency bands as informative features to accurately cluster seizure and non-seizure events. Experimental results so far indicate more than 90% accuracy in clustering seizure and non-seizure events without any prior knowledge of the patient's history.
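Band-wise power spectral density features of the kind described can be computed from a simple periodogram. The sketch below uses a direct DFT so it has no dependencies, and it assumes conventional EEG band edges (delta, theta, alpha, beta); the paper's actual bands, windowing and PSD estimator are not stated. Each window's band-power vector would then be one row handed to the clustering step.

```python
import cmath
import math


def band_powers(signal, fs, bands):
    """Periodogram power summed within each frequency band.
    signal: samples of one EEG channel; fs: sampling rate in Hz;
    bands: dict name -> (low_hz, high_hz), half-open [low, high)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    powers = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd = abs(coef) ** 2 / (fs * n)  # unscaled one-sided periodogram value
        freq = k * fs / n
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                powers[name] += psd
    return powers


# Assumed conventional EEG bands; a 1 s window of a 10 Hz test tone at 128 Hz.
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
fs = 128
tone = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(tone, fs, bands)
print(max(features, key=features.get))  # the alpha band dominates for a 10 Hz tone
```

For real recordings an FFT-based estimator (e.g. Welch averaging) would be faster and less noisy; only relative band powers matter for clustering, so the missing one-sided doubling is harmless here.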
Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts

NARCIS (Netherlands)

Castaldi, Peter J; Benet, Marta; Petersen, Hans; Rafaels, Nicholas; Finigan, James; Paoletti, Matteo; Marike Boezen, H; Vonk, Judith M; Bowler, Russell; Pistolesi, Massimo; Puhan, Milo A; Anto, Josep; Wauters, Els; Lambrechts, Diether; Janssens, Wim; Bigazzi, Francesca; Camiciottoli, Gianna; Cho, Michael H; Hersh, Craig P; Barnes, Kathleen; Rennard, Stephen; Boorgula, Meher Preethi; Dy, Jennifer; Hansel, Nadia N; Crapo, James D; Tesfaigzi, Yohannes; Agusti, Alvar; Silverman, Edwin K; Garcia-Aymerich, Judith

Background: COPD is a heterogeneous disease, but there is little consensus on specific definitions for COPD subtypes. Unsupervised clustering offers the promise of 'unbiased' data-driven assessment of COPD heterogeneity. Multiple groups have identified COPD subtypes using cluster analysis, but there
Semi-Supervised Generation with Cluster-aware Generative Models

DEFF Research Database (Denmark)

Maaløe, Lars; Fraccaro, Marco; Winther, Ole

2017-01-01

Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Clust... a log-likelihood of −79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods.
A new web-based system for unsupervised classification of satellite images from the Google Maps engine

Science.gov (United States)

Ferrán, Ángel; Bernabé, Sergio; García-Rodríguez, Pablo; Plaza, Antonio

2012-10-01

In this paper, we develop a new web-based system for unsupervised classification of satellite images available from the Google Maps engine. The system has been developed using the Google Maps API and incorporates functionalities such as unsupervised classification of image portions selected by the user (at the desired zoom level). For this purpose, we use a processing chain made up of the well-known ISODATA and k-means algorithms, followed by spatial post-processing based on majority voting. The system is currently hosted on a high-performance server that executes the classification algorithms and returns the classification results very efficiently. These functionalities are needed to apply efficient image classification techniques and to incorporate content-based image retrieval (CBIR). The classification results obtained with the proposed system are validated experimentally in several ways, by comparing the classification accuracy of the proposed chain with that of techniques available in the well-known Environment for Visualizing Images (ENVI) software package. The server has access to a cluster of commodity graphics processing units (GPUs); hence, in future work we plan to perform the processing in parallel by taking advantage of that cluster.
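The spatial post-processing step can be illustrated with a majority filter applied to the label image produced by ISODATA or k-means: each pixel takes the most frequent class in its neighbourhood, which removes isolated misclassified pixels. The 3x3 window (radius 1), border clipping and tie rule below are assumptions of this sketch, not details from the paper.

```python
from collections import Counter


def majority_filter(labels, radius=1):
    """Replace each pixel's class with the most frequent class in its
    (2*radius+1)^2 window, clipped at the image borders. On a tie the
    pixel keeps its own class if it is among the winners."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = Counter(
                labels[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            top = max(window.values())
            winners = [lab for lab, cnt in window.items() if cnt == top]
            out[r][c] = labels[r][c] if labels[r][c] in winners else winners[0]
    return out


# An isolated "0" pixel inside a class-1 region is absorbed by its neighbours.
noisy = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(majority_filter(noisy))
```

The filter reads from the original label image and writes to a copy, so results do not depend on the scan order.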
Towards a methodology for cluster searching to provide conceptual and contextual "richness" for systematic reviews of complex interventions: case study (CLUSTER).

Science.gov (United States)

Booth, Andrew; Harris, Janet; Croot, Elizabeth; Springett, Jane; Campbell, Fiona; Wilkins, Emma

2013-09-28

Systematic review methodologies can be harnessed to help researchers to understand and explain how complex interventions may work. Typically, when reviewing complex interventions, a review team will seek to understand the theories that underpin an intervention and the specific context for that intervention. A single published report from a research project does not typically contain this required level of detail. A review team may find it more useful to examine a "study cluster": a group of related papers that explore and explain various features of a single project and thus supply necessary detail relating to theory and/or context. We sought to conduct a preliminary investigation, from a single case study review, of techniques required to identify a cluster of related research reports, to document the yield from such methods, and to outline a systematic methodology for cluster searching. In a systematic review of community engagement we identified a relevant project: the Gay Men's Task Force. From a single "key pearl citation" we conducted a series of related searches to find contextually or theoretically proximate documents. We followed up Citations, traced Lead authors, identified Unpublished materials, searched Google Scholar, tracked Theories, undertook ancestry searching for Early examples and followed up Related projects (embodied in the CLUSTER mnemonic). Our structured, formalised procedure for cluster searching identified useful reports that are not typically identified from topic-based searches on bibliographic databases. Items previously rejected by an initial sift were subsequently found to inform our understanding of underpinning theory (for example Diffusion of Innovations Theory), context or both. Relevant material included book chapters, a Web-based process evaluation, and peer-reviewed reports of projects sharing a common ancestry. We used these reports to understand the context for the intervention and to explore explanations for its relative
Unsupervised Performance Evaluation of Image Segmentation

Directory of Open Access Journals (Sweden)

Chabrier Sebastien

2006-01-01

We present in this paper a study of unsupervised evaluation criteria that enable the quantification of the quality of an image segmentation result. These evaluation criteria compute statistics for each region or class in a segmentation result. Such an evaluation criterion can be useful for different applications: the comparison of segmentation results, the automatic choice of the best-fitted parameters of a segmentation method for a given image, or the definition of new segmentation methods by optimization. We first present the state of the art of unsupervised evaluation, and then we compare six unsupervised evaluation criteria. For this comparative study, we use a database composed of 8400 synthetic gray-level images segmented in four different ways. Vinet's measure (correct classification rate) is used as an objective criterion to compare the behavior of the different criteria. Finally, we present the experimental results on the segmentation evaluation of a few gray-level natural images.
Characterizing Interference in Radio Astronomy Observations through Active and Unsupervised Learning

Science.gov (United States)

Doran, G.

2013-01-01

In the process of observing signals from astronomical sources, radio astronomers must mitigate the effects of manmade radio sources such as cell phones, satellites, aircraft, and observatory equipment. Radio frequency interference (RFI) often occurs as short bursts (active learning approach in which an astronomer labels events that are most confusing to a classifier, minimizing the human effort required for classification. We also explore the use of unsupervised clustering techniques, which automatically group events into classes without user input. We apply these techniques to data from the Parkes Multibeam Pulsar Survey to characterize several million detected RFI events from over a thousand hours of observation.
Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

Science.gov (United States)

Salman, S. S.; Abbas, W. A.

2018-05-01

The goal of the study is to support the analysis of resolution enhancement and to study its effect on classification methods applied to the spectral information of the bands, using specific and quantitative approaches. This study introduces a method to enhance the resolution of Landsat 8 imagery by combining the 30 m resolution spectral bands with the 15 m resolution panchromatic band 8, because multispectral imagery is important for extracting land cover. Classification methods are used in this study to classify several land covers recorded in OLI-8 imagery. Data mining methods can be classified as either supervised or unsupervised. In supervised methods there is a particular predefined target, which means that the algorithm learns which values of the target are associated with which values of the predictor sample; k-nearest neighbors and maximum likelihood algorithms are examined in this work as supervised methods. In unsupervised methods, on the other hand, no sample is identified as a target, and the data-extraction algorithm searches for structure and patterns among all the variables; here these are represented by the fuzzy C-means clustering method. The NDVI vegetation index is used to compare the results of the classification methods; the maximum likelihood method gave the best results for the percentage of dense vegetation.
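The NDVI comparison rests on the standard index NDVI = (NIR - Red) / (NIR + Red), which for Landsat 8 OLI uses band 5 (NIR) and band 4 (Red). The sketch computes it per pixel and reports the share of "dense vegetation" pixels; the 0.6 cut-off and the reflectance values are hypothetical, since the abstract does not state the threshold used.

```python
def ndvi(nir, red):
    """Per-pixel Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    For Landsat 8 OLI, NIR is band 5 and Red is band 4."""
    return [(n - r) / (n + r) if (n + r) != 0 else 0.0 for n, r in zip(nir, red)]


def dense_vegetation_percent(nir, red, threshold=0.6):
    """Share of pixels whose NDVI exceeds an assumed 'dense vegetation' threshold."""
    values = ndvi(nir, red)
    return 100.0 * sum(v > threshold for v in values) / len(values)


# Hypothetical surface reflectances for four pixels.
nir = [0.5, 0.4, 0.3, 0.2]
red = [0.1, 0.2, 0.25, 0.2]
print(ndvi(nir, red))
print(dense_vegetation_percent(nir, red))
```

Computing the same percentage from each classifier's "dense vegetation" class and comparing it with the NDVI-based figure gives one way to rank the classification methods.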

SU-D-204-01: A Methodology Based On Machine Learning and Quantum Clustering to Predict Lung SBRT Dosimetric Endpoints From Patient Specific Anatomic Features

Energy Technology Data Exchange (ETDEWEB)

Lafata, K; Ren, L; Wu, Q; Kelsey, C; Hong, J; Cai, J; Yin, F [Duke University Medical Center, Durham, NC (United States)

2016-06-15

Purpose: To develop a data-mining methodology based on quantum clustering and machine learning to predict expected dosimetric endpoints for lung SBRT applications based on patient-specific anatomic features. Methods: Ninety-three patients who received lung SBRT at our clinic from 2011–2013 were retrospectively identified. Planning information was acquired for each patient, from which various features were extracted using in-house semi-automatic software. Anatomic features included tumor-to-OAR distances, tumor location, total lung volume, GTV and ITV. Dosimetric endpoints were adopted from RTOG-0195 recommendations and consisted of various OAR-specific partial-volume doses and maximum point doses. First, PCA and unsupervised quantum clustering were used to explore the feature space to identify potentially strong classifiers. Second, a multi-class logistic regression algorithm was developed and trained to predict dose-volume endpoints based on patient-specific anatomic features. Classes were defined by discretizing the dose-volume data, and the feature space was zero-mean normalized. Fitting parameters were determined by minimizing a regularized cost function, and optimization was performed via gradient descent. As a pilot study, the model was tested on two esophageal dosimetric planning endpoints (maximum point dose, dose to 5cc), and its generalizability was evaluated with leave-one-out cross-validation. Results: Quantum clustering demonstrated a strong separation of the feature space at 15Gy across the first and second principal components of the data when the dosimetric endpoints were retrospectively identified. Maximum point dose prediction to the esophagus demonstrated a cross-validation accuracy of 87%, and the maximum dose to 5cc demonstrated a respective value of 79%. The largest optimized weighting factor was placed on GTV-to-esophagus distance (a factor of 10 greater than the second largest weighting factor), indicating an intuitively strong
Detecting Transitions in Manual Tasks from Wearables: An Unsupervised Labeling Approach

Directory of Open Access Journals (Sweden)

Sebastian Böttcher

2018-03-01

Authoring protocols for manual tasks such as following recipes, manufacturing processes or laboratory experiments requires significant effort. This paper presents a system that estimates individual procedure transitions from the user’s physical movement and gestures recorded with inertial motion sensors. Combined with egocentric or external video recordings, this facilitates efficient review and annotation of video databases. We investigate different clustering algorithms on wearable inertial sensor data recorded alongside video data to automatically create transition marks between task steps. The goal is to match these marks to the transitions given in a description of the workflow, thus creating navigation cues to browse video repositories of manual work. To evaluate the performance of unsupervised algorithms, the automatically generated marks are compared to human expert-created labels on two publicly available datasets. Additionally, we tested the approach on a novel dataset in a manufacturing lab environment, describing an existing sequential manufacturing process. The results from selected clustering methods are also compared to some supervised methods.
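The abstract does not say how cluster assignments become transition marks. One simple possibility, sketched below as an assumption rather than the paper's method, is to cluster fixed-length sensor windows (with any of the algorithms compared) and then emit a mark wherever the window label changes to a new label that persists for a minimum number of windows. The function name `transition_marks` and the `min_run` debouncing rule are hypothetical.

```python
from typing import Hashable, List, Sequence

def transition_marks(labels: Sequence[Hashable], min_run: int = 3) -> List[int]:
    """Return window indices where the cluster label switches to a new label
    that then persists for at least `min_run` consecutive windows.
    Shorter excursions are treated as jitter and ignored (hypothetical rule)."""
    if len(labels) == 0:
        return []
    marks: List[int] = []
    current = labels[0]
    i = 1
    while i < len(labels):
        if labels[i] == current:
            i += 1
            continue
        # Measure how long the new label persists.
        end = i
        while end < len(labels) and labels[end] == labels[i]:
            end += 1
        if end - i >= min_run:
            marks.append(i)          # a stable change: record a transition mark
            current = labels[i]
        i = end                      # skip past the run (kept or ignored)
    return marks
```

Multiplying a returned index by the window hop length would give a timestamp that can be aligned with the video; the `min_run` threshold trades missed short steps against spurious marks caused by gesture jitter.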
Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

Science.gov (United States)

Yasid, A.

2018-01-01

Cluster analysis is important in data mining for unlabeled data, because no adequate prior knowledge is available. One important task is determining the number of clusters without user involvement, which is known as automatic clustering. This study aims to determine the number of clusters automatically using forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm, and the results are compared with other state-of-the-art automatic clustering methods. The experimental results show that AC-FSDE outperforms or is competitive with other existing automatic clustering algorithms.
Automated three-dimensional morphology-based clustering of human erythrocytes with regular shapes: stomatocytes, discocytes, and echinocytes

Science.gov (United States)

Ahmadzadeh, Ezat; Jaferzadeh, Keyvan; Lee, Jieun; Moon, Inkyu

2017-07-01

We present unsupervised clustering methods for automatic grouping of human red blood cells (RBCs) extracted from RBC quantitative phase images obtained by digital holographic microscopy into three RBC clusters with regular shapes, including biconcave, stomatocyte, and sphero-echinocyte. We select features related to the RBC profile and morphology, such as RBC average thickness, sphericity coefficient, and mean corpuscular volume, and apply clustering methods, including density-based spatial clustering of applications with noise (DBSCAN), k-medoids, and k-means, to this set of morphological features. The clustering results of RBCs using a set of three-dimensional features are compared against those using a set of two-dimensional features. Our experimental results indicate that, by utilizing the introduced set of features, two groups, biconcave RBCs and old RBCs (suffering from the sphero-echinocyte process), can be perfectly clustered. In addition, by increasing the number of clusters, the three RBC types can be effectively clustered in an automated unsupervised manner with high accuracy. The performance evaluation of the clustering techniques reveals that they can assist hematologists in further diagnosis.
Extending the input–output energy balance methodology in agriculture through cluster analysis

International Nuclear Information System (INIS)

Bojacá, Carlos Ricardo; Casilimas, Héctor Albeiro; Gil, Rodrigo; Schrevens, Eddie

2012-01-01

The input–output balance methodology has been applied to characterize the energy balance of agricultural systems. This study proposes to extend this methodology with the inclusion of multivariate analysis to reveal particular patterns in the energy use of a system. The objective was to demonstrate the usefulness of multivariate exploratory techniques to analyze the variability found in a farming system and to establish efficiency categories that can be used to improve the energy balance of the system. To this end, an input–output analysis was applied to the major greenhouse tomato production area in Colombia. Individual energy profiles were built and the k-means clustering method was applied to the production factors. On average, the production system in the study zone consumes 141.8 GJ ha⁻¹ to produce 96.4 GJ ha⁻¹, resulting in an energy efficiency of 0.68. With the k-means clustering analysis, three clusters of farmers were identified with energy efficiencies of 0.54, 0.67 and 0.78. The most energy-efficient cluster grouped 56.3% of the farmers. It is possible to optimize the production system by improving the management practices of those with the lowest energy use efficiencies. Multivariate analysis techniques proved to be a complementary pathway to improve the energy efficiency of a system. -- Highlights: ► An input–output energy balance was estimated for greenhouse tomatoes in Colombia. ► We used the k-means clustering method to classify growers based on their energy use. ► Three clusters of growers were found with energy efficiencies of 0.54, 0.67 and 0.78. ► Overall system optimization is possible by improving the energy use of the less efficient growers.
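The headline energy efficiency is the ratio of energy output to energy input. A worked check with the average figures quoted above (the symbol η_E is notation introduced here, not the authors'):

```latex
\eta_E = \frac{E_{\text{out}}}{E_{\text{in}}}
       = \frac{96.4\ \text{GJ ha}^{-1}}{141.8\ \text{GJ ha}^{-1}}
       \approx 0.68,
\qquad
E_{\text{out}} - E_{\text{in}} = 96.4 - 141.8 = -45.4\ \text{GJ ha}^{-1}
```

An efficiency below 1 means the system consumes more energy than it delivers in the harvested product, which is why raising the lowest-efficiency cluster matters.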
Unsupervised Condition Change Detection In Large Diesel Engines

DEFF Research Database (Denmark)

Pontoppidan, Niels Henrik; Larsen, Jan

2003-01-01

This paper presents a new method for unsupervised change detection which combines independent component modeling and probabilistic outlier detection. The method further provides a compact data representation, which is amenable to interpretation, i.e., the detected condition changes can be investigated further. The method is successfully applied to unsupervised condition change detection in large diesel engines from acoustical emission sensor signals and compared to more classical techniques based on principal component analysis and Gaussian mixture models.
A Comparison of Methods for Player Clustering via Behavioral Telemetry

DEFF Research Database (Denmark)

Drachen, Anders; Thurau, C.; Sifa, R.

2013-01-01

The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets … patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g., k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five …
Semi-supervised clustering methods

Science.gov (United States)

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
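As a concrete instance of a k-means modification that uses known labels, the sketch below implements seeded k-means, in which the observations whose cluster labels are known fix the initial centroids. This is one well-known variant, shown for illustration; it is not necessarily among the methods the review describes in detail. The sketch assumes every cluster `0..k-1` has at least one labelled seed.

```python
import numpy as np

def seeded_kmeans(X, seed_idx, seed_labels, k, n_iter=100):
    """Seeded k-means: each initial centroid is the mean of the labelled
    observations known to belong to that cluster; ordinary Lloyd iterations
    then run over all observations (seed labels may still change)."""
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    centroids = np.array([X[seed_idx[seed_labels == j]].mean(axis=0)
                          for j in range(k)])
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Because the seeds only initialize the centroids, their labels can still change during the iterations; a constrained variant would instead clamp the seed labels at every assignment step. A side benefit of seeding is that cluster indices inherit the meaning of the known labels.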
Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

Science.gov (United States)

Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

2009-02-01

Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to revolve around developmental disorder groups. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data from 1175 patients at a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.
Framework methodology for increased energy efficiency and renewable feedstock integration in industrial clusters

International Nuclear Information System (INIS)

Hackl, Roman; Harvey, Simon

2013-01-01

Highlights: • Framework methodology for energy efficiency of process plants and total sites. • Identification of suitable biorefinery based on host site future energy systems. • Case study results show large energy savings of site wide heat integration. • Case study on refrigeration systems: 15% shaft work savings potential. • Case study on biorefinery integration: utility savings potential of up to 37%. - Abstract: Energy intensive industries, such as the bulk chemical industry, are facing major challenges and adopting strategies to face these challenges. This paper investigates options for clusters of chemical process plants to decrease their energy and emission footprints. There is a wide range of technologies and process integration opportunities available for achieving these objectives, including (i) decreasing fossil fuel and electricity demand by increasing heat integration within individual processes and across the total cluster site; (ii) replacing fossil feedstocks with renewables and biorefinery integration with the existing cluster; (iii) increasing external utilization of excess process heat wherever possible. This paper presents an overview of the use of process integration methods for development of chemical clusters. Process simulation, pinch analysis, Total Site Analysis (TSA) and exergy concepts are combined in a holistic approach to identify opportunities to improve energy efficiency and integrate renewable feedstocks within such clusters. The methodology is illustrated by application to a chemical cluster in Stenungsund on the West Coast of Sweden consisting of five different companies operating six process plants. The paper emphasizes and quantifies the gains that can be made by adopting a total site approach for targeting energy efficiency measures within the cluster and when investigating integration opportunities for advanced biorefinery concepts compared to restricting the analysis to the individual constituent plants. The
Generation of brain pseudo-CTs using an undersampled, single-acquisition UTE-mDixon pulse sequence and unsupervised clustering

International Nuclear Information System (INIS)

Su, Kuan-Hao; Hu, Lingzhi; Traughber, Melanie; Stehning, Christian; Helle, Michael; Qian, Pengjiang; Thompson, Cheryl L.; Pereira, Gisele C.; Traughber, Bryan J.; Jordan, David W.; Herrmann, Karin A.; Muzic, Raymond F.

2015-01-01

Purpose: MR-based pseudo-CT has an important role in MR-based radiation therapy planning and PET attenuation correction. The purpose of this study is to establish a clinically feasible approach, including image acquisition, correction, and CT formation, for pseudo-CT generation of the brain using a single-acquisition, undersampled ultrashort echo time (UTE)-mDixon pulse sequence. Methods: Nine patients were recruited for this study. For each patient, a 190-s, undersampled, single-acquisition UTE-mDixon sequence of the brain was acquired (TE = 0.1, 1.5, and 2.8 ms). A novel method of retrospective trajectory correction of the free induction decay (FID) signal was performed based on point-spread functions of three external MR markers. Two-point Dixon images were reconstructed using the first and second echo data (TE = 1.5 and 2.8 ms). R2* images (1/T2*) were then estimated and were used to provide bone information. Three image features, i.e., Dixon-fat, Dixon-water, and R2*, were used for unsupervised clustering. Five tissue clusters, i.e., air, brain, fat, fluid, and bone, were estimated using the fuzzy c-means (FCM) algorithm. A two-step, automatic tissue-assignment approach was proposed and designed according to the prior information of the given feature space. Pseudo-CTs were generated by a voxelwise linear combination of the membership functions of the FCM. A low-dose CT was acquired for each patient and was used as the gold standard for comparison. Results: The contrast and sharpness of the FID images were improved after trajectory correction was applied. The mean of the estimated trajectory delay was 0.774 μs (max: 1.350 μs; min: 0.180 μs). The FCM-estimated centroids of different tissue types showed a distinguishable pattern for different tissues, and significant differences were found between the centroid locations of different tissue types. Pseudo-CT can provide additional skull detail and has low bias and absolute error of estimated CT
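The clustering and pseudo-CT steps can be sketched as standard fuzzy c-means followed by a voxelwise weighted sum of per-cluster CT numbers. This is an illustration under assumptions: the paper's two-step automatic tissue assignment is not reproduced, so the caller must supply one CT number per cluster in the cluster order FCM returns, and the fuzzifier `m = 2` and the tolerances are common defaults rather than values reported in the abstract.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X: (n_voxels, n_features), e.g. Dixon-fat,
    Dixon-water and R2* per voxel. Returns memberships U (n_voxels, c) whose
    rows sum to 1, and the centroids from the final iteration."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Wm = U ** m
        centers = (Wm.T @ X) / Wm.sum(axis=0)[:, None]    # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                             # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)      # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def pseudo_ct(U, cluster_ct_numbers):
    """Voxelwise linear combination of memberships: CT(v) = sum_k U[v, k] * CT_k.
    The per-cluster CT numbers are supplied by the caller (hypothetical values)."""
    return U @ np.asarray(cluster_ct_numbers, dtype=float)
```

Because memberships are soft, a voxel on a bone/soft-tissue boundary receives an intermediate CT number instead of being forced into one class, which is the motivation for combining memberships rather than hard labels.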
Automatic microseismic event picking via unsupervised machine learning

Science.gov (United States)

Chen, Yangkang

2018-01-01

Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used arrival-picking algorithms based on the short-term-average/long-term-average ratio (STA/LTA) are sensitive to moderate-to-strong random ambient noise. To make state-of-the-art arrival-picking approaches effective, microseismic data first need to be pre-processed, for example by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on large volumes of well-designed training data, I use an unsupervised machine learning algorithm to cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity shows that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
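A minimal sketch of splitting time samples into waveform and non-waveform points with fuzzy clustering is shown below. The feature (a moving-average amplitude envelope), the two-cluster fuzzy c-means with fuzzifier 2, the window length, and the 0.5 membership threshold are all assumptions for illustration; the abstract does not specify the features or settings used.

```python
import numpy as np

def fcm_1d(x, m=2.0, n_iter=200, tol=1e-8):
    """Two-cluster fuzzy c-means on a 1-D feature; centers start at min/max."""
    centers = np.array([x.min(), x.max()], dtype=float)
    U = None
    for _ in range(n_iter):
        d = np.fmax(np.abs(x[:, None] - centers[None, :]), 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)        # fuzzy memberships
        W = U ** m
        new = (W * x[:, None]).sum(axis=0) / W.sum(axis=0)
        converged = np.abs(new - centers).max() < tol
        centers = new
        if converged:
            break
    return U, centers

def pick_arrival(trace, win=5, threshold=0.5):
    """Smooth |trace| with a moving average, cluster the samples into
    noise / waveform, and return the first sample whose waveform
    membership exceeds `threshold` (None if no sample does)."""
    env = np.convolve(np.abs(trace), np.ones(win) / win, mode="same")
    U, centers = fcm_1d(env)
    wave = int(np.argmax(centers))          # cluster with the larger envelope
    idx = np.flatnonzero(U[:, wave] > threshold)
    return int(idx[0]) if idx.size else None
```

Unlike an STA/LTA trigger, this labelling needs no ratio threshold tuned to the noise level; the split adapts to the two amplitude populations present in the trace, although a real picker would still need features that separate weak arrivals from noise bursts.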
Correlates of Unsupervised Bathing of Infants: A Cross-Sectional Study

Directory of Open Access Journals (Sweden)

Tinneke M. J. Beirens

2013-03-01

Drowning represents the third leading cause of fatal unintentional injury in infants (0–1 years). The aim of this study is to investigate correlates of unsupervised bathing. This cross-sectional study included 1,410 parents with an infant. Parents completed a questionnaire regarding supervision during bathing, socio-demographic factors, and Protection Motivation Theory constructs. To determine correlates of parents who leave their infant unsupervised, logistic regression analyses were performed. Of the parents, 6.2% left their child unsupervised in the bathtub. Parents with older children (OR 1.24; 95%CI 1.00–1.54) were more likely to leave their child unsupervised in the bathtub. First-time parents (OR 0.59; 95%CI 0.36–0.97) and non-Western migrant fathers (OR 0.18; 95%CI 0.05–0.63) were less likely to leave their child unsupervised in the bathtub. Furthermore, parents who perceived higher self-efficacy (OR 0.57; 95%CI 0.47–0.69), higher response efficacy (OR 0.34; 95%CI 0.24–0.48), and higher severity (OR 0.74; 95%CI 0.58–0.93) were less likely to leave their child unsupervised. Since young children are at great risk of drowning if supervision is absent, effective strategies for drowning prevention should be developed and evaluated. In the meantime, health care professionals should inform parents about the importance of supervision during bathing.
An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

Science.gov (United States)

Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

2016-01-01

Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients, lacking abrupt changes between adjacent classes, and by having a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Dataset (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index, as well as quantization and topographic error metrics. Cross-validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.
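Two of the internal validity indices named above can be computed directly from a labelled feature matrix. The sketch uses common textbook definitions: Davies–Bouldin (lower is better) with the mean Euclidean distance to the centroid as cluster scatter, and Dunn (higher is better) as the smallest between-cluster point distance divided by the largest cluster diameter. Whether the study used exactly these variants is not stated in the abstract.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                        for i, k in enumerate(ks)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(ks)) if j != i)
             for i in range(len(ks))]
    return float(np.mean(worst))

def dunn(X, labels):
    """Minimum inter-cluster point distance / maximum intra-cluster diameter."""
    groups = [X[labels == k] for k in np.unique(labels)]

    def pairwise(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    max_diam = max(pairwise(g, g).max() for g in groups)
    min_sep = min(pairwise(groups[i], groups[j]).min()
                  for i in range(len(groups)) for j in range(i + 1, len(groups)))
    return float(min_sep / max_diam)
```

scikit-learn provides `davies_bouldin_score` and `silhouette_score` in `sklearn.metrics`; the Dunn index is not included there. Its pairwise distances also make it quadratic in the number of samples, which matters for national-scale rasters.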
Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning.

Science.gov (United States)

Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing

2018-01-11

By introducing the methods of machine learning into density functional theory, we took a detour in constructing the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of the universal functional, the vital core of density functional theory, the most probable number of clusters and the corresponding cluster boundaries in a system under study can be determined simultaneously and automatically, and the plausibility of this approach rests on the Hohenberg-Kohn theorems. For method validation and practical applications, interdisciplinary problems from physical to biological systems were considered. The amalgamation of uncharged atomic clusters validated the unsupervised search for the number of clusters, and the corresponding cluster boundaries were likewise exhibited. Highly accurate clustering results on Fisher's iris dataset showed the feasibility and flexibility of the proposed scheme. Brain tumor detection from low-dimensional magnetic resonance imaging datasets and segmentation of high-dimensional neural network imagery in the Brainbow system were also used to examine the method's practicality. The experimental results demonstrate a successful connection between the physical theory and machine learning methods and will benefit clinical diagnoses.
Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.

Science.gov (United States)

Wong, Kenneth; Duncan, Tristram; Pearson, Andrew

2007-07-01

Open appendicectomy is the traditional standard treatment for appendicitis. Laparoscopic appendicectomy is perceived as a procedure with greater potential for complications and longer operative times. This paper examines the hypothesis that unsupervised laparoscopic appendicectomy by surgical trainees is a safe and time-effective valid alternative. Medical records, operating theatre records and histopathology reports of all patients undergoing laparoscopic and open appendicectomy over a 15-month period in two hospitals within an area health service were retrospectively reviewed. Data were analysed to compare patient features, pathology findings, operative times, complications, readmissions and mortality between laparoscopic and open groups and between unsupervised surgical trainee operators versus consultant surgeon operators. A total of 143 laparoscopic and 222 open appendicectomies were reviewed. Unsupervised trainees performed 64% of the laparoscopic appendicectomies and 55% of the open appendicectomies. There were no significant differences in complication rates, readmissions, mortality and length of stay between laparoscopic and open appendicectomy groups or between trainee and consultant surgeon operators. Conversion rates (laparoscopic to open approach) were similar for trainees and consultants. Unsupervised senior surgical trainees did not take significantly longer to perform laparoscopic appendicectomy when compared to unsupervised trainee-performed open appendicectomy. Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.
Machine learning in APOGEE. Unsupervised spectral classification with K-means

Science.gov (United States)

Garcia-Dias, Rafael; Allende Prieto, Carlos; Sánchez Almeida, Jorge; Ordovás-Pascual, Ignacio

2018-05-01

Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims: Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods: We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results: We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters' space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions: Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data. Full Tables B.1-B.4 are only available at the CDS via
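Because the abstract reports that the K-means results are fairly sensitive to initialization, a common mitigation is to run K-means from several starting points and keep the solution with the lowest within-cluster sum of squares (inertia). The sketch below does this with k-means++ seeding; it assumes the rows of `X` are already normalized spectra, and the seeding scheme and restart count are illustrative choices, not those of the APOGEE study.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres, dtype=float)

def kmeans_restarts(X, k, n_init=10, n_iter=300, seed=0):
    """Lloyd's k-means from `n_init` k-means++ initializations; the run with
    the lowest inertia is returned as (labels, centres, inertia)."""
    rng = np.random.default_rng(seed)
    best_labels, best_centres, best_inertia = None, None, np.inf
    for _ in range(n_init):
        C = kmeanspp_init(X, k, rng)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else C[j] for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        # Final assignment against the final centres, then score this run.
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        inertia = float(((X - C[labels]) ** 2).sum())
        if inertia < best_inertia:
            best_labels, best_centres, best_inertia = labels, C, inertia
    return best_labels, best_centres, best_inertia
```

Restarts only select the best local optimum. When flux space lacks obvious groups, as reported here, the best run can still differ noticeably from other near-optimal runs, so comparing solutions across seeds is a useful stability check.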
Enhancement of ELM by Clustering Discrimination Manifold Regularization and Multiobjective FOA for Semisupervised Classification

OpenAIRE

Qing Ye; Hao Pan; Changhua Liu

2015-01-01

A novel semisupervised extreme learning machine (ELM) with clustering discrimination manifold regularization (CDMR) framework named CDMR-ELM is proposed for semisupervised classification. By using unsupervised fuzzy clustering method, CDMR framework integrates clustering discrimination of both labeled and unlabeled data with twinning constraints regularization. Aiming at further improving the classification accuracy and efficiency, a new multiobjective fruit fly optimization algorithm (MOFOA)...
Inferring hierarchical clustering structures by deterministic annealing

International Nuclear Information System (INIS)

Hofmann, T.; Buhmann, J.M.

1996-01-01

The unsupervised detection of hierarchical structures is a major topic in unsupervised learning and one of the key questions in data analysis and representation. We propose a novel algorithm for the problem of learning decision trees for data clustering and related problems. In contrast to many other methods based on successive tree growing and pruning, we propose an objective function for tree evaluation and we derive a non-greedy technique for tree growing. Applying the principles of maximum entropy and minimum cross entropy, a deterministic annealing algorithm is derived in a mean-field approximation. This technique allows us to canonically superimpose tree structures and to fit parameters to averaged or “fuzzified” trees
Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

Directory of Open Access Journals (Sweden)

Christos Iraklis Tsatsoulis

2013-08-01

We investigate the use of unsupervised text mining methods for the analysis of prose literature works, using Thomas Pynchon's novel V. as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel’s structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.

Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects

Directory of Open Access Journals (Sweden)

Dreyhaupt, Jens

2017-05-01

An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.
Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.

Science.gov (United States)

Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

2017-01-01

An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.
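The larger sample sizes mentioned above are often estimated with the design effect, DE = 1 + (m − 1) · ICC, where m is the cluster size and ICC the intracluster correlation coefficient. The sketch applies this textbook formula for equal cluster sizes; it is not necessarily the procedure the article recommends, and the numbers in the example are hypothetical.

```python
import math

def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def clusters_needed(n_individual: int, cluster_size: int, icc: float):
    """Inflate a sample size computed for individual randomization and round
    up to whole clusters. Returns (clusters, participants) per study arm."""
    n_inflated = n_individual * design_effect(cluster_size, icc)
    n_clusters = math.ceil(n_inflated / cluster_size)
    return n_clusters, n_clusters * cluster_size
```

For example, if an individually randomized calculation called for 128 students per arm, seminar groups of 25 with an ICC of 0.05 give DE = 2.2, so 12 groups (300 students) per arm would be needed. Even a small ICC more than doubles the requirement here because it is multiplied by the cluster size.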
Statistical Significance for Hierarchical Clustering

Science.gov (United States)

Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

2017-01-01

Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high-dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo-based approach for testing statistical significance in hierarchical clustering that addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
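The core Monte Carlo idea can be illustrated in one dimension: measure how cleanly the data split into two groups, then compare that value with the same statistic computed on samples drawn from a single Gaussian fitted to the data (the "no clustering" null). This is a deliberate simplification, not the authors' procedure: it tests a single two-way split of 1-D data using a k-means-style cluster index (within-cluster over total sum of squares), and it omits the sequential testing down the dendrogram and the family-wise error control described above. All names and defaults are hypothetical.

```python
import random
import statistics

# Simplified 1-D illustration; not the published multivariate procedure.


def cluster_index(values):
    """Within-cluster sum of squares of the best 2-split divided by the total
    sum of squares (smaller means stronger two-cluster structure). In 1-D the
    optimal 2-means split is contiguous in sorted order, so we scan all cuts.
    Assumes the values are not all identical."""
    xs = sorted(values)
    mean = sum(xs) / len(xs)
    total_ss = sum((x - mean) ** 2 for x in xs)
    best = total_ss
    for cut in range(1, len(xs)):
        left, right = xs[:cut], xs[cut:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        wss = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        best = min(best, wss)
    return best / total_ss


def monte_carlo_p_value(values, n_sim=200, seed=0):
    """Share of Gaussian null samples (same size, fitted mean and SD) whose
    cluster index is at least as small as the observed one (add-one estimate)."""
    rng = random.Random(seed)
    mu, sd = statistics.mean(values), statistics.stdev(values)
    observed = cluster_index(values)
    hits = sum(
        cluster_index([rng.gauss(mu, sd) for _ in values]) <= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


# Hypothetical bimodal data: two well-separated groups of 20 points.
rng = random.Random(42)
bimodal = [rng.gauss(-5, 1) for _ in range(20)] + [rng.gauss(5, 1) for _ in range(20)]
print(monte_carlo_p_value(bimodal))  # expected to be small
```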
An unsupervised strategy for biomedical image segmentation

Directory of Open Access Journals (Sweden)

Roberto Rodríguez

2010-09-01

Roberto Rodríguez (1), Rubén Hernández (2). (1) Digital Signal Processing Group, Institute of Cybernetics, Mathematics, and Physics, Havana, Cuba; (2) Interdisciplinary Professional Unit of Engineering and Advanced Technology, IPN, Mexico. Abstract: Many segmentation techniques have been published, and some of them have been widely used in different application problems. Most of these techniques were motivated by specific applications. Unsupervised methods, which do not assume any prior scene knowledge that could be learned to help the segmentation process, are inherently more challenging than supervised ones. In this paper, we present an unsupervised strategy for biomedical image segmentation using an algorithm based on recursively applying mean shift filtering, with entropy used as the stopping criterion. The strategy was tested on many real images and compared with manual segmentation. With the proposed strategy, errors below 20% for false positives and 0% for false negatives were obtained. Keywords: segmentation, mean shift, unsupervised segmentation, entropy
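The recursive filtering with an entropy-based stop can be sketched on a flattened gray-level image. This is a simplified version built on stated assumptions: each gray level is shifted only in the range (intensity) domain with a flat kernel, the spatial kernel of full mean shift filtering is ignored, and the stopping criterion is interpreted as "stop once the histogram entropy changes by less than `tol`". Names and default parameters are hypothetical.

```python
import math
from collections import Counter

# Range-domain-only sketch; the entropy stopping rule is an assumed reading.


def shannon_entropy(pixels):
    """Entropy (bits) of the gray-level histogram."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def mean_shift_pass(pixels, bandwidth):
    """One filtering pass: each gray level moves to the mean of all gray
    levels within `bandwidth` of it (flat kernel), rounded to an integer."""
    filtered = []
    for v in pixels:
        window = [u for u in pixels if abs(u - v) <= bandwidth]
        filtered.append(round(sum(window) / len(window)))
    return filtered


def recursive_mean_shift(pixels, bandwidth=10, tol=1e-3, max_passes=50):
    """Re-apply the filter until the entropy stops changing by more than tol."""
    current = list(pixels)
    current_entropy = shannon_entropy(current)
    for _ in range(max_passes):
        candidate = mean_shift_pass(current, bandwidth)
        candidate_entropy = shannon_entropy(candidate)
        if abs(current_entropy - candidate_entropy) < tol:
            return candidate
        current, current_entropy = candidate, candidate_entropy
    return current


image = [10, 12, 11, 13, 200, 198, 202, 199]  # hypothetical flattened image
print(recursive_mean_shift(image))  # [12, 12, 12, 12, 200, 200, 200, 200]
```

Each pass pulls nearby gray levels together, so the entropy drops; once the image is piecewise constant, another pass changes nothing and the recursion stops.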
Unsupervised image matching based on manifold alignment.

Science.gov (United States)

Pei, Yuru; Huang, Fengchun; Shi, Fuhao; Zha, Hongbin

2012-08-01

This paper addresses the problem of automatically matching two image sets that have similar intrinsic structures but different appearances, especially when no prior correspondence is available. An unsupervised manifold alignment framework is proposed to establish correspondence between data sets by a mapping function in the mutual embedding space. We introduce a local similarity metric based on parameterized distance curves to represent the connection of one point with the rest of the manifold. A small set of valid feature pairs can be found without manual interaction by matching the distance curve of one manifold with the curve cluster of the other manifold. To avoid potential confusion in image matching, we propose an extended affine transformation to solve the nonrigid alignment in the embedding space. Comparatively tight alignment and structure preservation are obtained simultaneously. The point pairs with the minimum distance after alignment are taken as matches. We apply manifold alignment to image set matching problems. Our approach effectively establishes correspondence between image sets that differ in pose, illumination, and identity.
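One way to read the distance-curve idea is as a per-point signature: the sorted distances from a point to the rest of its set. Because such a signature does not change under translation or rotation, it can pair points across two sets without any prior correspondence. The sketch below uses that stand-in (sorted distances normalized by their maximum, compared by L2 error). It is not the paper's parameterized curve or its extended affine alignment, it assumes both sets have the same number of points, and the function names are hypothetical.

```python
import math

# Simplified stand-in for distance-curve matching; names are hypothetical.


def distance_curve(points, i):
    """Sorted distances from point i to all other points, divided by the
    largest one (invariant to translation, rotation, and uniform scaling)."""
    d = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return [x / d[-1] for x in d]


def match_by_distance_curves(set_a, set_b):
    """For every point in set_a, the index of the set_b point with the most
    similar distance curve (L2 error). No one-to-one constraint is enforced,
    and both sets are assumed to contain the same number of points."""
    curves_b = [distance_curve(set_b, j) for j in range(len(set_b))]
    matches = []
    for i in range(len(set_a)):
        curve_a = distance_curve(set_a, i)
        errors = [math.dist(curve_a, cb) for cb in curves_b]
        matches.append(errors.index(min(errors)))
    return matches


# Set B is set A shifted by (5, 5) and listed in a different order.
set_a = [(0, 0), (1, 0), (0, 2), (3, 1)]
set_b = [(8, 6), (5, 7), (5, 5), (6, 5)]
print(match_by_distance_curves(set_a, set_b))  # [2, 3, 1, 0]
```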
AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING

Energy Technology Data Exchange (ETDEWEB)

Sanchez Almeida, J.; Allende Prieto, C., E-mail: jos@iac.es, E-mail: callende@iac.es [Instituto de Astrofisica de Canarias, E-38205 La Laguna, Tenerife (Spain)]

2013-01-20

Large spectroscopic surveys require automated methods of analysis. This paper explores the use of k-means clustering as a tool for automated unsupervised classification of massive stellar spectral catalogs. The classification criteria are defined by the data and the algorithm, with no prior physical framework. We work with a representative set of stellar spectra associated with the Sloan Digital Sky Survey (SDSS) SEGUE and SEGUE-2 programs, which consists of 173,390 spectra from 3800 to 9200 Å sampled at 3849 wavelengths. We classify the original spectra as well as the spectra with the continuum removed. The second set contains only spectral lines and is less dependent on uncertainties in the flux calibration. The classification of the spectra with continuum yields 16 major classes. Roughly speaking, stars are split according to their colors, with enough finesse to distinguish dwarfs from giants of the same effective temperature, but with difficulty in separating stars with different metallicities. There are classes corresponding to particular MK types, intrinsically blue stars, dust-reddened stars, stellar systems, and also classes collecting faulty spectra. Overall, there is no one-to-one correspondence between the classes we derive and the MK types. The classification of spectra without continuum yields 13 classes; the color separation is not as sharp, but it distinguishes stars of the same effective temperature and different metallicities. Some of the classes thus obtained span a fairly small range of physical parameters (200 K in effective temperature, 0.25 dex in surface gravity, and 0.35 dex in metallicity), so the classification can be used to estimate the main physical parameters of some stars at minimal computational cost. We also analyze the outliers of the classification. Most of them turn out to be failures of the reduction pipeline, but there are also high-redshift QSOs, multiple stellar systems, dust-reddened stars, galaxies, and, finally, odd
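The clustering step itself is standard k-means (Lloyd's algorithm) applied to each spectrum treated as a vector of fluxes on a common wavelength grid. The sketch below is a minimal pure-Python version with a deterministic farthest-point initialization; the three-point "spectra" are made-up numbers, and a survey-scale run would typically rely on a vectorized library and several restarts.

```python
import math

# Minimal k-means sketch; toy data and initialization choice are assumptions.


def farthest_point_init(vectors, k):
    """Deterministic seeding: start from the first vector, then repeatedly
    add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each spectrum goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a class empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical fluxes at three wavelengths: "red" vs "blue" spectral shapes.
spectra = [[1.0, 0.9, 0.8], [1.1, 0.9, 0.7], [0.9, 1.0, 0.8],
           [0.2, 0.5, 1.0], [0.3, 0.4, 1.1], [0.2, 0.6, 0.9]]
labels, _ = kmeans(spectra, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```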
Concept formation knowledge and experience in unsupervised learning

CERN Document Server

Fisher, Douglas H; Langley, Pat

1991-01-01

Concept Formation: Knowledge and Experience in Unsupervised Learning presents the interdisciplinary interaction between machine learning and cognitive psychology on unsupervised incremental methods. This book focuses on measures of similarity, strategies for robust incremental learning, and the psychological consistency of various approaches. Organized into three parts encompassing 15 chapters, this book begins with an overview of inductive concept learning in machine learning and psychology, with emphasis on issues that distinguish concept formation from more prevalent supervised methods and f
Unsupervised text mining for assessing and augmenting GWAS results.

Science.gov (United States)

Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

2016-04-01

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies can identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques, using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework, which we used to characterize the relationships between 10 genes reported by a previous GWAS to be associated with asthma. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value < 0.01). Clustering the observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes, and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
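The comparison against chance can be sketched as a resampling test: compute the mean pairwise cosine similarity of the candidate genes' text vectors, then count how often equally sized random gene sets reach that value. The vectors below are hypothetical term-weight lists (building them from the literature is not shown), and the add-one p-value estimate is one common convention, not necessarily the one the authors used.

```python
import math
import random
from itertools import combinations

# Resampling sketch; vectors, names, and the add-one estimate are assumptions.


def cosine(u, v):
    """Cosine similarity of two non-zero term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(vectors):
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def one_sided_p_value(candidates, background, n_draws=1000, seed=0):
    """Share of random gene sets (same size, drawn without replacement from
    `background`) whose mean pairwise similarity reaches the candidates'."""
    rng = random.Random(seed)
    observed = mean_pairwise_cosine(candidates)
    hits = sum(
        mean_pairwise_cosine(rng.sample(background, len(candidates))) >= observed
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)


# Hypothetical 6-term vectors: the candidates share vocabulary, while each
# background gene uses a single distinct term.
candidates = [[3, 2, 0, 0, 1, 0], [2, 3, 0, 0, 1, 0], [3, 3, 0, 0, 0, 0]]
background = [[1 if t == g else 0 for t in range(6)] for g in range(6)]
print(one_sided_p_value(candidates, background))  # about 0.001
```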
A Comparison between Standard and Functional Clustering Methodologies: Application to Agricultural Fields for Yield Pattern Assessment

Directory of Open Access Journals (Sweden)

Simone Pascucci

2018-04-01

Recognizing spatial patterns within agricultural fields, i.e., areas of similar yield potential that are stable through time, is very important for optimizing agricultural practices. This study evaluates different clustering methodologies applied to multispectral satellite time series for retrieving temporally stable (constant) patterns in agricultural fields, related to the within-field spatial distribution of yield. The ability of different clustering procedures to recognize and map constant patterns in fields of cereal crops was assessed. Crop vigor patterns, considered to be related to soil characteristics and possibly indicative of yield potential, were derived by applying the different clustering algorithms to time series of Landsat images acquired over 94 agricultural fields near Rome (Italy). Two different approaches were applied and validated using archived Landsat 7 and 8 imagery. The first approach automatically extracts and calculates, for each field of interest (FOI), the Normalized Difference Vegetation Index (NDVI), then uses the standard K-means clustering algorithm to derive constant patterns at the field level. The second approach applies novel clustering procedures directly to spectral reflectance time series, in particular: (1) standard K-means; (2) functional K-means; (3) multivariate functional principal components clustering analysis; (4) hierarchical clustering. The different approaches were validated through cluster accuracy estimates on a reference set of FOIs for which yield maps were available for some years. Results show that multivariate functional principal components clustering, with an a priori determination of the optimal number of classes for each FOI, provides better accuracy than standard clustering algorithms. The proposed novel functional clustering methodologies are effective and efficient for constant pattern retrieval and can be used for a sustainable management of
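The first approach (NDVI per field, then K-means) can be sketched as follows. NDVI is (NIR - Red) / (NIR + Red); to favour patterns that are stable through time, this sketch standardizes NDVI within the field on each date and averages the z-scores over dates before clustering them with a small 1-D k-means. The standardize-and-average step, the band layout, and all names are assumptions made for illustration, not details taken from the study.

```python
import statistics

# Illustrative sketch; the temporal aggregation step is an assumption.


def ndvi(nir, red):
    """Normalized Difference Vegetation Index of one pixel."""
    return (nir - red) / (nir + red)


def stable_vigor_scores(nir_by_date, red_by_date):
    """nir_by_date[t][p], red_by_date[t][p]: reflectance of pixel p on date t.
    Returns each pixel's within-field NDVI z-score averaged over all dates
    (assumes NDVI is not constant across the field on any date)."""
    n_dates, n_pixels = len(nir_by_date), len(nir_by_date[0])
    scores = [0.0] * n_pixels
    for nir_t, red_t in zip(nir_by_date, red_by_date):
        values = [ndvi(n, r) for n, r in zip(nir_t, red_t)]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        for p, v in enumerate(values):
            scores[p] += (v - mu) / sd / n_dates
    return scores


def kmeans_1d(values, k, n_iter=50):
    """1-D k-means seeded at evenly spaced order statistics (assumes k >= 2)."""
    xs = sorted(values)
    centers = [xs[(len(xs) - 1) * i // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Hypothetical field of 4 pixels observed on 2 dates (reflectances).
nir = [[0.5, 0.5, 0.3, 0.3], [0.6, 0.6, 0.35, 0.35]]
red = [[0.1, 0.1, 0.2, 0.2], [0.1, 0.1, 0.2, 0.2]]
scores = stable_vigor_scores(nir, red)
print(kmeans_1d(scores, k=2))  # [1, 1, 0, 0]: pixels 0-1 form the high-vigor class
```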
Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

Science.gov (United States)

Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

2017-12-01

The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3
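The spectrogram representation mentioned above can be computed with a short-time Fourier transform. The sketch below uses a naive O(N²) DFT per frame with a periodic Hann window so that it depends only on the standard library; a real pipeline would use an FFT, and the authors' clustering of spectral features is not shown. The test tone (8 Hz sampled at 64 samples/s, 16-sample frames) is a made-up example.

```python
import cmath
import math

# Naive STFT sketch for illustration; frame sizes and the tone are hypothetical.


def spectrogram(signal, frame_len, hop):
    """Magnitude short-time Fourier transform with a periodic Hann window.
    Returns one row per frame and one column per bin 0 .. frame_len // 2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(segment)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


fs = 64                                                         # samples per second
tone = [math.sin(2 * math.pi * 8 * t / fs) for t in range(fs)]  # 8 Hz sine, 1 s
spec = spectrogram(tone, frame_len=16, hop=8)                   # 4 Hz per bin
print([row.index(max(row)) for row in spec])  # [2, 2, 2, 2, 2, 2, 2]
```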
Deep unsupervised learning on a desktop PC: A primer for cognitive scientists

Directory of Open Access Journals (Sweden)

Alberto Testolin

2013-05-01

Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and can involve millions of connection weights, which implies that simulations on standard computers are infeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphics cards (GPUs) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphics card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphics card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.
Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists

Science.gov (United States)

Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco

2013-01-01

Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and can involve millions of connection weights, which implies that simulations on standard computers are infeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphics cards (graphics processing units) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphics card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphics card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior. PMID:23653617
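Deep belief networks are usually built by stacking restricted Boltzmann machines (RBMs), each trained with contrastive divergence. To make that building block concrete, the sketch below trains a single binary RBM with one-step contrastive divergence (CD-1) in plain Python on a CPU. The article's contribution is running such training on graphics cards through high-level MATLAB or Python routines, which this pure-Python sketch does not attempt. The class name, learning rate, and toy patterns are assumptions.

```python
import math
import random

# Illustrative CPU-only CD-1 sketch; hyperparameters and data are hypothetical.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BinaryRBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        """Positive phase on the data, one Gibbs step, negative phase on the
        reconstruction, then move the parameters along the difference."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sampled hidden states
        v1 = self.visible_probs(h0)                                # reconstruction (probabilities)
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        """Mean squared error of deterministic (mean-field) reconstructions."""
        total = 0.0
        for v in data:
            recon = self.visible_probs(self.hidden_probs(v))
            total += sum((a - r) ** 2 for a, r in zip(v, recon))
        return total / len(data)


patterns = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
rbm = BinaryRBM(n_visible=4, n_hidden=2)
before = rbm.reconstruction_error(patterns)
for _ in range(2000):
    for v in patterns:
        rbm.cd1_update(v)
print(before, rbm.reconstruction_error(patterns))  # the error should drop
```

A deep belief network would train a second RBM on the hidden probabilities of this one, and so on; on a GPU the inner loops become matrix products over mini-batches.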
Automated age-related macular degeneration classification in OCT using unsupervised feature learning

Science.gov (United States)

Venhuizen, Freerk G.; van Ginneken, Bram; Bloemen, Bart; van Grinsven, Mark J. J. P.; Philipsen, Rick; Hoyng, Carel; Theelen, Thomas; Sánchez, Clara I.

2015-03-01

Age-related Macular Degeneration (AMD) is a common eye disorder with high prevalence in elderly people. The disease mainly affects the central part of the retina and can ultimately lead to permanent vision loss. Optical Coherence Tomography (OCT) is becoming the standard imaging modality for diagnosing AMD and assessing its progression. However, evaluating the obtained volumetric scans is time-consuming and expensive, and the signs of early AMD are easy to miss. In this paper we propose a classification method to automatically distinguish AMD patients from healthy subjects with high accuracy. The method is based on an unsupervised feature learning approach and processes the complete image without the need for an accurate pre-segmentation of the retina. The method can be divided into two steps: an unsupervised clustering stage that extracts a set of small descriptive image patches from the training data, and a supervised training stage that uses these patches to create a patch occurrence histogram for every image, on which a random forest classifier is trained. Experiments using 384 volume scans show that the proposed method identifies AMD patients with high accuracy, obtaining an area under the receiver operating characteristic (ROC) curve of 0.984. Our method allows a quick and reliable assessment of the presence of AMD pathology in OCT volume scans without the need for accurate layer segmentation algorithms.
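The link between the unsupervised and supervised stages is the patch occurrence histogram: each patch of a scan is assigned to its most similar codebook patch, and the counts form the feature vector for the classifier. The sketch assumes a codebook has already been obtained by clustering training patches (for example with k-means) and that patches are flattened to equal-length vectors; the random forest stage is not shown, and the names and toy 2×2 patches are hypothetical.

```python
import math

# Bag-of-patches sketch; codebook and patches are hypothetical toy data.


def nearest_codeword(patch, codebook):
    """Index of the codebook patch closest to `patch` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(patch, codebook[k]))


def occurrence_histogram(patches, codebook, normalize=True):
    """Feature vector for one scan: how often each codebook patch is the
    nearest match, optionally divided by the number of patches."""
    hist = [0] * len(codebook)
    for patch in patches:
        hist[nearest_codeword(patch, codebook)] += 1
    if normalize and patches:
        return [count / len(patches) for count in hist]
    return hist


# Hypothetical 2x2 patches flattened to length-4 vectors.
codebook = [[0, 0, 0, 0], [1, 1, 1, 1]]
scan_patches = [[0, 0, 0.1, 0], [0.9, 1, 1, 1], [1, 1, 0.8, 1], [0, 0.2, 0, 0]]
print(occurrence_histogram(scan_patches, codebook))  # [0.5, 0.5]
```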
Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

Science.gov (United States)

Kamali, Tahereh; Stashuk, Daniel

2016-10-01

Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing the progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters to be known in advance, which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles that can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information, without any prior information or assumption about data distributions. Here, a new density-based clustering algorithm called neighborhood distance entropy consistency (NDEC) is proposed, which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state-of-the-art clustering algorithms, including chameleon, spectral clustering, DBSCAN and k-means, using publicly available Johns Hopkins University diffusion tensor imaging data. The performance of NDEC and the other clustering algorithms was evaluated using the dice ratio as an external evaluation criterion and the density-based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity.
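The dice ratio used as the external criterion compares an automatically segmented bundle with its gold-standard counterpart: twice the overlap divided by the total size of both. A minimal sketch, assuming each bundle is represented as a set of fiber (or voxel) identifiers; the identifiers in the example are made up, and NDEC itself is not shown.

```python
# Minimal dice-ratio sketch; element ids are hypothetical.


def dice_ratio(segmented, reference):
    """Dice ratio 2|A & B| / (|A| + |B|) between two collections of element
    ids, e.g. fibers assigned to a bundle vs. the gold-standard bundle."""
    a, b = set(segmented), set(reference)
    if not a and not b:
        return 1.0  # convention: two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


print(dice_ratio({1, 2, 3, 4}, {2, 3, 4, 5}))  # 0.75
```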
Unsupervised Categorization in a Sample of Children with Autism Spectrum Disorders

Science.gov (United States)

Edwards, Darren J.; Perlman, Amotz; Reed, Phil

2012-01-01

Studies of supervised categorization have demonstrated limited categorization performance in participants with autism spectrum disorders (ASD); however, little research has been conducted on unsupervised categorization in this population. This study explored unsupervised categorization using two stimulus sets that differed in their…
Application of unsupervised pattern recognition approaches for exploration of rare earth elements in Se-Chahun iron ore, central Iran

Science.gov (United States)

Sarparandeh, Mohammadali; Hezarkhani, Ardeshir

2017-12-01

The use of efficient methods for data processing has always been of interest to researchers in the earth sciences. Pattern recognition techniques are appropriate for high-dimensional data such as geochemical data. Evaluating the geochemical distribution of rare earth elements (REEs) requires such methods; in particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is the application of unsupervised pattern recognition approaches to evaluating the geochemical distribution of REEs in the Kiruna-type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit, and 14 rare earth elements were measured by inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all 14 features simultaneously. Besides providing easy solutions, these methods have the advantage of uncovering hidden information and relations among the data samples. Therefore, four clustering methods (unsupervised pattern recognition), namely a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and a self-organizing map (SOM), were applied, and the results were evaluated using the silhouette criterion. The samples were clustered into four types. Finally, the results of this study were validated against geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods match reality best, agreeing with experimental studies of the samples and with field surveys. Since only the rare earth elements are used in this division, the good agreement of the results with lithology is notable. It is concluded that the combination of the proposed
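The silhouette criterion used to compare the four clustering results scores each sample by how much closer it is to its own cluster than to the nearest other cluster, s = (b - a) / max(a, b), and averages over samples. Below is a minimal sketch with Euclidean distances, assuming at least two clusters; the one-dimensional toy points stand in for 14-element REE concentration vectors and are not data from the study.

```python
import math

# Minimal silhouette sketch; points and labels are hypothetical.


def silhouette_score(points, labels):
    """Mean silhouette over all points. a = mean distance to the point's own
    cluster, b = smallest mean distance to any other cluster. Points in
    singleton clusters get s = 0 by convention. Needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c) / labels.count(c)
            for c in clusters if c != li
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0,), (1,), (10,), (11,)]
labels = [0, 0, 1, 1]
print(round(silhouette_score(points, labels), 4))  # 0.8997
```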
Comparative analysis of clustering methods for gene expression time course data

Directory of Open Access Journals (Sweden)

Ivan G. Costa

2004-01-01

This work performs a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.
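Comparing a clustering with gene annotation requires an agreement index between two partitions of the same genes. The adjusted Rand index is one widely used choice and is shown below as an assumption: the abstract does not name the index used, and the k-fold cross-validation loop around it is omitted.

```python
from collections import Counter
from math import comb

# Partition-agreement sketch; the choice of index is an assumption.


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same
    items (1 = identical up to relabeling, about 0 = chance agreement)."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


clusters = [0, 0, 1, 1]          # hypothetical cluster labels for four genes
annotation = ["A", "A", "B", "B"]  # hypothetical functional classes
print(adjusted_rand_index(clusters, annotation))  # 1.0
```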
Cluster Randomised Trials in Cochrane Reviews: Evaluation of Methodological and Reporting Practice.

Directory of Open Access Journals (Sweden)

Marty Richardson

Systematic reviews can include cluster-randomised controlled trials (C-RCTs), which require different analysis compared with standard individual-randomised controlled trials. However, it is not known whether review authors follow the methodological and reporting guidance when including these trials. The aim of this study was to assess the methodological and reporting practice of Cochrane reviews that included C-RCTs against criteria developed from existing guidance. Criteria were developed, based on methodological literature and personal experience supervising review production and quality. Criteria were grouped into four themes: identifying, reporting, assessing risk of bias, and analysing C-RCTs. The Cochrane Database of Systematic Reviews was searched (2nd December 2013), and the 50 most recent reviews that included C-RCTs were retrieved. Each review was then assessed using the criteria. The 50 reviews we identified were published by 26 Cochrane Review Groups between June 2013 and November 2013. For identifying C-RCTs, only 56% identified that C-RCTs were eligible for inclusion in the review in the eligibility criteria. For reporting C-RCTs, only eight (24%) of the 33 reviews reported the method of cluster adjustment for their included C-RCTs. For assessing risk of bias, only one review assessed all five C-RCT-specific risk-of-bias criteria. For analysing C-RCTs, of the 27 reviews that presented unadjusted data, only nine (33%) provided a warning that confidence intervals may be artificially narrow. Of the 34 reviews that reported data from unadjusted C-RCTs, only 13 (38%) excluded the unadjusted results from the meta-analyses. The methodological and reporting practices in Cochrane reviews incorporating C-RCTs could be greatly improved, particularly with regard to analyses. Criteria developed as part of the current study could be used by review authors or editors to identify errors and improve the quality of published systematic reviews incorporating
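Why unadjusted C-RCT results give artificially narrow confidence intervals can be shown with a common approximate correction: divide both the event count and the sample size of an arm by the design effect 1 + (m - 1) × ICC before analysis. The sketch below applies it to a single proportion with a Wald interval; the trial numbers, the ICC, and the function names are hypothetical, and in practice the ICC would come from the trial itself or an external estimate.

```python
import math

# Effective-sample-size sketch; all numbers and names are hypothetical.


def effective_counts(events, n, mean_cluster_size, icc):
    """Divide events and participants by the design effect 1 + (m - 1) * ICC."""
    de = 1 + (mean_cluster_size - 1) * icc
    return events / de, n / de


def wald_ci(events, n, z=1.96):
    """Approximate 95% Wald confidence interval for a proportion."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical arm: 30 events among 200 pupils in classes of 25, ICC = 0.02.
naive = wald_ci(30, 200)
adjusted = wald_ci(*effective_counts(30, 200, 25, 0.02))
print(naive)     # interval that ignores clustering
print(adjusted)  # same point estimate, wider by a factor of sqrt(1.48)
```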
Unsupervised Analysis of Array Comparative Genomic Hybridization Data from Early-Onset Colorectal Cancer Reveals Equivalence with Molecular Classification and Phenotypes

Directory of Open Access Journals (Sweden)

María Arriba

2017-01-01

AIM: To investigate whether chromosomal instability (CIN) is associated with tumor phenotypes and/or with global genomic status based on MSI (microsatellite instability) and CIMP (CpG island methylator phenotype) in early-onset colorectal cancer (EOCRC). METHODS: Taking as a starting point our previous work, in which tumors from 60 EOCRC cases (≤45 years at the time of diagnosis) were analyzed by array comparative genomic hybridization (aCGH), in the present study we performed an unsupervised hierarchical clustering analysis of those aCGH data in order to unveil possible associations between the CIN profile and the clinical features of the tumors. In addition, we evaluated the MSI and the CIMP statuses of the samples with the aim of investigating a possible relationship between copy number alterations (CNAs) and the MSI/CIMP condition in EOCRC. RESULTS: Based on the similarity of the CNAs detected, the unsupervised analysis stratified samples into two main clusters (A, B) and four secondary clusters (A1, A2, B3, B4). The different subgroups showed a certain correspondence with the molecular classification of colorectal cancer (CRC), which enabled us to outline an algorithm to categorize tumors according to their CIMP status. Interestingly, each subcluster showed some distinctive clinicopathological features. But more interestingly, the CIN of each subcluster mainly affected particular chromosomes, allowing us to define chromosomal regions more specifically affected depending on the CIMP/MSI status of the samples. CONCLUSIONS: Our findings may provide a basis for a new form of classifying EOCRC according to the genomic status of the tumors.
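The unsupervised hierarchical clustering step can be illustrated with a naive agglomerative procedure that repeatedly merges the two closest groups of tumors, here using average linkage on Euclidean distances between copy-number profiles. The linkage, the distance, and the toy log-ratio profiles are assumptions for illustration; the abstract does not state which ones the authors used.

```python
import math

# Naive agglomerative sketch; linkage, distance, and profiles are assumptions.


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA) on Euclidean
    distances; merges the closest pair of groups until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        return sum(math.dist(profiles[i], profiles[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical log-ratio profiles over three chromosome arms for four tumors:
# tumors 0-1 carry a gain and a loss, tumors 2-3 are nearly flat.
profiles = [[0.5, -0.6, 0.0], [0.6, -0.5, 0.1], [0.0, 0.0, 0.1], [0.1, 0.0, 0.0]]
print(average_linkage(profiles, 2))  # [[0, 1], [2, 3]]
```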

Unit commitment solution using agglomerative and divisive cluster algorithm: an effective new methodology

Energy Technology Data Exchange (ETDEWEB)

Reddy, N.M.; Reddy, K.R. [G. Narayanamma Inst. of Technology and Science, Hyderabad (India). Dept. of Electrical Engineering]; Ramana, N.V. [JNTU College of Engineering, Jagityala (India). Dept. of Electrical Engineering]

2008-07-01

Thermal power plants consist of several generating units with different generating capacities, fuel costs per MWh generated, minimum up/down times, and start-up or shut-down costs. The Unit Commitment (UC) problem in power systems involves determining the start-up and shut-down schedules of thermal generating units to meet the forecasted load over a short future horizon of one to seven days. This paper presented a new approach to the complex UC problem using agglomerative and divisive hierarchical clustering. Euclidean costs, which measure the differences in fuel cost and start-up cost between any two units, were first calculated. Then, depending on the value of the Euclidean costs, similar units were placed in the same cluster. The proposed methodology consists of two individual algorithms: an agglomerative cluster algorithm is used while the load is increasing, and a divisive cluster algorithm is used when the load is decreasing. A search was conducted for an optimal solution with a minimal number of clusters and cluster data points. A standard ten-unit thermal power system was used to test and evaluate the performance of the method over a period of 24 hours. The new approach proved to be quite effective and satisfactory. 15 refs., 9 tabs., 5 figs.
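The first step described above (Euclidean costs between units, then grouping similar units) can be sketched as a single-link agglomeration: any two units whose Euclidean cost falls below a threshold end up, transitively, in the same cluster. The threshold, the unit data, and the function names are hypothetical, the paper's load-dependent switch between agglomerative and divisive algorithms is not reproduced, and because the two cost components have different units and magnitudes, they may need rescaling in practice.

```python
import math

# Grouping sketch; threshold, units, and names are hypothetical.


def euclidean_cost(unit_a, unit_b):
    """Distance between two units in (fuel cost, start-up cost) space."""
    return math.dist(unit_a, unit_b)


def group_similar_units(units, threshold):
    """Single-link grouping with union-find: units whose Euclidean cost is
    below `threshold` are merged into the same cluster."""
    parent = list(range(len(units)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if euclidean_cost(units[i], units[j]) < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(units)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical units: (fuel cost in $/MWh, start-up cost in $).
units = [(16.2, 450), (16.5, 460), (27.0, 900), (27.7, 880), (40.0, 170)]
print(group_similar_units(units, threshold=50.0))  # [[0, 1], [2, 3], [4]]
```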
DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

Science.gov (United States)

Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

2018-04-17

Computational methods for identifying cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpreting the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both issues are roadblocks that must be removed before a cytometry lab can adopt the data clustering approach for routine cell population identification. We found that combining recursive data filtering and clustering with constraints converted from the user's manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven character of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists by mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters that are otherwise difficult to resolve in a single run of the data clustering method because of statistical interference from irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and from nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided abrupt cut-offs at the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and
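The step that makes DAFi results interpretable, mapping clusters onto the user's two-dimensional gates, can be sketched as follows: compute each cluster's centroid, project it onto the two markers a gate is drawn on, and merge all clusters whose projected centroid falls inside the gate. This is a simplified reading of the abstract that assumes rectangular gates; the recursive re-clustering of the gated events at each level of the hierarchy is not shown, and the marker layout and names are hypothetical.

```python
# Cluster-to-gate mapping sketch; gates, markers, and events are hypothetical.


def centroid(events):
    """Mean marker intensities of a cluster's events."""
    return [sum(col) / len(events) for col in zip(*events)]


def gate_clusters(clusters, gate, dims=(0, 1)):
    """Merge the clusters whose centroid, projected onto the two marker
    dimensions `dims`, lies inside the rectangular gate
    ((x_min, x_max), (y_min, y_max)). Returns the pooled events."""
    (x_min, x_max), (y_min, y_max) = gate
    gated = []
    for events in clusters:
        c = centroid(events)
        x, y = c[dims[0]], c[dims[1]]
        if x_min <= x <= x_max and y_min <= y <= y_max:
            gated.extend(events)
    return gated


# Hypothetical events with three markers (CD3, CD4, CD8), already clustered.
clusters = [
    [[3.0, 3.5, 1.0], [3.2, 3.9, 0.8]],  # CD3+ CD4+
    [[3.1, 0.5, 2.0]],                   # CD3+ CD4-
    [[0.4, 0.2, 0.1], [0.6, 0.3, 0.2]],  # CD3-
]
cd3_positive = gate_clusters(clusters, gate=((2.0, 5.0), (0.0, 5.0)))
print(len(cd3_positive))  # 3 events from the two CD3+ clusters
```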
Cyclist–motorist crash patterns in Denmark: A latent class clustering approach

DEFF Research Database (Denmark)

Kaplan, Sigal; Prato, Carlo Giacomo

2013-01-01

to prioritize safety issues and to devise efficient preventive measures. Method: The current study focused on cyclist–motorist crashes that occurred in Denmark during the period between 2007 and 2011. To uncover crash patterns, the current analysis applied latent class clustering, an unsupervised probabilistic...
Prevalence of cluster headache in the Republic of Georgia: results of a population-based study and methodological considerations

DEFF Research Database (Denmark)

Katsarava, Z; Dzagnidze, A; Kukava, M

2009-01-01

We present a study of the general-population prevalence of cluster headache in the Republic of Georgia and discuss the advantages and challenges of different methodological approaches. In a community-based survey, specially trained medical residents visited 500 adjacent households in the capital ... with possible cluster headache, who were then personally interviewed by one of two headache-experienced neurologists. Cluster headache was confirmed in one subject. The prevalence of cluster headache was therefore estimated to be 87/100,000 (95% confidence interval
Shadow detection and removal in RGB VHR images for land use unsupervised classification

Science.gov (United States)

Movia, A.; Beinat, A.; Crosilla, F.

2016-09-01

High-resolution aerial images are now widely available thanks to the spread of advanced technologies such as UAVs (Unmanned Aerial Vehicles) and new satellite missions. Although these developments offer new opportunities for accurate land use analysis and change detection, cloud and terrain shadows limit the benefits of modern sensors. Focusing on the problem of shadow detection and removal in VHR color images, the paper proposes new solutions and analyses how they can enhance common unsupervised classification procedures for identifying land use classes related to CO2 absorption. To this end, an improved fully automatic procedure has been developed for detecting image shadows using only RGB color information, without user interaction. Results show a significant accuracy improvement over similar methods using RGB-based indices. Furthermore, novel solutions derived from Procrustes analysis have been applied to remove shadows and restore brightness in the images. In particular, two methods implementing the so-called "anisotropic Procrustes" and "not-centered oblique Procrustes" algorithms have been developed and compared with the linear correlation correction method based on the Cholesky decomposition. To assess how shadow removal can enhance unsupervised classification, results obtained with classical methods such as k-means, maximum likelihood, and self-organizing maps have been compared with each other and with a supervised clustering procedure.
Methodological Foundations of Clustering and Innovativeness for Establishing the Competitive Production of Biofuels

Directory of Open Access Journals (Sweden)

Klymchuk Oleksandr V.

2016-05-01

Full Text Available The article aims to study worldwide trends in the development of innovative processes and the creation of cluster structures in order to elaborate methodological foundations for establishing the competitive production of biofuels. It highlights cluster approaches to global commercial activity that create effective mechanisms and tools to encourage innovation- and investment-driven regional development and that are relevant to the Ukrainian economy. The article emphasizes that clustering is one of the key tools for structuring the energy market, integrating and exploiting the potential of the bioenergy sector, managing economic policies for the redistribution of value added, and increasing the investment attractiveness of Ukraine's biofuel industry. It concludes that cluster development in biofuel production will stimulate specialization and cooperation in the agro-industrial sector, bringing related businesses together into effective interaction and thereby ensuring a high level of competitiveness of biofuels in both national and international markets.
Kernel method for clustering based on optimal target vector

International Nuclear Information System (INIS)

Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

2006-01-01

We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and anti-ferromagnetic and (ii) dependent on the whole data-set rather than only on pairs of samples. The couplings are determined by exploiting the notion of an optimal target vector, introduced here, which provides a link between kernel supervised and unsupervised learning. The effectiveness of the method is shown on the well-known iris data-set and on benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering.
Algorithms of maximum likelihood data clustering with applications

Science.gov (United States)

Giada, Lorenzo; Marsili, Matteo

2002-12-01

We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on the maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood, in turn, depends only on the Pearson correlation coefficients of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
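A minimal sketch of the idea, assuming the closed-form log-likelihood commonly attributed to Giada and Marsili, L = ½ Σ_s [ln(n_s/c_s) + (n_s − 1) ln((n_s² − n_s)/(n_s² − c_s))], where n_s is the size of cluster s and c_s is the sum of Pearson correlations over all pairs (including i = j) in s. The greedy pairwise merge below is one simple maximization strategy chosen for illustration; it is not necessarily one of the algorithms the paper discusses. Treating clusters with c_s ≤ n_s as contributing zero is an assumption, and the function names and toy correlation matrix are hypothetical.

```python
import math
from itertools import combinations

def cluster_loglik(members, C):
    """Assumed per-cluster log-likelihood term; singletons contribute 0."""
    n = len(members)
    if n < 2:
        return 0.0
    c = sum(C[i][j] for i in members for j in members)  # includes the diagonal
    if not (n < c < n * n):
        return 0.0  # assumption: uncorrelated or degenerate clusters add nothing
    return 0.5 * (math.log(n / c) + (n - 1) * math.log((n * n - n) / (n * n - c)))

def greedy_ml_clustering(C):
    """Start from singletons; repeatedly apply the merge with the largest
    likelihood gain; stop when no merge increases the likelihood."""
    clusters = [[i] for i in range(len(C))]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            merged = clusters[a] + clusters[b]
            gain = (cluster_loglik(merged, C)
                    - cluster_loglik(clusters[a], C)
                    - cluster_loglik(clusters[b], C))
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return clusters
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

# Toy correlation matrix with two correlated blocks: {0, 1, 2} and {3, 4}.
C = [
    [1.0, 0.8, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0],
    [0.8, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.0, 0.7, 1.0],
]
print(greedy_ml_clustering(C))  # [[0, 1, 2], [3, 4]]
```

Because merging stops as soon as no merge raises the likelihood, the number of clusters is an output rather than an input, in line with point (ii) of the abstract.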
Unsupervised detection and removal of muscle artifacts from scalp EEG recordings using canonical correlation analysis, wavelets and random forests.

Science.gov (United States)

Anastasiadou, Maria N; Christodoulakis, Manolis; Papathanasiou, Eleftherios S; Papacostas, Savvas S; Mitsis, Georgios D

2017-09-01

This paper proposes supervised and unsupervised algorithms for automatic muscle artifact detection and removal from long-term EEG recordings, which combine canonical correlation analysis (CCA) and wavelets with random forests (RF). The proposed algorithms first perform CCA and a continuous wavelet transform of the canonical components to generate a number of features, including component autocorrelation values and wavelet coefficient magnitudes. A subset of the most important features is subsequently selected using RF and labelled observations (supervised case) or synthetic data constructed from the original observations (unsupervised case). The proposed algorithms are evaluated using realistic simulation data as well as 30-min epochs of non-invasive EEG recordings obtained from ten patients with epilepsy. We assessed the performance of the proposed algorithms using classification performance and goodness-of-fit values for noisy and noise-free signal windows. In the simulation study, where the ground truth was known, the proposed algorithms yielded almost perfect performance. In the case of experimental data, where expert marking was performed, the results suggest that both the supervised and unsupervised algorithm versions were able to remove artifacts without considerably affecting noise-free channels, outperforming standard CCA, independent component analysis (ICA) and Lagged Auto-Mutual Information Clustering (LAMIC). The proposed algorithms achieved excellent performance for both simulation and experimental data. Importantly, for the first time to our knowledge, we were able to perform entirely unsupervised artifact removal, i.e. without using already marked noisy data segments, achieving performance comparable to the supervised case. Overall, the results suggest that the proposed algorithms have significant potential for improving EEG signal quality in research or clinical settings without the need for marking by expert
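The CCA step can be illustrated with the blind-source-separation form of CCA, in which the multichannel signal is correlated with a one-sample-delayed copy of itself, so that the canonical correlations behave like lag-1 autocorrelations of the recovered sources; broadband muscle activity tends to be weakly autocorrelated. The sketch below is a deliberate simplification: it omits the wavelet features and random-forest feature selection entirely, uses a hypothetical fixed `n_remove` parameter instead of a learned decision, and runs on a synthetic two-channel mixture rather than EEG. It requires NumPy.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def bss_cca(X):
    """CCA between X(t) and X(t-1) for a (channels x samples) array.
    Returns the unmixing matrix W (one column per source) and the canonical
    correlations, sorted in descending order."""
    Xc = X - X.mean(axis=1, keepdims=True)
    x_now, x_lag = Xc[:, 1:], Xc[:, :-1]
    T = x_now.shape[1]
    c_nn = x_now @ x_now.T / T
    c_ll = x_lag @ x_lag.T / T
    c_nl = x_now @ x_lag.T / T
    k_n, k_l = inv_sqrt(c_nn), inv_sqrt(c_ll)
    U, rho, _ = np.linalg.svd(k_n @ c_nl @ k_l)  # singular values = canonical correlations
    return k_n @ U, rho

def remove_low_autocorrelation(X, n_remove):
    """Zero the n_remove least autocorrelated sources and project back to channels."""
    W, rho = bss_cca(X)
    mean = X.mean(axis=1, keepdims=True)
    S = W.T @ (X - mean)          # sources, most autocorrelated first
    if n_remove > 0:
        S[-n_remove:, :] = 0.0
    return np.linalg.inv(W.T) @ S + mean, rho

# Synthetic mixture: a slow oscillation plus white noise on two channels.
t = np.arange(4000)
rng = np.random.default_rng(0)
brain_like = np.sin(2 * np.pi * t / 200)     # strongly autocorrelated
muscle_like = rng.standard_normal(t.size)    # weakly autocorrelated
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
X = mixing @ np.vstack([brain_like, muscle_like])
cleaned, rho = remove_low_autocorrelation(X, n_remove=1)
print(np.round(rho, 3))
```

In the paper the decision about which components are artifacts is made from a richer feature set; a fixed count is used here only to keep the example short.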
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

Science.gov (United States)

Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

2013-03-01

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
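HACA's clustering step builds on kernel k-means, which needs only a Gram matrix of pairwise kernel values: the squared feature-space distance from item i to cluster c is K_ii − (2/|c|) Σ_{j∈c} K_ij + (1/|c|²) Σ_{j,l∈c} K_jl. The sketch below assumes an RBF kernel on fixed-length feature vectors as a stand-in for the generalized dynamic time alignment kernel, and it clusters given items rather than searching over segmentations, so it illustrates only one ingredient of HACA; the function names and data are hypothetical.

```python
import math

def rbf_gram(X, gamma):
    """Gram matrix of an RBF kernel over fixed-length feature vectors."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in X] for x in X]

def kernel_kmeans(K, labels, k, iters=100):
    """Kernel k-means from an initial labelling, using only the Gram matrix K."""
    n = len(K)
    labels = list(labels)
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term (1/|c|^2) * sum_{j,l in c} K_jl, computed once per pass.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0 for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = None, math.inf
            for c, m in enumerate(members):
                if not m:
                    continue
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = rbf_gram(points, gamma=0.5)
print(kernel_kmeans(K, [0, 1, 0, 1, 0, 1], k=2))  # [0, 0, 0, 1, 1, 1]
```

Because the updates touch K alone, any valid kernel between segments, including an alignment kernel for variable-length segments, could be substituted without changing `kernel_kmeans`.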
Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

Science.gov (United States)

Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

2014-01-01

The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%), and clusters G1 and G2, combined, contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2, the optimal numbers of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.
Subspace K-means clustering.

Science.gov (United States)

Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

2013-12-01

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions from various clustering procedures as starting points for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Hierarchical clustering of HPV genotype patterns in the ASCUS-LSIL triage study

Science.gov (United States)

Wentzensen, Nicolas; Wilson, Lauren E.; Wheeler, Cosette M.; Carreon, Joseph D.; Gravitt, Patti E.; Schiffman, Mark; Castle, Philip E.

2010-01-01

Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common, which complicates the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level. PMID:20959485
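Agglomerative clustering with complete linkage, where the distance between two clusters is the largest pairwise distance between their members, can be sketched on binary presence/absence profiles as follows. The abstract does not state the distance measure, so the Jaccard distance used here is an assumption, as are the helper names and the toy genotype matrix; each row could stand for a disease group and each column for an HPV type.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two binary profiles."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if union == 0 else 1.0 - inter / union

def complete_linkage(vectors, n_clusters, dist=jaccard_distance):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    D = [[dist(u, v) for v in vectors] for u in vectors]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

# Toy rows (e.g., disease groups) over six hypothetical genotype columns.
profiles = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
print(complete_linkage(profiles, n_clusters=2))  # [[0, 1, 2], [3, 4]]
```

Complete linkage tends to produce compact clusters, because a group is only merged when every member lies close to every member of the other group.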
Projection-based curve clustering

International Nuclear Information System (INIS)

Auder, Benjamin; Fischer, Aurelie

2012-01-01

This paper focuses on unsupervised curve classification in the context of the nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower-dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the 'true' ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem. (authors)
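The projection-then-cluster scheme can be sketched as below: each discretized curve is reduced to its coefficients on the first d vectors of an orthonormal basis, and K-means with several random restarts runs on those coefficients, keeping the solution with the lowest within-cluster sum of squares. The orthonormal discrete cosine basis, d = 4, the restart count, and the synthetic rising and falling curves are illustrative assumptions; the paper's procedure for selecting the best basis from a library is not reproduced.

```python
import math
import random

def cosine_basis(n, d):
    """First d vectors of the orthonormal DCT-II basis on n sample points."""
    basis = []
    for k in range(d):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        basis.append([scale * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n)])
    return basis

def project(curve, basis):
    """Coefficients of a curve on each basis vector (inner products)."""
    return [sum(c * b for c, b in zip(curve, phi)) for phi in basis]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, iters=100):
    """One Lloyd run from random initial centres; returns (sse, centres, labels)."""
    centres = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[c])
        if new == centres:
            break
        centres = new
    sse = sum(sq_dist(p, centres[l]) for p, l in zip(points, labels))
    return sse, centres, labels

def kmeans_restarts(points, k, restarts=10, seed=0):
    """Keep the run with the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])

# Synthetic curves: 10 noisy rising ramps and 10 noisy falling ramps.
gen = random.Random(3)
n = 50
rising = [[t / n + gen.gauss(0, 0.05) for t in range(n)] for _ in range(10)]
falling = [[1 - t / n + gen.gauss(0, 0.05) for t in range(n)] for _ in range(10)]
B = cosine_basis(n, 4)
coeffs = [project(c, B) for c in rising + falling]
sse, centres, labels = kmeans_restarts(coeffs, 2)
print(labels)
```

Restarting from several initial partitions is one simple way to reduce the influence of local minima; clustering 4 coefficients instead of 50 samples per curve is what keeps each run cheap.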
Surface mapping via unsupervised classification of remote sensing: application to MESSENGER/MASCS and DAWN/VIRS data.

Science.gov (United States)

D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.

2017-12-01

Machine learning (ML) has achieved unprecedented results in high-dimensional data processing tasks, with wide applications in various fields. Given the growing number of complex nonlinear systems that have to be investigated in science and the sheer size of the data now available, ML offers a unique ability to extract knowledge regardless of the specific application field. Examples are image segmentation, supervised/unsupervised/semi-supervised classification, feature extraction, and data dimensionality analysis/reduction. The MASCS instrument mapped Mercury's surface in the 400-1145 nm wavelength range during orbital observations by the MESSENGER spacecraft. We conducted k-means unsupervised hierarchical clustering to identify and characterize spectral units from MASCS observations. The results display a dichotomy: a polar unit and an equatorial unit, possibly linked to compositional differences or to weathering due to irradiation. To explore possible relations between composition and spectral behavior, we compared the spectral provinces with elemental abundance maps derived from MESSENGER's X-Ray Spectrometer (XRS). For the Vesta application on DAWN Visible and Infrared Spectrometer (VIR) data, we explored several machine learning techniques: an image segmentation method, a stream algorithm, and hierarchical clustering. The algorithm successfully separates the olivine outcrops around two craters on Vesta's surface [1]. New maps summarizing the spectral and chemical signature of the surface could be produced automatically. We conclude that, instead of digging through data by hand, scientists could choose a subset of algorithms with well-known properties (i.e., efficacy on the particular problem, speed, accuracy) and focus their effort on understanding what the important characteristics of the groups found in the data mean. [1] E Ammannito et al. "Olivine in an unexpected location on Vesta's surface". In: Nature 504.7478 (2013), pp. 122-125.
A new methodology to study customer electrocardiogram using RFM analysis and clustering

Directory of Open Access Journals (Sweden)

Mohammad Reza Gholamian

2011-04-01

Full Text Available One of the primary issues in marketing planning is understanding customers' behavioral trends. A customer's purchasing interest may fluctuate for different reasons, and it is important to detect declining or increasing trends whenever they happen. Studying these fluctuations helps improve customer relationships. There are different ways to increase customers' willingness to buy, such as planning good promotions or increasing advertising. This paper proposes a new methodology, called the customer electrocardiogram, to measure customers' behavioral trends. The proposed model uses the K-means clustering method with RFM analysis to study customers' fluctuations over different time frames. We also apply the proposed electrocardiogram methodology to a real-world case study in the food industry and discuss the results in detail.
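Computing RFM features is the step that precedes clustering in this kind of methodology. The sketch below builds a recency (days since last purchase), frequency (number of purchases), and monetary (total spend) triple per customer for one time frame, then min-max scales it with recency inverted so that larger always means "more engaged". The transaction format, the scaling choice, and the helper names are assumptions; the paper's exact RFM scoring is not given in the abstract.

```python
from datetime import date

def rfm_table(transactions, as_of):
    """transactions: iterable of (customer_id, purchase_date, amount).
    Returns {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for cust, day, amount in transactions:
        last, freq, money = table.get(cust, (None, 0, 0.0))
        last = day if last is None or day > last else last
        table[cust] = (last, freq + 1, money + amount)
    return {c: ((as_of - last).days, freq, money) for c, (last, freq, money) in table.items()}

def min_max_scale(rfm):
    """Scale R, F, M to [0, 1]; recency is inverted so 1 means 'most recent'."""
    cols = list(zip(*rfm.values()))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]

    def scale(v, i):
        return 0.0 if hi[i] == lo[i] else (v - lo[i]) / (hi[i] - lo[i])

    return {c: (1.0 - scale(r, 0), scale(f, 1), scale(m, 2)) for c, (r, f, m) in rfm.items()}

tx = [
    ("a", date(2011, 1, 1), 10.0),
    ("a", date(2011, 3, 1), 30.0),
    ("b", date(2011, 2, 1), 5.0),
    ("c", date(2011, 3, 20), 100.0),
]
rfm = rfm_table(tx, as_of=date(2011, 3, 31))
scaled = min_max_scale(rfm)
print(rfm)
print(scaled)
```

Repeating this for successive time frames and clustering each frame's scaled vectors with K-means would give each customer a sequence of cluster memberships over time, one plausible reading of the "electrocardiogram" trace.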
Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System

Science.gov (United States)

Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque; Rativa, Diego

2018-01-01

There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of jaw movements provides relevant information that helps healthcare professionals reach a diagnosis. Different approaches have been explored to track jaw movements in order to make mastication analysis less subjective; however, all methods are still highly subjective, and the quality of the assessments depends heavily on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared with those obtained by clinical analysis, showing no statistically significant difference between the two methods. Moreover, we propose using unsupervised approaches to cluster the mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen's Self-Organizing Maps and K-Means clustering. Both algorithms performed excellently in processing jaw-movement data, showing encouraging results and the potential to provide a full assessment of masticatory function. The proposed method can be applied in real time, providing relevant dynamic information to healthcare professionals. PMID:29651365
Unsupervised versus Supervised Identification of Prognostic Factors in Patients with Localized Retroperitoneal Sarcoma: A Data Clustering and Mahalanobis Distance Approach

Directory of Open Access Journals (Sweden)

Rita De Sanctis

2018-01-01

Full Text Available The aim of this report is to unveil specific prognostic factors for retroperitoneal sarcoma (RPS) patients by univariate and multivariate statistical techniques. A phase I-II study on localized RPS treated with high-dose ifosfamide and radiotherapy followed by surgery (ISG-STS 0303 protocol) demonstrated that chemo/radiotherapy was safe and increased the 3-year relapse-free survival (RFS) with respect to historical controls. Of 70 patients, 26 developed local, 10 distant, and 5 combined relapse. Median disease-free interval (DFI) was 29.47 months. According to a discriminant function analysis, DFI, histology, relapse pattern, and the first treatment approach at relapse had a statistically significant prognostic impact. Based on the scientific literature and clinical expertise, clinicopathological data were analyzed using both a supervised and an unsupervised classification method to predict prognosis, with similar sample sizes (66 and 65, respectively, in the casewise approach and 70 in the mean-substitution one). This is the first attempt to predict patients' prognosis by means of multivariate statistics, and in this light it is noteworthy that (i) some clinical data have a well-defined prognostic value, (ii) the unsupervised model produced results comparable to those of the supervised one, and (iii) the appropriate combination of both models appears fruitful and easily extensible to different clinical contexts.
Wind Energy Development in India and a Methodology for Evaluating Performance of Wind Farm Clusters

Directory of Open Access Journals (Sweden)

Sanjeev H. Kulkarni

2016-01-01

Full Text Available With the maturity of advanced technologies and the urgent need to maintain a healthy environment at a reasonable price, India is moving toward generating electricity from renewable resources. Wind energy production, with its relatively safe and environmentally positive characteristics, has evolved from a marginal activity into a multibillion dollar industry. Wind energy power plants, also known as wind farms, comprise multiple wind turbines. Although several windmill clusters produce energy in different geographical locations across the world, evaluating their performance is a complex task and an important focus for stakeholders. In this work, an attempt is made to estimate the performance of wind clusters using a multicriteria approach. Multiple factors that affect wind farm operations are analyzed based on experts' opinions, and a performance ranking of the wind farms is generated. The weights of the selection criteria are determined from pairwise comparison matrices of the Analytic Hierarchy Process (AHP). The proposed methodology evaluates wind farm performance based on technical, economic, environmental, and sociological indicators. Both qualitative and quantitative parameters were considered. Empirical data were collected through a questionnaire from selected wind farms in Belagavi district in the Indian state of Karnataka. The proposed methodology is a useful tool for cluster analysis.
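The AHP weighting step can be sketched as follows: the criterion weights are the normalized principal eigenvector of the pairwise comparison matrix, computed here by power iteration, and the consistency ratio CR = CI/RI, with CI = (λmax − n)/(n − 1), checks whether the judgments are coherent (CR < 0.1 is Saaty's usual rule of thumb). The abstract does not say whether the eigenvector or the geometric-mean method was used, so the eigenvector choice, the helper name, and the example matrix are assumptions.

```python
# Saaty's random consistency index, indexed by matrix size n (defined here up to n = 9).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(M, iters=100):
    """Return (weights, lambda_max, consistency_ratio) for a reciprocal
    pairwise comparison matrix M, where M[i][j] = importance of i over j."""
    n = len(M)
    w = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration: multiply by M, then renormalise to sum to 1.
        v = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    Mw = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Mw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, lam, cr

# Hypothetical judgments for three criteria (e.g., technical, economic, environmental).
M = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, lam, cr = ahp_weights(M)
print([round(x, 3) for x in weights], round(lam, 4), round(cr, 4))
```

For a perfectly consistent matrix (every M[i][j] equal to w_i/w_j), λmax equals n and CR is zero; the mildly inconsistent judgments above give a small positive CR.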

A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

Directory of Open Access Journals (Sweden)

Xiangbing Zhou

2018-04-01

Full Text Available Rapidly growing volumes of GPS (Global Positioning System) trajectories hide much valuable information, for example about city road planning, urban travel demand, and population migration. In order to mine this hidden information and obtain better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed to reduce the local information loss of the trajectory and to avoid getting stuck in a local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel (angle-based) method for partitioning line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated into fuzzy C-means (FCM) clustering to maintain the stability and robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method are validated against real-world taxi GPS data. When comparing our clustering algorithm with partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.
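For reference, the standard fuzzy C-means iteration that the method builds on alternates two updates: centres as membership-weighted means with weights u_ik^m, and memberships u_ik = 1 / Σ_j (d_ik/d_ij)^{2/(m−1)}, where d_ik is the distance from point i to centre k. The sketch below is plain FCM on points with fuzzifier m = 2 and random initial memberships; it does not include the paper's Lagrange-based method, Hausdorff-based K-means++ initialization, angle-based segment partitioning, or least-squares regression step, and the names and data are hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-9, seed=0):
    """Plain FCM. Returns (centres, U) where U[i][k] is the membership of point i in cluster k."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    centres = []
    for _ in range(iters):
        # Centre update: weighted mean with weights u_ik^m.
        centres = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            tot = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / tot for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_U = []
        for i in range(n):
            dists = [math.dist(points[i], ctr) for ctr in centres]
            if min(dists) == 0.0:
                # A point sitting exactly on a centre gets full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                       for k in range(c)]
            new_U.append(row)
        shift = max(abs(a - b) for ra, rb in zip(U, new_U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:
            break
    return centres, U

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centres, U = fuzzy_c_means(points, c=2)
hard = [max(range(2), key=lambda k: row[k]) for row in U]
print(hard)
```

Unlike K-means, every point keeps a graded membership in every cluster; taking the largest membership, as in the last lines, recovers a hard partition when one is needed.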
Unsupervised Anomaly Detection for Liquid-Fueled Rocket Prop...

Data.gov (United States)

National Aeronautics and Space Administration — Title: Unsupervised Anomaly Detection for Liquid-Fueled Rocket Propulsion Health Monitoring. Abstract: This article describes the results of applying four...
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.

Science.gov (United States)

Wagner, Florian

2015-01-01

Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
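The enrichment part of the idea can be sketched with a fixed-cutoff hypergeometric test on genes ranked by their loadings on one principal component: it asks how likely it is to see at least k genes annotated with a GO term among the top n by chance. GO-PCA's own nonparametric test may differ from this simplification; the ranked list is assumed to come from a PCA computed elsewhere, and the helper names and gene identifiers are hypothetical.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def top_loading_enrichment(ranked_genes, go_term_genes, top):
    """Count annotated genes among the `top` highest-ranked ones and return
    (hits, tail p-value) under a hypergeometric null."""
    N = len(ranked_genes)
    term = set(go_term_genes) & set(ranked_genes)
    hits = sum(1 for g in ranked_genes[:top] if g in term)
    return hits, hypergeom_tail(N, len(term), top, hits)

# Ten hypothetical genes, already sorted by loading on one principal component.
genes = [f"g{i}" for i in range(10)]
hits, p = top_loading_enrichment(genes, ["g0", "g1", "g2"], top=3)
print(hits, p)  # hits = 3, p = 1/120
```

A gene set that is both near the top of a component's loadings and significantly enriched for one term is the kind of correlated, functionally related group that could then be turned into a labelled signature.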
Random clustering ferns for multimodal object recognition

OpenAIRE

Villamizar Vergel, Michael Alejandro; Garrell Zulueta, Anais; Sanfeliu Cortés, Alberto; Moreno-Noguer, Francesc

2017-01-01

The final publication is available at link.springer.com. We propose an efficient and robust method for the recognition of objects exhibiting multiple intra-class modes, where each one is associated with a particular object appearance. The proposed method, called random clustering ferns, synergistically combines a single, real-time classifier, based on the boosted assembling of extremely randomized trees (ferns), with an unsupervised and probabilistic approach in order to recognize efficient...
Maximum Margin Clustering of Hyperspectral Data

Science.gov (United States)

Niazmardi, S.; Safari, A.; Homayouni, S.

2013-09-01

In recent decades, large margin methods such as Support Vector Machines (SVMs) have been considered state-of-the-art supervised learning methods for the classification of hyperspectral data. However, the results of these algorithms depend mainly on the quality and quantity of the available training data. To tackle the problems associated with training data, researchers have worked on extending large margin algorithms to unsupervised learning. One recently proposed algorithm is Maximum Margin Clustering (MMC). MMC is an unsupervised SVM algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the MMC optimization problem is non-convex. Most existing MMC methods rely on reformulating and relaxing the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and can handle only small data sets. Moreover, most of these algorithms are limited to two-class problems and therefore cannot be used directly for the classification of remotely sensed data. In this paper, a new MMC algorithm is used that solves the original non-convex problem with an alternating optimization method. This algorithm is also extended to multi-class classification, and its performance is evaluated. The results show that the proposed algorithm achieves acceptable results for hyperspectral data clustering.
Effects of Supervised vs. Unsupervised Training Programs on Balance and Muscle Strength in Older Adults: A Systematic Review and Meta-Analysis.

Science.gov (United States)

Lacroix, André; Hortobágyi, Tibor; Beurskens, Rainer; Granacher, Urs

2017-11-01

to the following modalities: period, frequency, volume, and modalities of supervision (i.e., number of supervised/unsupervised sessions within the supervised or unsupervised training groups, respectively). Heterogeneity was computed using I² and χ² statistics. The methodological quality of the included studies was evaluated using the Physiotherapy Evidence Database scale. Our analyses revealed that in older adults, supervised balance/resistance training was superior to unsupervised balance/resistance training in improving measures of static steady-state balance (mean SMDbs = 0.28, p = 0.39), dynamic steady-state balance (mean SMDbs = 0.35, p = 0.02), proactive balance (mean SMDbs = 0.24, p = 0.05), balance test batteries (mean SMDbs = 0.53, p = 0.02), and measures of muscle strength/power (mean SMDbs = 0.51, p = 0.04). Regarding the examined dose-response relationships, our analyses showed that 10-29 additional supervised sessions in the supervised training groups compared with the unsupervised training groups resulted in the largest effects for static steady-state balance (mean SMDbs = 0.35), dynamic steady-state balance (mean SMDbs = 0.37), and muscle strength/power (mean SMDbs = 1.12). Further, ≥30 additional supervised sessions in the supervised training groups were needed to produce the largest effects on proactive balance (mean SMDbs = 0.30) and balance test batteries (mean SMDbs = 0.77). Effects in favor of supervised programs were larger for studies that did not include any supervised sessions in their unsupervised programs (mean SMDbs: 0.28-1.24) than for studies that implemented a few supervised sessions in their unsupervised programs (e.g., three supervised sessions throughout the entire intervention program; SMDbs: -0.06 to 0.41). The present findings have to be interpreted with caution because of the low number of eligible studies and the moderate methodological quality
Unsupervised process monitoring and fault diagnosis with machine learning methods

CERN Document Server

Aldrich, Chris

2013-01-01

This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data
A Dedicated Mixture Model for Clustering Smart Meter Data: Identification and Analysis of Electricity Consumption Behaviors

Directory of Open Access Journals (Sweden)

Fateh Nassim Melzi

2017-09-01

The large amount of data collected by smart meters is a valuable resource that can be used to better understand consumer behavior and optimize electricity consumption in cities. This paper presents an unsupervised classification approach for extracting typical consumption patterns from data generated by smart electric meters. The proposed approach is based on a constrained Gaussian mixture model whose parameters vary according to the day type (weekday, Saturday, or Sunday). The proposed methodology is applied to a real dataset of Irish households collected by smart meters over one year. For each cluster, the model provides three consumption profiles that depend on the day type. First, the model is applied to the electricity consumption of users during one month to extract groups of consumers who exhibit similar consumption behaviors. The clustering results are then crossed with contextual variables available for the households to show the close links between electricity consumption and household socio-economic characteristics. Second, the evolution of consumer behavior from one month to another is assessed through variations of cluster sizes over time. The results show that consumer behavior evolves over time depending on contextual variables such as temperature fluctuations and calendar events.
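A simplified sketch of a mixture in which each cluster carries one mean load curve per day type, fitted by expectation maximization (EM), may help make the model structure concrete. The constraints here are assumptions, not the paper's exact model: all days of a consumer belong to the same cluster, the noise is isotropic Gaussian with one variance per cluster, day types are coded 0 = weekday, 1 = Saturday, 2 = Sunday, and the initialization is a deterministic farthest-point choice. Every day type must occur at least once in the data. The function name `daytype_mixture` is hypothetical.

```python
import numpy as np


def daytype_mixture(X, day_type, n_clusters, n_types=3, n_iter=50):
    """EM for a mixture whose mean load curve depends on (cluster, day type).

    X: array (n_consumers, n_days, T) of daily load curves.
    day_type: int array (n_days,), e.g. 0 = weekday, 1 = Saturday, 2 = Sunday.
    Returns hard cluster labels and the (n_clusters, n_types, T) mean profiles.
    """
    n, m, T = X.shape
    # Each consumer's average curve per day type: shape (n, n_types, T).
    per_type = np.stack([X[:, day_type == d].mean(axis=1) for d in range(n_types)], axis=1)

    # Farthest-point initialization on the flattened per-type profiles.
    flat = per_type.reshape(n, -1)
    seeds = [0]
    for _ in range(1, n_clusters):
        gap = np.min([((flat - flat[s]) ** 2).sum(axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(gap)))
    mu = per_type[seeds].copy()
    var = np.full(n_clusters, X.var())
    pi = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: each consumer's log-likelihood under each cluster, summed over days.
        resid = X[:, None] - mu[:, day_type][None]          # (n, K, m, T)
        sq = (resid ** 2).sum(axis=(2, 3))                  # (n, K)
        log_r = np.log(pi) - 0.5 * m * T * np.log(2 * np.pi * var) - sq / (2 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixture weights, per-(cluster, day type) means, per-cluster variances.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = np.einsum("nk,ndt->kdt", r, per_type) / nk[:, None, None]
        resid = X[:, None] - mu[:, day_type][None]
        sse = np.einsum("nk,nk->k", r, (resid ** 2).sum(axis=(2, 3)))
        var = np.maximum(sse / (nk * m * T), 1e-8)
    return r.argmax(axis=1), mu
```

For each cluster `k`, `mu[k]` holds the three day-type profiles, which mirrors the "three consumption profiles per cluster" described in the abstract.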
Spectrum Hole Identification in IEEE 802.22 WRAN using Unsupervised Learning

Directory of Open Access Journals (Sweden)

V. Balaji

2016-01-01

In this paper, we present a Cooperative Spectrum Sensing (CSS) algorithm for Cognitive Radios (CR) based on the IEEE 802.22 Wireless Regional Area Network (WRAN) standard. The core objective is to improve cooperative sensing efficiency, which specifies how fast a decision can be reached in each round of cooperation (iteration) when sensing an appropriate number of channels/bands (i.e., 86 channels of 7 MHz bandwidth as per IEEE 802.22) within a time constraint (the channel sensing time). To meet this objective, we developed a CSS algorithm using an unsupervised K-means clustering classification approach. The received energy level of each Secondary User (SU) is used as the parameter for determining channel availability. The performance of the proposed algorithm is quantified in terms of detection accuracy and training and classification delay time. Further, the detection accuracy of the proposed scheme meets the requirement of IEEE 802.22 WRAN with a target probability of false alarm of 0.1. All simulations were carried out in MATLAB.
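A minimal sketch of the K-means step: fuse each channel's SU energy reports into one value, split the channels into two clusters, and treat the lower-energy cluster as idle (a spectrum hole). The fusion rule (a plain average), the input format, and the name `idle_channels` are illustrative assumptions, not the paper's algorithm, which the abstract does not specify at this level of detail.

```python
def idle_channels(reports, n_iter=100):
    """Flag channels as idle via 1-D K-means (k = 2) on fused SU energy levels.

    reports: one list of SU energy measurements per channel (hypothetical format).
    Returns one boolean per channel: True if it falls in the low-energy cluster.
    """
    energy = [sum(r) / len(r) for r in reports]   # fuse SU reports by averaging (assumption)
    low, high = min(energy), max(energy)           # initial centroids
    for _ in range(n_iter):
        in_low = [abs(e - low) <= abs(e - high) for e in energy]
        low_pts = [e for e, f in zip(energy, in_low) if f]
        high_pts = [e for e, f in zip(energy, in_low) if not f]
        new_low = sum(low_pts) / len(low_pts) if low_pts else low
        new_high = sum(high_pts) / len(high_pts) if high_pts else high
        if (new_low, new_high) == (low, high):     # centroids stable: converged
            break
        low, high = new_low, new_high
    return [abs(e - low) <= abs(e - high) for e in energy]
```

For example, `idle_channels([[1.0, 1.0], [5.0, 5.2], [0.9, 1.1]])` marks the first and third channels as idle.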
Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

Science.gov (United States)

Yau, Christopher; Holmes, Chris

2011-07-01

We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population-based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood, we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest, including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster-specific variable selection. We demonstrate improved inference on a number of canonical problems.
Clustering performance comparison using K-means and expectation maximization algorithms.

Science.gov (United States)

Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

2014-11-14

Clustering is an important data mining technique that separates data into categories based on similar features. Unlike classification algorithms, clustering algorithms are unsupervised. Two representative clustering algorithms are K-means and the expectation maximization (EM) algorithm. Logistic regression extends linear regression analysis to a categorical dependent variable, using a linear combination of independent variables to predict the probability that an event occurs. However, classifying all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to clusters obtained by EM and by K-means for the quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Specialization processes in on-line unsupervised learning

NARCIS (Netherlands)

Biehl, M.; Freking, A.; Reents, G.; Schlösser, E.

1998-01-01

From the recent analysis of supervised learning by on-line gradient descent in multilayered neural networks it is known that the necessary process of student specialization can be delayed significantly. We demonstrate that this phenomenon also occurs in various models of unsupervised learning. A
Unsupervised Document Embedding With CNNs

OpenAIRE

Liu, Chundi; Zhao, Shunan; Volkovs, Maksims

2017-01-01

We propose a new model for unsupervised document embedding. Leading existing approaches either require complex inference or use recurrent neural networks (RNN) that are difficult to parallelize. We take a different route and develop a convolutional neural network (CNN) embedding model. Our CNN architecture is fully parallelizable, resulting in an over 10x speedup in inference time over RNN models. A parallelizable architecture makes it possible to train deeper models where each successive layer has increasin...
Gastric cancer differentiation using Fourier transform near-infrared spectroscopy with unsupervised pattern recognition

Science.gov (United States)

Yi, Wei-song; Cui, Dian-sheng; Li, Zhi; Wu, Lan-lan; Shen, Ai-guo; Hu, Ji-ming

2013-01-01

This study investigated the application of near-infrared (NIR) spectroscopy to the differentiation of gastric cancer. Ninety spectra from cancerous and normal tissues were collected from a total of 30 surgical specimens using Fourier transform near-infrared spectroscopy (FT-NIR) equipped with a fiber-optic probe. Major spectral differences were observed in the CH-stretching second overtone (9000-7000 cm⁻¹), CH-stretching first overtone (6000-5200 cm⁻¹), and CH-stretching combination (4500-4000 cm⁻¹) regions. Using unsupervised pattern recognition methods such as principal component analysis (PCA) and cluster analysis (CA), all spectra were classified into cancerous and normal tissue groups with an accuracy of up to 81.1%. The sensitivity and specificity were 100% and 68.2%, respectively. These results indicate that the CH-stretching first overtone, combination band, and second overtone regions can serve as diagnostic markers for gastric cancer.
Supervised and Unsupervised Classification for Pattern Recognition Purposes

Directory of Open Access Journals (Sweden)

Catalina COCIANU

2006-01-01

A cluster analysis task has to identify the grouping trends of data, decide on sound clusters, and validate the resulting structure. Identifying the grouping tendency in a data collection requires selecting a framework stated in terms of a mathematical model that expresses the degree of similarity between pairs of objects, together with quasi-metrics expressing the similarity between an object and a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes, which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by a continuous repartitions
Unsupervised learning of facial emotion decoding skills.

Science.gov (United States)

Huelle, Jan O; Sack, Benjamin; Broer, Katja; Komlewa, Irina; Anders, Silke

2014-01-01

Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practice without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear, and sadness) was shown in each clip. Although no external information about the correctness of the participant's response or the sender's true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practice effects often observed in cognitive tasks.
Unsupervised classification of neocortical activity patterns in neonatal and pre-juvenile rodents

Directory of Open Access Journals (Sweden)

Nicole Cichon

2014-05-01

Flexible communication within the brain, which relies on oscillatory activity, is not confined to adult neuronal networks. Experimental evidence has documented the presence of discontinuous patterns of oscillatory activity already during early development. Their highly variable spatial and time-frequency organization has been related to region specificity. However, it might equally be due to the absence of unitary criteria for classifying the early activity patterns, since they have been mainly characterized by visual inspection. Therefore, robust and unbiased methods for categorizing these discontinuous oscillations are needed for increasingly complex data sets from different labs. Here, we introduce an unsupervised detection and classification algorithm for the discontinuous activity patterns of rodents during early development. First, time windows with discontinuous oscillations were distinguished from epochs of network silence. In a second step, the major features of the detected events were identified and processed by principal component analysis to determine their contribution to the classification of different oscillatory patterns. Finally, these patterns were categorized using an unsupervised clustering algorithm. The results were validated on manually characterized neonatal spindle bursts, which ubiquitously entrain neocortical areas of rats and mice, and on prelimbic nested gamma spindle bursts. Moreover, the algorithm led to satisfactory results for oscillatory events that were more difficult to classify because of the increased similarity of their features, e.g., during the pre-juvenile developmental period. Based on a linear classification, the optimal number of features to consider increased with the difficulty of detection. This algorithm allows the comparison of neonatal and pre-juvenile oscillatory patterns in their spatial and temporal organization. It might represent a first step for the unbiased elucidation of activity patterns
Theoretical developments for interpreting kernel spectral clustering from alternative viewpoints

Directory of Open Access Journals (Sweden)

Diego Peluffo-Ordóñez

2017-08-01

To explore complex structured data in unsupervised settings, the so-called kernel spectral clustering (KSC) is one of the most recommended and appealing approaches, given its versatility and elegant formulation. In this work, we explore the relationship between KSC and other well-known approaches, namely normalized cut clustering and kernel k-means. To do so, we first deduce a generic KSC model from a primal-dual formulation based on least-squares support-vector machines (LS-SVM). In the experiments, KSC and the other considered methods are assessed on image segmentation tasks to demonstrate their usability.
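For reference, kernel k-means, one of the methods the paper relates to KSC, can be written using only the kernel matrix, because the squared distance from a point to a cluster mean in feature space expands into sums of kernel entries. This is a sketch of kernel k-means, not of KSC itself. The RBF kernel, its `gamma` value, and the requirement to supply initial labels are assumptions; `rbf_kernel` and `kernel_kmeans` are hypothetical names.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix; gamma is an illustrative choice."""
    sq = (X ** 2).sum(axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))


def kernel_kmeans(K, labels, n_iter=50):
    """Kernel k-means from a kernel matrix K and initial integer labels."""
    labels = np.asarray(labels).copy()
    n_clusters = labels.max() + 1
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((len(K), n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue                      # empty cluster: leave its distance at inf
            # ||phi(x) - mean_c||^2 = K(x,x) - 2/|c| sum_j K(x,j) + 1/|c|^2 sum_jl K(j,l)
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Because only `K` is used, the same routine works for any positive semi-definite kernel, which is the property that links kernel k-means to spectral formulations such as KSC.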
Exploitation of Clustering Techniques in Transactional Healthcare Data

Directory of Open Access Journals (Sweden)

Naeem Ahmad Mahoto

2014-03-01

Healthcare service centres equipped with electronic health systems have improved their resources as well as their treatment processes. The dynamic nature of each individual's healthcare data makes it complex and difficult for physicians to handle manually; therefore, automatic techniques are essential to manage the quality and standardization of treatment procedures. Exploratory data analysis, pattern analysis, and grouping of data are handled using clustering techniques, which work as unsupervised classification. A number of healthcare applications have been developed that use several data mining techniques for classification, clustering, and extracting useful information from healthcare data. The challenging issue in this domain is selecting an adequate data mining algorithm for optimal results. This paper applies three clustering algorithms, DBSCAN (density-based clustering), agglomerative hierarchical clustering, and k-means, to real transactional healthcare data of diabetic patients (taken as a case study) to analyse their performance on large and dispersed healthcare data. The best set of clusters among these algorithms is selected using clustering quality indexes and used to identify possible subgroups of patients with similar treatment patterns.
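To make the density-based algorithm concrete, here is a brute-force sketch of DBSCAN, quadratic in the number of records. A point is a core point if at least `min_pts` points (itself included) lie within `eps`; clusters grow outward from core points, and points reachable from no core point are labelled noise (-1). The parameter values in the example are illustrative only; the abstract does not report the paper's settings.

```python
import numpy as np


def dbscan(X, eps, min_pts):
    """Brute-force DBSCAN; returns a cluster id per row, -1 for noise."""
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]   # includes i itself
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster          # seed a new cluster at an unassigned core point
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue             # border points join the cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels
```

Unlike k-means, DBSCAN needs no cluster count and can flag outlying records as noise, which is one reason it is often compared with k-means and hierarchical clustering on dispersed data.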
Implementation of KD-Tree K-Means Clustering for Document Clustering

Directory of Open Access Journals (Sweden)

Eric Budiman Gosno

2013-09-01

Document clustering is the process of grouping documents automatically and without supervision. Document clustering is a problem frequently encountered in fields such as text mining and information retrieval systems. Document clustering methods with high accuracy and time efficiency are essential for improving the results of web search engines and for filtering. One clustering method that is well known and has been applied to document clustering is K-Means Clustering. However, K-Means Clustering is sensitive to the choice of initial cluster centroid positions, so a poor choice of initial centroids causes K-Means Clustering to become trapped in a local optimum. KD-Tree K-Means Clustering is an improvement on K-Means Clustering: it uses a K-Dimensional Tree data structure and density values when initializing the cluster centroids. This paper implements the KD-Tree K-Means Clustering algorithm for the document clustering problem. On the 20 Newsgroups data set, the document clustering produced by KD-Tree K-Means Clustering has a distortion 3×10⁵ lower than the average distortion of K-Means Clustering and an NIG value 0.09 better than that of K-Means Clustering.
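The density-based initialization idea can be sketched as follows, followed by ordinary K-means and its distortion (the sum of squared distances to the assigned centroids). This is an assumption-laden stand-in, not the paper's KD-Tree K-Means: density is counted by brute force within a fixed `radius` instead of through KD-tree leaf buckets, and each subsequent seed is the point maximizing density times distance to the nearest seed already chosen. `density_seeds` and `kmeans` are hypothetical names.

```python
import numpy as np


def density_seeds(X, k, radius):
    """Choose k initial centroids that are dense and mutually distant.

    Brute-force neighbor counts stand in for the KD-tree density estimate.
    """
    dist = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))
    density = (dist <= radius).sum(axis=1).astype(float)   # neighbors within radius
    seeds = [int(np.argmax(density))]                        # densest point first
    for _ in range(1, k):
        gap = dist[:, seeds].min(axis=1)                     # distance to nearest seed
        seeds.append(int(np.argmax(density * gap)))
    return X[seeds].astype(float)


def kmeans(X, centers, n_iter=100):
    """Lloyd's K-means from given centers; also returns the distortion (SSE)."""
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = ((X[:, None] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
    distortion = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, distortion
```

Comparing `distortion` from these seeds against the average over several random initializations reproduces the kind of comparison the abstract reports, though the numbers would of course depend on the data.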

Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification

Science.gov (United States)

Iyigun, Cem; Ben-Israel, Adi

Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, using prior information on the data). These two modes of data analysis are combined in a parameterized model in which the parameter θ ∈ [0, 1] is the weight attributed to the prior information, with θ = 0 corresponding to clustering and θ = 1 to classification. The results (cluster centers, classification rule) depend on the parameter θ; an insensitivity to θ indicates that the prior information agrees with the intrinsic cluster structure and is otherwise redundant. This explains why some data sets (such as the Wisconsin breast cancer data, Merz and Murphy, UCI repository of machine learning databases, University of California, Irvine, CA) give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, which is shown to be an entropic distance related to the Kullback-Leibler divergence.
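Under the probabilistic distance (PD) clustering principle, membership probability times distance is the same for every cluster, so memberships are proportional to inverse distances. The sketch below computes these memberships for one point and their geometric mean, the uncertainty measure named above, which is largest (1/K for K clusters) when all memberships are equal. It covers only the unsupervised case (θ = 0), because the abstract does not give the semi-supervised weighting formula; the function names are hypothetical.

```python
import math


def pd_memberships(distances):
    """PD-clustering memberships: p_k * d_k is the same for every cluster k."""
    zeros = [k for k, d in enumerate(distances) if d == 0]
    if zeros:                                  # point coincides with one or more centers
        return [1.0 / len(zeros) if d == 0 else 0.0 for d in distances]
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [v / total for v in inv]


def classification_uncertainty(distances):
    """Geometric mean of the membership probabilities (maximal, 1/K, when all are equal)."""
    p = pd_memberships(distances)
    return math.prod(p) ** (1.0 / len(p))
```

For a point at distances 1 and 3 from two centers, the memberships are 0.75 and 0.25 (so p·d = 0.75 for both), and the uncertainty is √0.1875 ≈ 0.433, below the maximum of 0.5 reached when the point is equidistant.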
High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

Directory of Open Access Journals (Sweden)

Dieter Hendricks

2016-02-01

We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters at sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.
Horticultural cluster

OpenAIRE

SHERSTIUK S.V.; POSYLAYEVA K.I.

2013-01-01

The article presents theoretical and methodological approaches to the nature and existence of clusters and examines how clusters differ from other kinds of cooperative and integration associations. Scientific and practical recommendations for forming a competitive horticultural cluster are developed.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

Science.gov (United States)

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications, including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level-3 basic linear algebra subprograms (BLAS) are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Building on it, we propose two further fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
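A minimal sketch of symmetric NMF for the Euclidean objective ‖A − HHᵀ‖², using a commonly cited multiplicative rule H ← H ∘ ((1 − β) + β·(AH) ⊘ (HHᵀH)) with β = 1/2. This rule is a stand-in; the paper's three parallel algorithms (including α-SNMF and β-SNMF) may differ in their exact updates. HHᵀH is computed as H(HᵀH), i.e. two matrix-matrix products, the kind of operation level-3 BLAS provides. Cluster labels are read off as the row-wise argmax of H, the usual probabilistic-clustering interpretation. `snmf` and `snmf_error` are hypothetical names.

```python
import numpy as np


def snmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Symmetric NMF: nonnegative H (n x k) with A ≈ H @ H.T (multiplicative update)."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps       # H H^T H via two matrix-matrix products
        H *= (1.0 - beta) + beta * numer / denom
    return H


def snmf_error(A, H):
    """Squared Frobenius reconstruction error ||A - H H^T||^2."""
    return float(np.linalg.norm(A - H @ H.T) ** 2)
```

Given a symmetric similarity matrix `A`, `snmf(A, k).argmax(axis=1)` assigns each item to one of `k` clusters, while the row values themselves can be normalized into soft memberships.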
Unsupervised segmentation of lung fields in chest radiographs using multiresolution fractal feature vector and deformable models.

Science.gov (United States)

Lee, Wen-Li; Chang, Koyin; Hsieh, Kai-Sheng

2016-09-01

Segmenting lung fields in a chest radiograph is essential for automatically analyzing the image. We present an unsupervised method based on a multiresolution fractal feature vector. The feature vector characterizes the lung field region effectively. A fuzzy c-means clustering algorithm is then applied to obtain a satisfactory initial contour. The final contour is obtained by deformable models. The results show the feasibility and high performance of the proposed method. Furthermore, based on the segmentation of lung fields, the cardiothoracic ratio (CTR) can be measured. The CTR is a simple index for evaluating cardiac hypertrophy. After identifying a suspicious symptom based on the estimated CTR, a physician can suggest that the patient undergo additional extensive tests before a treatment plan is finalized.
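The fuzzy c-means step can be sketched as the standard alternation between membership-weighted centers and inverse-distance memberships. The fuzzifier m = 2 and the random initial memberships are assumptions, and the fractal feature extraction and deformable-model stages are not shown; in the paper's pipeline, `X` would hold per-pixel feature vectors.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns memberships U (n x c) and centers (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.sqrt(((X[:, None] - centers[None]) ** 2).sum(axis=-1)) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        new_U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=-1)
        converged = np.abs(new_U - U).max() < tol
        U = new_U
        if converged:
            break
    return U, centers
```

A hard region map, such as the initial contour mentioned in the abstract, can be obtained from `U.argmax(axis=1)`, while the soft memberships remain available for boundary pixels.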
Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

Science.gov (United States)

Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

2015-01-01

Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose collection is a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not achieve results comparable to those of supervised methods. In this context, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped as structured or non-structured, were evaluated within our pipeline. As non-structured algorithms we evaluated K-means, Fuzzy K-means, and the Gaussian Mixture Model (GMM), whereas as a structured classification algorithm we evaluated the Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves on the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453
Genomic signal processing for DNA sequence clustering.

Science.gov (United States)

Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

2018-01-01

Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed; they offer the opportunity to employ existing digital signal processing methods for genomic data. One of the most widely used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the inspection and analysis of the results and of possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
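One common GSP mapping is the Voss representation: four binary indicator sequences, one per nucleotide. The sketch below converts a sequence into these indicators and summarizes their DFT power spectrum as a fixed-length, normalized feature vector that any K-means implementation could then cluster. Both the mapping and the band-pooling scheme are assumptions; the abstract does not say which numerical representation the authors use, and the function names are hypothetical.

```python
import numpy as np


def voss_indicators(seq):
    """Voss mapping: one binary indicator sequence per nucleotide (rows A, C, G, T)."""
    seq = seq.upper()
    return np.array([[1.0 if b == base else 0.0 for b in seq] for base in "ACGT"])


def spectral_features(seq, n_bins=8):
    """Normalized DFT power of the indicator signals, pooled into n_bins bands.

    Pooling gives fixed-length vectors for sequences of different lengths; the
    sequence should contain at least 2 * n_bins bases so that no band is empty.
    """
    u = voss_indicators(seq)
    power = (np.abs(np.fft.rfft(u, axis=1)) ** 2).sum(axis=0)   # total power spectrum
    bands = np.array_split(power[1:], n_bins)                   # drop the DC term
    feats = np.array([band.mean() for band in bands])
    return feats / feats.sum()
```

A strictly period-3 sequence such as `"ACG" * 20` puts all of its non-DC power at one third of the sequence length, so its feature vector peaks in a single band; stacking `spectral_features` for many sequences yields the matrix to be clustered.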
Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

Directory of Open Access Journals (Sweden)

LI Jian-Wei

2014-08-01

Building on the cluster validity function based on geometric probability in [1, 2], this paper proposes a cluster analysis method based on geometric probability for processing large amounts of data in a rectangular area. The basic idea is top-down stepwise refinement: first categories, then subcategories. At every clustering level, the cluster validity function based on geometric probability is first used to determine the clusters and their gathering direction, and then the cluster centers and cluster borders are determined. Using TM remote sensing image classification examples, the method is compared with the supervised and unsupervised classification in ERDAS and with the geometric-probability cluster analysis method for a two-dimensional square proposed in [2]. The results show that the proposed method can significantly improve classification accuracy.
Clustering analysis of malware behavior using Self Organizing Map

DEFF Research Database (Denmark)

Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

2016-01-01

Currently, malware behavioral classification is performed by means of Anti-Virus (AV) generated labels. The paper investigates the inconsistencies associated with current practices by evaluating the identified differences between current vendors. In this paper we rely on Self Organizing...... Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behaviors. A data set of approximately 270,000 samples was used to generate the behavioral profile of malicious types in order to compare the outcome of the proposed clustering...... approach with the labels collected from 57 Antivirus vendors using VirusTotal. Upon evaluating the results, the paper identifies shortcomings of relying on AV vendors for labeling malware samples. To solve this problem, a cluster-based classification is proposed, which should provide more...
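A minimal sketch of Self-Organizing Map training may clarify how such clusters arise. Each malware behavioral profile is assumed to be encoded as a numeric feature vector; the grid size and the linearly decaying learning-rate and neighborhood schedules are illustrative assumptions, not the paper's settings. Samples mapped to the same or neighboring units would form a behavioral cluster. `train_som` and `som_assign` are hypothetical names.

```python
import numpy as np


def train_som(X, rows=2, cols=2, n_iter=3000, lr0=0.5, sigma0=1.0, seed=0):
    """Online SOM training with linearly decaying learning rate and neighborhood width."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(len(X), size=rows * cols)].astype(float)   # init from samples
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))          # best-matching unit
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                            # pull BMU and neighbors
    return W


def som_assign(W, X):
    """Index of the best-matching unit for every sample."""
    return ((X[:, None] - W[None]) ** 2).sum(axis=-1).argmin(axis=1)
```

Comparing `som_assign` output with vendor labels, for example by counting how many distinct AV labels fall on each unit, is one way to quantify the label inconsistencies the paper discusses.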
Unsupervised learning of facial emotion decoding skills

Directory of Open Access Journals (Sweden)

Jan Oliver Huelle

2014-02-01

Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practise without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear and sadness) was shown in each clip. Although no external information about the correctness of the participant’s response or the sender’s true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practise effects often observed in cognitive tasks.
The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

Directory of Open Access Journals (Sweden)

Chen Yidong

2004-01-01

An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining the magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distributions. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method, as well as the hierarchical clustering method, the K-means clustering method, and the self-organizing map method, to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
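The clustering rule described above can be sketched as hill climbing on a neighborhood graph: each point repeatedly moves to the highest-magnitude point within a fixed radius until it reaches a local maximum, and all points that end at the same maximum form one cluster. The radius-based neighborhood and the lowest-index tie-breaking are assumptions, since the abstract does not specify how neighborhoods are defined; `local_maximum_clustering` is a hypothetical name.

```python
import numpy as np


def local_maximum_clustering(X, magnitude, radius):
    """Assign each point to the local maximum of `magnitude` reached by hill climbing.

    Neighbors are points within `radius` (an assumption); ties go to the lowest index.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    dist = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))
    step = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        nbrs = np.flatnonzero(dist[i] <= radius)        # includes i itself
        step[i] = nbrs[np.argmax(magnitude[nbrs])]      # best neighbor (may be i)
    labels = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        j = i
        while step[j] != j:                             # climb until a local maximum
            j = step[j]
        labels[i] = j                                   # cluster id = index of that maximum
    return labels
```

Each climb terminates because every step either strictly increases the magnitude or, on a tie, moves to a lower index. The number of clusters is therefore not fixed in advance: it equals the number of local maxima of the chosen magnitude property.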
Unsupervised Learning: A Novel Clustering Method for Rolling Bearing Faults Identification

Science.gov (United States)

Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

2017-12-01

To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on the intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines, and decision trees are commonly used. These methods can detect rolling bearing failures effectively, but to achieve better detection results they often require many training samples. Motivated by this, a novel clustering method is proposed in this paper. The method automatically finds the correct number of clusters, and its effectiveness is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. Meanwhile, the diagnosis accuracy remains relatively high even for massive samples.
A competition in unsupervised color image segmentation

Czech Academy of Sciences Publication Activity Database

Haindl, Michal; Mikeš, Stanislav

2016-01-01

Vol. 57, No. 9 (2016), pp. 136-151. ISSN 0031-3203. R&D Projects: GA ČR(CZ) GA14-10911S. Institutional support: RVO:67985556. Keywords: unsupervised image segmentation; segmentation contest; texture analysis. Subject RIV: BD - Theory of Information. Impact factor: 4.582, year: 2016. http://library.utia.cas.cz/separaty/2016/RO/haindl-0459179.pdf
A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images.

Science.gov (United States)

Gao, Han; Tang, Yunwei; Jing, Linhai; Li, Hui; Ding, Haifeng

2017-10-24

The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. To overcome these problems, this study proposes a novel unsupervised evaluation method that quantitatively measures the quality of segmentation results. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. Indicators designed for spatial stratified heterogeneity and spatial autocorrelation are then used to estimate the properties of the segments in this integrated feature set. These two indicators are combined into a global assessment metric as the final quality score. The trade-offs between the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be represented geometrically. The method is tested on two segmentation algorithms and three test images, and it is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verify the effectiveness and reliability of the proposed method and its improvements over other methods.
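The abstract does not spell out how the Mahalanobis distance is applied, so the sketch below shows only the standard two-dimensional Mahalanobis distance between two indicator pairs, using the sample covariance of a set of candidate segmentations. Treating the two indicators as an (x, y) pair and measuring distance to a chosen reference point are assumptions; the function name `mahalanobis_2d` is hypothetical.

```python
import math


def mahalanobis_2d(samples, point, reference):
    """Mahalanobis distance between `point` and `reference` in 2-D.

    `samples` is a list of (x, y) indicator pairs (hypothetically, one per
    candidate segmentation) used to estimate the covariance matrix.
    """
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    # Unbiased sample covariance entries.
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = point[0] - reference[0], point[1] - reference[1]
    return math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
```

Unlike a plain Euclidean distance, this rescales each indicator by its spread and discounts variation that the two indicators share, which is one way to handle the trade-off between them.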
A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images

Directory of Open Access Journals (Sweden)

Han Gao

2017-10-01

The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. To overcome these problems, this study proposes a novel unsupervised evaluation method that quantitatively measures the quality of segmentation results. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. Indicators designed for spatial stratified heterogeneity and spatial autocorrelation are then used to estimate the properties of the segments in this integrated feature set. These two indicators are combined into a global assessment metric as the final quality score. The trade-offs between the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be represented geometrically. The method is tested on two segmentation algorithms and three test images, and it is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verify the effectiveness and reliability of the proposed method and its improvements over other methods.
Improved regional-scale Brazilian cropping systems' mapping based on a semi-automatic object-based clustering approach

Science.gov (United States)

Bellón, Beatriz; Bégué, Agnès; Lo Seen, Danny; Lebourgeois, Valentine; Evangelista, Balbino Antônio; Simões, Margareth; Demonte Ferraz, Rodrigo Peçanha

2018-06-01

Fine-scale cropping system maps over large areas provide key information for agricultural production and environmental impact assessments, and thus represent a valuable tool for effective land-use planning. There is, therefore, a growing interest in mapping cropping systems in an operational manner over large areas, and remote sensing approaches based on vegetation index time series analysis have proven to be an efficient tool. However, supervised pixel-based approaches are commonly adopted, requiring resource-consuming field campaigns to gather training data. In this paper, we present a new object-based unsupervised classification approach tested on an annual MODIS 16-day composite Normalized Difference Vegetation Index time series and a Landsat 8 mosaic of the State of Tocantins, Brazil, for the 2014-2015 growing season. Two variants of the approach are compared: a hyperclustering approach, and a landscape-clustering approach involving a prior stratification of the study area into landscape units on which the clustering is then performed. The main cropping systems of Tocantins, characterized by the crop types and cropping patterns, were efficiently mapped with the landscape-clustering approach. Results show that stratification prior to clustering significantly improves the classification accuracies for underrepresented and sparsely distributed cropping systems. This study illustrates the potential of unsupervised classification for large-area cropping system mapping and contributes to the development of generic tools for supporting large-scale agricultural monitoring across regions.
Unsupervised learning of a steerable basis for invariant image representations

Science.gov (United States)

Bethge, Matthias; Gerwinn, Sebastian; Macke, Jakob H.

2007-02-01

There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique, so there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occurring in sequences of natural images. We utilize ideas of 'steerability' and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the average bivector. For data that exhibit a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. 'complex cells') from sequences of natural images.
Fuzzy clustering-based segmented attenuation correction in whole-body PET

CERN Document Server

Zaidi, H; Boudraa, A; Slosman, DO

2001-01-01

Segmentation-based attenuation correction is now a widely accepted technique to reduce the noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography (PET). This reduces the noise on the correction maps while still correcting for the differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm and preceded by a median filtering procedure, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation, such as air, the lungs and soft tissue. The reconstructed transmission image voxels are thus segmented into populations of uniform attenuation based on human anatomy. The clustering procedure starts with an over-specified number of clusters, followed by a merging process that groups clusters with similar properties and removes some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...
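As a rough illustration of the fuzzy C-means step, the sketch below clusters scalar voxel values into a fixed number of fuzzy classes. It is plain FCM with an evenly spaced initialization (an assumption); the median filtering, the over-specified starting cluster count and the anatomy-guided merging described above are not implemented, and the function name `fuzzy_c_means_1d` is hypothetical.

```python
def fuzzy_c_means_1d(values, c, m=2.0, iters=100, tol=1e-6):
    """Minimal fuzzy C-means for scalar intensities.

    Returns (centres, memberships), where memberships[i][k] is the degree to
    which values[i] belongs to cluster k (each row sums to 1).
    """
    lo, hi = min(values), max(values)
    # Assumed initialisation: centres spread evenly over the intensity range.
    centres = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - v) for v in centres]
            if 0.0 in d:
                # Value sits exactly on a centre: crisp membership, split on ties.
                row = [1.0 if dk == 0.0 else 0.0 for dk in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d]
            u.append(row)
        # Centre update: mean of the values weighted by u_ik ** m.
        new = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            new.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, u
```

Assigning each voxel to the cluster with the largest membership gives a crisp segmentation. The fuzzifier `m = 2` is a common default, not a value reported in the paper.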
Canonical PSO Based K-Means Clustering Approach for Real Datasets.

Science.gov (United States)

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

"Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.
Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

Directory of Open Access Journals (Sweden)

Yuchou Chang

2008-02-01

Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.

Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

Directory of Open Access Journals (Sweden)

Hong Yi

2008-01-01

Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.
Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees

Science.gov (United States)

Verhoeven, Thibault; Schmid, Konstantin; Müller, Klaus-Robert; Tangermann, Michael; Kindermans, Pieter-Jan

2017-01-01

Objective: Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated, e.g., by subject-to-subject transfer of a pre-trained classifier or by unsupervised adaptive classification methods that learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve reliable, unsupervised, calibrationless decoding with a guarantee to recover the true class means. Method: We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised and easy-to-implement classification approach for ERP-based BCIs. LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Results: Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. Significance: The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on principles complementary to those of existing unsupervised methods, opening the door to their further enhancement when combined with LLP. PMID:28407016
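The key step of LLP, recovering class means from group averages with known class proportions, reduces to a 2x2 linear system per feature: each group average is p * mu_target + (1 - p) * mu_nontarget. The sketch below assumes exactly two groups with different, known target proportions; the function name and numbers are hypothetical, and the speller design and the classifier built on the recovered means are not shown.

```python
def llp_class_means(group_mean_a, group_mean_b, p_a, p_b):
    """Recover target and non-target class means from two group averages.

    Each group average is a known mixture, per feature:
        mean_g = p_g * mu_target + (1 - p_g) * mu_nontarget
    With p_a != p_b the system has a unique solution.
    """
    if p_a == p_b:
        raise ValueError("group target proportions must differ")
    mu_target, mu_nontarget = [], []
    for ma, mb in zip(group_mean_a, group_mean_b):
        # Subtracting the two equations gives (p_a - p_b) * (mu_t - mu_n).
        diff = (ma - mb) / (p_a - p_b)
        # Rewrite mean_a = mu_n + p_a * (mu_t - mu_n) and solve for mu_n.
        nontarget = ma - p_a * diff
        mu_nontarget.append(nontarget)
        mu_target.append(nontarget + diff)
    return mu_target, mu_nontarget
```

With real recordings the group averages are sample estimates, so the recovered means approach the true class means as more epochs are averaged, provided the proportions are exact.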
Best friends' interactions and substance use: The role of friend pressure and unsupervised co-deviancy.

Science.gov (United States)

Tsakpinoglou, Florence; Poulin, François

2017-10-01

Best friends exert a substantial influence on rising alcohol and marijuana use during adolescence. Two mechanisms occurring within friendship - friend pressure and unsupervised co-deviancy - may partially capture the way friends influence one another. The current study aims to: (1) examine the psychometric properties of a new instrument designed to assess pressure from a youth's best friend and unsupervised co-deviancy; (2) investigate the relative contribution of these processes to alcohol and marijuana use; and (3) determine whether gender moderates these associations. Data were collected through self-report questionnaires completed by 294 Canadian youths (62% female) across two time points (ages 15-16). Principal component analysis yielded a two-factor solution corresponding to friend pressure and unsupervised co-deviancy. Logistic regressions subsequently showed that unsupervised co-deviancy was predictive of an increase in marijuana use one year later. Neither process predicted an increase in alcohol use. Results did not differ as a function of gender. Copyright © 2017 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
Unsupervised Assessment of Subcutaneous and Visceral Fat by MRI

DEFF Research Database (Denmark)

Jørgensen, Peter Stanley; Larsen, Rasmus; Wraae, Kristian

2009-01-01

This paper presents a method for unsupervised assessment of visceral and subcutaneous adipose tissue in the abdominal region by MRI. The identification of the subcutaneous and the visceral regions was achieved by dynamic programming constrained by points acquired from an active shape model...
Unsupervised classification of operator workload from brain signals

Science.gov (United States)

Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin

2016-06-01

Objective. In this study we aimed to classify operator workload as it occurs in many real-life workplace environments. We explored brain-signal-based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Approach. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSP) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects' error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined their added value for improving classification performance. Main results. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC and cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering, and they extracted physiologically plausible components. The performance of the unsupervised cSPoC was significantly increased by augmenting it with PPM features. Significance. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.
Analog memristive synapse in spiking networks implementing unsupervised learning

Directory of Open Access Journals (Sweden)

Erika Covi

2016-10-01

Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation, and in this respect the field poses an open challenge to researchers. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e. the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity). This implies a device able to change its resistance (synaptic strength, or weight) upon proper electrical stimuli (synaptic activity) and showing several stable resistive states throughout its dynamic range (analog behavior). Moreover, it should be able to perform spike timing dependent plasticity (STDP), an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-the-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO2-based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even when incomplete or noisy characters are displayed, and it is robust to a device-to-device variability of up to ±30%.
Analog Memristive Synapse in Spiking Networks Implementing Unsupervised Learning.

Science.gov (United States)

Covi, Erika; Brivio, Stefano; Serb, Alexander; Prodromakis, Themis; Fanciulli, Marco; Spiga, Sabina

2016-01-01

Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation, and in this respect the field poses an open challenge to researchers. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e., the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity). This implies a device able to change its resistance (synaptic strength, or weight) upon proper electrical stimuli (synaptic activity) and showing several stable resistive states throughout its dynamic range (analog behavior). Moreover, it should be able to perform spike timing dependent plasticity (STDP), an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-the-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO2-based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even when incomplete or noisy images are displayed, and it is robust to a device-to-device variability of up to ±30%.
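For readers unfamiliar with STDP, the sketch below shows a generic pair-based rule with exponential timing windows and soft bounds, which yields gradual (analog) weight changes rather than jumps between two states. All parameter values and the function name `stdp_update` are illustrative assumptions; the HfO2 memristor in the paper responds to device-specific voltage pulses that this function does not model.

```python
import math


def stdp_update(w, dt, a_plus=0.05, a_minus=0.05, tau=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP weight update with soft bounds (hypothetical parameters).

    dt = t_post - t_pre in ms. Pre-before-post (dt > 0) potentiates and
    post-before-pre (dt < 0) depresses; the effect decays with |dt|.
    Soft bounds scale each change by the remaining headroom, so the weight
    moves through many intermediate levels.
    """
    if dt > 0:
        w += a_plus * math.exp(-dt / tau) * (w_max - w)
    elif dt < 0:
        w -= a_minus * math.exp(dt / tau) * (w - w_min)
    # Clamp for safety; with a_plus, a_minus < 1 the soft bounds already hold.
    return min(max(w, w_min), w_max)
```

Repeated causal pairings push the weight toward `w_max` in ever smaller steps, which is the multilevel behaviour an analog synapse is expected to provide.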
A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation

Science.gov (United States)

Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio

2015-01-01

Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building blocks that better represent the information in it. Digitized histopathology images represent a very good testbed for representation learning, since they involve large amounts of highly complex visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on which type of learning architecture (deep or shallow) and which type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances, and that topographic constraints provide useful invariant features in scale and rotation for efficient tumor differentiation.
Information-Based Approach to Unsupervised Machine Learning

Science.gov (United States)

2013-06-19

...samples with large fitting error. The above optimization problem can be reduced to a quadratic program (Mangasarian & Musicant, 2000), which can be... recognition. Technicheskaya Kibernetica, 3 (in Russian). Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661–675. Mangasarian, O. L., & Musicant... to find correspondence between two sets of objects in different domains in an unsupervised way. Photo album summarization is a typical application
A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms.

Directory of Open Access Journals (Sweden)

Amir H Beiki

Various methods have been used to identify olive tree cultivars; herein we used different bioinformatics algorithms to propose new tools to classify 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset was fully able to assign each olive cultivar to the right class. All trees (176) induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) were also able to predict the right class of olive cultivars with 100% accuracy. For the first time, our results showed that data mining techniques can be effectively used to distinguish between plant cultivars, and the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.
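To make the K-medoids step concrete, the sketch below clusters hypothetical binary band-presence profiles (1 = band present, 0 = absent) with Hamming distance, using farthest-first initialization and alternating assignment and medoid updates. The distance, initialization, data and function names are assumptions; the paper applied K-medoids to an SVM-weighted dataset whose construction is not described here.

```python
def hamming(a, b):
    """Number of marker positions at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(profiles, medoids):
    """Index (0..k-1) of the nearest medoid for every profile."""
    return [
        min(range(len(medoids)), key=lambda c: hamming(p, profiles[medoids[c]]))
        for p in profiles
    ]


def k_medoids(profiles, k, iters=20):
    """Alternating K-medoids; returns (medoid indices, cluster labels)."""
    # Farthest-first initialisation: start at profile 0, then repeatedly add
    # the profile farthest from every medoid chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(profiles)),
            key=lambda i: min(hamming(profiles[i], profiles[m]) for m in medoids),
        ))
    for _ in range(iters):
        labels = assign(profiles, medoids)
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(
                members,
                key=lambda i: sum(hamming(profiles[i], profiles[j]) for j in members),
            ))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(profiles, medoids)
```

Because medoids are actual samples, each cluster is represented by a real cultivar profile, which is one reason K-medoids suits categorical marker data better than K-means.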
Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes.

Science.gov (United States)

Cannistraci, Carlo Vittorio; Ravasi, Timothy; Montevecchi, Franco Maria; Ideker, Trey; Alessio, Massimo

2010-09-15

Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance not only for unsupervised data classification but also for visualization of the patterns hidden in high-dimensional feature space. 'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests the approximation of curvilinear sample distances in the feature space by pair-wise distances over their minimum spanning tree (MST), and thus avoids the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. https://sites.google.com/site/carlovittoriocannistraci/home.
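The Minimum Curvilinearity distance described above can be sketched in a few lines: build a minimum spanning tree over the samples, then measure every pairwise distance along the tree. The sketch below assumes a Euclidean base metric and uses Prim's algorithm; the function name `mc_distances` is hypothetical, and MCE and MCAP, which operate on such a distance matrix, are not shown.

```python
import math


def mc_distances(points):
    """Minimum Curvilinearity distance matrix (sketch, Euclidean base metric).

    1. Build the minimum spanning tree of the complete Euclidean graph (Prim).
    2. The distance between two samples is the length of the unique MST path
       joining them; no tuning parameter is involved.
    """
    n = len(points)
    w = [[math.dist(a, b) for b in points] for a in points]
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            length = w[u][parent[u]]
            adj[u].append((parent[u], length))
            adj[parent[u]].append((u, length))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    # In a tree each pair has exactly one path, so a traversal per source suffices.
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack, seen = [s], {s}
        while stack:
            u = stack.pop()
            for v, length in adj[u]:
                if v not in seen:
                    seen.add(v)
                    dist[s][v] = dist[s][u] + length
                    stack.append(v)
    return dist
```

For points lying along a C-shaped curve, the two ends are close in Euclidean terms but far apart along the tree, which is the curvilinear behaviour MC is meant to capture.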
Unsupervised machine learning account of magnetic transitions in the Hubbard model

Science.gov (United States)

Ch'ng, Kelvin; Vazquez, Nick; Khatami, Ehsan

2018-01-01

We employ several unsupervised machine learning techniques, including autoencoders, random trees embedding, and t-distributed stochastic neighbor embedding (t-SNE), to reduce the dimensionality of, and therefore classify, raw (auxiliary) spin configurations generated through Monte Carlo simulations of small clusters for the Ising and Fermi-Hubbard models at finite temperatures. Results from a convolutional autoencoder for the three-dimensional Ising model can be shown to produce the magnetization and the susceptibility as a function of temperature with a high degree of accuracy. Quantum fluctuations distort this picture and prevent us from making such connections between the output of the autoencoder and physical observables for the Hubbard model. However, we are able to define an indicator based on the output of the t-SNE algorithm that shows near-perfect agreement with the antiferromagnetic structure factor of the model in two and three spatial dimensions in the weak-coupling regime. t-SNE also predicts a transition to the canted antiferromagnetic phase for the three-dimensional model when a strong magnetic field is present. We show that these techniques cannot be expected to work away from half filling when the "sign problem" in quantum Monte Carlo simulations is present.
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

Science.gov (United States)

Goldstein, Markus; Uchida, Seiichi

2016-01-01

Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied to unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection, and the life science and medical domains. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks. PMID:27093601
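As an example of one algorithm family covered by such comparisons, the sketch below computes a global k-nearest-neighbour anomaly score, defined here as the mean distance to the k nearest neighbours. It is a brute-force O(n^2) illustration under that assumed definition (some variants use only the distance to the k-th neighbour), and the function name is hypothetical. The choice of `k` is exactly the kind of parameter setting whose impact the study examines.

```python
import math


def knn_anomaly_scores(points, k):
    """Global k-NN anomaly score: mean distance to the k nearest neighbours.

    Larger scores mean a point lies farther from the bulk of the data.
    Requires 1 <= k < len(points).
    """
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores
```

Because the score compares each point with the whole dataset, it targets global anomalies; local methods such as LOF instead compare a point with the density of its own neighbourhood.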
A new clustering algorithm for scanning electron microscope images

Science.gov (United States)

Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad

2016-04-01

A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster pattern, and the beam's position is combined with the detected signal to produce an image. The most common SEM configuration produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may suffer from insufficient brightness, anomalous contrast, jagged edges, and poor quality due to a low signal-to-noise ratio, grainy topography, and poor surface detail. Segmenting SEM images is a challenging problem in the presence of these distortions. In this paper, we focus on clustering this type of image. To that end, we evaluate the performance of well-known unsupervised clustering and classification techniques such as connectivity-based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering, and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of image and compare its results against these standard techniques in terms of clustering validation metrics.
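The paper's spatial fuzzy technique is not specified in the abstract. For orientation, the sketch below implements plain fuzzy c-means on pixel intensities, the non-spatial baseline that spatial fuzzy variants typically extend with a neighbourhood term. The fuzzifier m = 2, the iteration count and the initial centers are our own assumptions.

```python
def fuzzy_c_means(values, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on 1-D pixel intensities (no spatial term).

    values: grayscale intensities; centers: initial cluster centers.
    Returns the final centers and the memberships from the last iteration.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        memberships = []
        for x in values:
            d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            memberships.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
        # Center update: mean of intensities weighted by u_ik ** m
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

pixels = [10, 12, 11, 200, 205, 198]
centers, u = fuzzy_c_means(pixels, centers=[0.0, 255.0])
labels = [row.index(max(row)) for row in u]  # hard labels from the fuzzy memberships
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Each pixel keeps a graded membership in every cluster; a spatial variant would additionally smooth these memberships over neighbouring pixels, which helps with the noisy, grainy images described above.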
Modeling Visit Behaviour in Smart Homes using Unsupervised Learning

NARCIS (Netherlands)

Nait Aicha, A.; Englebienne, G.; Kröse, B.

2014-01-01

Many algorithms for health monitoring based on ambient sensor networks assume that only a single person is present in the home. We present an unsupervised method that models visit behaviour. A Markov modulated multidimensional non-homogeneous Poisson process (M3P2) is described that allows us to model
Re-weighted Discriminatively Embedded K-Means for Multi-view Clustering.

Science.gov (United States)

Xu, Jinglin; Han, Junwei; Nie, Feiping; Li, Xuelong

2017-02-08

In recent years, multi-view data have been increasingly used in many real-world applications. Such data (for example, image data) are high-dimensional and obtained from different feature extractors, which represent distinct perspectives of the data. Clustering such data efficiently is a challenge. In this paper, we propose a novel multi-view clustering framework for this task, called Re-weighted Discriminatively Embedded K-Means (RDEKM). The proposed method is a multi-view least-absolute residual model that induces robustness, efficiently mitigating the influence of outliers, and realizes dimension reduction during multi-view clustering. Specifically, the proposed model is an unsupervised optimization scheme that uses Iterative Re-weighted Least Squares to solve the least-absolute residual problem and adaptively controls the distribution of multiple weights in a re-weighted manner based only on its own low-dimensional subspaces and a common clustering indicator matrix. Furthermore, theoretical analysis (including optimality and convergence analysis) and the optimization algorithm are also presented. Compared with several state-of-the-art multi-view clustering methods, the proposed method substantially improves clustering accuracy on widely used benchmark datasets, which demonstrates the superiority of the proposed work.
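RDEKM solves its least-absolute residual model with Iterative Re-weighted Least Squares. The single-view sketch below shows only that IRLS idea inside a k-means-style loop: each center minimises the sum of unsquared Euclidean residuals, which IRLS reduces to repeated weighted means with weights 1/‖x − c‖. The discriminative embedding, the per-view weights and the common indicator matrix of RDEKM are omitted, and the function names are hypothetical.

```python
import math

def irls_centroid(points, iters=100, eps=1e-8):
    """Center minimising sum_i ||x_i - c|| (least-absolute residual) via IRLS.

    Each pass solves a weighted least-squares problem with weights 1 / ||x_i - c||,
    so distant outliers receive small weights (the Weiszfeld iteration).
    """
    dim = len(points[0])
    # Start from the ordinary mean (the unweighted least-squares solution)
    c = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        w = [1.0 / max(math.dist(p, c), eps) for p in points]
        total = sum(w)
        c = [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
    return c

def robust_kmeans(points, centers, rounds=10):
    """k-means-style loop: nearest-center assignment, then an IRLS center update."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(rounds):
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = irls_centroid(members)
    return labels, centers

# A unit square plus one far outlier: the IRLS center stays near the square,
# whereas the plain mean would be pulled to (10.4, 10.4).
print(irls_centroid([(0, 0), (1, 0), (0, 1), (1, 1), (50, 50)]))
```

The re-weighting is what gives the robustness claimed in the abstract: a point's influence on its center shrinks in proportion to its distance.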
Unsupervised Power Profiling for Mobile Devices

DEFF Research Database (Denmark)

Kjærgaard, Mikkel Baun; Blunck, Henrik

2011-01-01

Today, power consumption is a main limitation for mobile phones. Minimizing the power consumption of popular and traditionally power-hungry location-based services requires knowledge of how individual phone features consume power, so that those features can be utilized intelligently for optimal...... power savings while at the same time maintaining good quality of service. This paper proposes an unsupervised API-level method for power profiling mobile phones based on genetic algorithms. The method enables accurate profiling of the power consumption of devices and thereby provides the information...
Unsupervised information extraction by text segmentation

CERN Document Server

Cortez, Eli

2013-01-01

A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available in pre-existing data to learn how to associate segments in the input string with attributes of a given domain, relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to learn structure-based features directly from test data, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a
Unsupervised detection of salt marsh platforms: a topographic method

Science.gov (United States)

Goodwin, Guillaume C. H.; Mudd, Simon M.; Clubb, Fiona J.

2018-03-01

Salt marshes filter pollutants, protect coastlines against storm surges, and sequester carbon, yet are under threat from sea level rise and anthropogenic modification. The sustained existence of the salt marsh ecosystem depends on the topographic evolution of marsh platforms. Quantifying marsh platform topography is vital for improving the management of these valuable landscapes. The determination of platform boundaries currently relies on supervised classification methods requiring near-infrared data to detect vegetation, or demands labour-intensive field surveys and digitisation. We propose a novel, unsupervised method to reproducibly isolate salt marsh scarps and platforms from a digital elevation model (DEM), referred to as Topographic Identification of Platforms (TIP). Field observations and numerical models show that salt marshes mature into subhorizontal platforms delineated by subvertical scarps. Based on this premise, we identify scarps as lines of local maxima on a slope raster, then fill landmasses from the scarps upward, thus isolating mature marsh platforms. We test the TIP method using lidar-derived DEMs from six salt marshes in England with varying tidal ranges and geometries, for which topographic platforms were manually isolated from tidal flats. Agreement between manual and unsupervised classification exceeds 94 % for DEM resolutions of 1 m, with all but one site maintaining an accuracy superior to 90 % for resolutions up to 3 m. For resolutions of 1 m, platforms detected with the TIP method are comparable in surface area to digitised platforms and have similar elevation distributions. We also find that our method allows for the accurate detection of local block failures as small as 3 times the DEM resolution. Detailed inspection reveals that although tidal creeks were digitised as part of the marsh platform, unsupervised classification categorises them as part of the tidal flat, causing an increase in false negatives and overall platform
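TIP's first step, identifying scarps as lines of local maxima on a slope raster, can be illustrated on a gridded DEM. The sketch assumes a regular grid given as nested lists, computes slope as gradient magnitude with central differences, and flags a cell when its slope is a strict local maximum along the row or the column. The published method's exact neighbourhood, thresholds and the subsequent upward filling of platforms are not reproduced here, and the function names are hypothetical.

```python
import math

def slope_raster(dem, res=1.0):
    """Gradient magnitude of a DEM (nested lists) by central differences; borders set to 0."""
    rows, cols = len(dem), len(dem[0])
    slope = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * res)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * res)
            slope[r][c] = math.hypot(dzdx, dzdy)
    return slope

def scarp_cells(slope):
    """Interior cells whose slope is a strict local maximum along the row or the column."""
    rows, cols = len(slope), len(slope[0])
    cells = set()
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            s = slope[r][c]
            along_row = s > slope[r][c - 1] and s > slope[r][c + 1]
            along_col = s > slope[r - 1][c] and s > slope[r + 1][c]
            if along_row or along_col:
                cells.add((r, c))
    return cells

# Toy DEM: a low tidal flat (0 m) rising across a scarp to a platform (1 m)
dem = [[0, 0, 0, 0.5, 1, 1, 1] for _ in range(5)]
print(sorted(scarp_cells(slope_raster(dem))))  # → [(1, 3), (2, 3), (3, 3)]
```

The detected cells form a line along the steepest part of the step, which is the scarp that would then bound the platform in the filling step.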

Unsupervised detection of salt marsh platforms: a topographic method

Directory of Open Access Journals (Sweden)

G. C. H. Goodwin

2018-03-01

Salt marshes filter pollutants, protect coastlines against storm surges, and sequester carbon, yet are under threat from sea level rise and anthropogenic modification. The sustained existence of the salt marsh ecosystem depends on the topographic evolution of marsh platforms. Quantifying marsh platform topography is vital for improving the management of these valuable landscapes. The determination of platform boundaries currently relies on supervised classification methods requiring near-infrared data to detect vegetation, or demands labour-intensive field surveys and digitisation. We propose a novel, unsupervised method to reproducibly isolate salt marsh scarps and platforms from a digital elevation model (DEM), referred to as Topographic Identification of Platforms (TIP). Field observations and numerical models show that salt marshes mature into subhorizontal platforms delineated by subvertical scarps. Based on this premise, we identify scarps as lines of local maxima on a slope raster, then fill landmasses from the scarps upward, thus isolating mature marsh platforms. We test the TIP method using lidar-derived DEMs from six salt marshes in England with varying tidal ranges and geometries, for which topographic platforms were manually isolated from tidal flats. Agreement between manual and unsupervised classification exceeds 94 % for DEM resolutions of 1 m, with all but one site maintaining an accuracy superior to 90 % for resolutions up to 3 m. For resolutions of 1 m, platforms detected with the TIP method are comparable in surface area to digitised platforms and have similar elevation distributions. We also find that our method allows for the accurate detection of local block failures as small as 3 times the DEM resolution. Detailed inspection reveals that although tidal creeks were digitised as part of the marsh platform, unsupervised classification categorises them as part of the tidal flat, causing an increase in false negatives
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

Science.gov (United States)

Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

Unsupervised object discovery and localization aims to discover the dominant object classes in a given image collection and localize all of their object instances without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, those methods exploit no prior knowledge about the given image collection to facilitate object discovery. Moreover, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have a clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in the form of so-called must-links is exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy of visual words, a must-link is redefined so that it constrains only one or some topics instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes, so the must-links in our approach are semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments on several data sets validated the efficiency of the proposed approach. It is shown that our method significantly improves topic coherence and outperforms the compared unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
Graph-based unsupervised segmentation algorithm for cultured neuronal networks' structure characterization and modeling.

Science.gov (United States)

de Santos-Sierra, Daniel; Sendiña-Nadal, Irene; Leyva, Inmaculada; Almendral, Juan A; Ayali, Amir; Anava, Sarit; Sánchez-Ávila, Carmen; Boccaletti, Stefano

2015-06-01

Large-scale phase-contrast images taken at high resolution throughout the life of a cultured neuronal network are analyzed by a graph-based unsupervised segmentation algorithm with a very low computational cost that scales linearly with the image size. The processing automatically retrieves the whole network structure, an object whose mathematical representation is a matrix in which nodes are identified neurons or clusters of neurons, and links are the reconstructed connections between them. The algorithm can also extract other relevant morphological information characterizing neurons and neurites. More importantly, unlike other segmentation methods that require fluorescence imaging from immunocytochemistry techniques, our non-invasive measurements allow us to perform a longitudinal analysis during the maturation of a single culture. Such an analysis provides a way to identify the main physical processes underlying the self-organization of the neuron ensemble into a complex network, and guides the formulation of a phenomenological model that is still able to describe qualitatively the overall scenario observed during the culture growth. © 2014 International Society for Advancement of Cytometry.
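The abstract describes the reconstructed network as a matrix whose nodes are neurons or neuron clusters and whose links are the reconstructed connections. The sketch below builds that representation from a list of links, assuming undirected, unweighted connections (the abstract does not say whether links carry weights or directions); node indices and helper names are hypothetical, and the segmentation step itself is not shown.

```python
def adjacency_matrix(n_nodes, links):
    """Symmetric 0/1 adjacency matrix from undirected (i, j) links between nodes."""
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in links:
        a[i][j] = a[j][i] = 1  # undirected: record the link in both directions
    return a

def node_degrees(a):
    """Number of connections of each node (row sums)."""
    return [sum(row) for row in a]

# Hypothetical network: node 1 is a neuron cluster linked to three other nodes
a = adjacency_matrix(4, [(0, 1), (1, 2), (1, 3)])
print(node_degrees(a))  # → [1, 3, 1, 1]
```

Once the network is in this form, standard graph measures such as degree can be tracked across imaging sessions, which is what a longitudinal analysis of a single culture requires.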
Enhancement of ELM by Clustering Discrimination Manifold Regularization and Multiobjective FOA for Semisupervised Classification.

Science.gov (United States)

Ye, Qing; Pan, Hao; Liu, Changhua

2015-01-01

A novel semisupervised extreme learning machine (ELM) with a clustering discrimination manifold regularization (CDMR) framework, named CDMR-ELM, is proposed for semisupervised classification. Using an unsupervised fuzzy clustering method, the CDMR framework integrates the clustering discrimination of both labeled and unlabeled data with twinning constraints regularization. To further improve classification accuracy and efficiency, a new multiobjective fruit fly optimization algorithm (MOFOA) is developed to optimize the crucial parameters of CDMR-ELM. The proposed MOFOA is implemented with two objectives: simultaneously minimizing the number of hidden nodes and the mean square error (MSE). Experimental results on real datasets show that the proposed semisupervised classifier can obtain better accuracy and efficiency with relatively few hidden nodes compared with other state-of-the-art classifiers.
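MOFOA searches for CDMR-ELM parameters under two objectives, the number of hidden nodes and the MSE. Comparing candidates on two minimised objectives commonly relies on Pareto dominance; the sketch below keeps only the non-dominated (hidden nodes, MSE) pairs. Whether MOFOA ranks candidates exactly this way is an assumption, the fruit fly search itself is not shown, and the candidate values are toy numbers for illustration.

```python
def pareto_front(candidates):
    """Return the (hidden_nodes, mse) pairs not dominated by any other pair.

    b dominates a if b is no worse in both objectives and strictly better in one.
    """
    front = []
    for a in candidates:
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

# Toy candidates: (number of hidden nodes, mean square error)
cands = [(10, 0.20), (20, 0.10), (30, 0.10), (15, 0.25), (40, 0.05)]
print(pareto_front(cands))  # → [(10, 0.2), (20, 0.1), (40, 0.05)]
```

The front makes the trade-off explicit: fewer hidden nodes can only be bought with a higher error, and a final model is picked from this set.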
Enhancement of ELM by Clustering Discrimination Manifold Regularization and Multiobjective FOA for Semisupervised Classification

Directory of Open Access Journals (Sweden)

Qing Ye

2015-01-01

A novel semisupervised extreme learning machine (ELM) with a clustering discrimination manifold regularization (CDMR) framework, named CDMR-ELM, is proposed for semisupervised classification. Using an unsupervised fuzzy clustering method, the CDMR framework integrates the clustering discrimination of both labeled and unlabeled data with twinning constraints regularization. To further improve classification accuracy and efficiency, a new multiobjective fruit fly optimization algorithm (MOFOA) is developed to optimize the crucial parameters of CDMR-ELM. The proposed MOFOA is implemented with two objectives: simultaneously minimizing the number of hidden nodes and the mean square error (MSE). Experimental results on real datasets show that the proposed semisupervised classifier can obtain better accuracy and efficiency with relatively few hidden nodes compared with other state-of-the-art classifiers.
An unsupervised adaptive strategy for constructing probabilistic roadmaps

KAUST Repository

Tapia, L.

2009-05-01

Since planning environments are complex and no single planner exists that is best for all problems, much work has been done to explore methods for selecting where and when to apply particular planners. However, these two questions have been difficult to answer, even when adaptive methods meant to facilitate a solution are applied. For example, adaptive solutions such as setting learning rates, hand-classifying spaces, and defining parameters for a library of planners have all been proposed. We demonstrate a strategy based on unsupervised learning methods that makes adaptive planning more practical. The unsupervised strategies require less user intervention, model the topology of the problem in a reasonable and efficient manner, can adapt the sampler depending on characteristics of the problem, and can easily accept new samplers as they become available. Through a series of experiments, we demonstrate that in a wide variety of environments, the regions automatically identified by our technique represent the planning space well both in number and placement. We also show that our technique has little overhead and that it outperforms two existing adaptive methods in all complex cases studied. © 2009 IEEE.
Bilingual Lexical Interactions in an Unsupervised Neural Network Model

Science.gov (United States)

Zhao, Xiaowei; Li, Ping

2010-01-01

In this paper we present an unsupervised neural network model of bilingual lexical development and interaction. We focus on how the representational structures of the bilingual lexicons can emerge, develop, and interact with each other as a function of the learning history. The results show that: (1) distinct representations for the two lexicons…
Geodesic Flow Kernel Support Vector Machine for Hyperspectral Image Classification by Unsupervised Subspace Feature Transfer

Directory of Open Access Journals (Sweden)

Alim Samat

2016-03-01

In order to deal with scenarios where the training data used to deduce a model and the validation data have different statistical distributions, we study the problem of transformed subspace feature transfer for domain adaptation (DA) in the context of hyperspectral image classification via a geodesic Gaussian flow kernel based support vector machine (GFKSVM). To show the superior performance of the proposed approach, conventional support vector machines (SVMs) and state-of-the-art DA algorithms, including information-theoretical learning of discriminative cluster for domain adaptation (ITLDC), joint distribution adaptation (JDA), and joint transfer matching (JTM), are also considered. Additionally, unsupervised linear and nonlinear subspace feature transfer techniques including principal component analysis (PCA), randomized nonlinear principal component analysis (rPCA), factor analysis (FA) and non-negative matrix factorization (NNMF) are investigated and compared. Experiments on two real hyperspectral images show the cross-image classification performance of the GFKSVM, confirming its effectiveness and suitability when applied to hyperspectral images.
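Geodesic flow kernel methods characterise the shift between source and target domains through the principal angles between their subspaces (for example, PCA subspaces). The toy sketch below computes the single principal angle between the leading PCA directions of two 2-D point sets, using power iteration on each covariance matrix. The full GFK construction, which integrates along the geodesic between the subspaces, and the SVM on top are not shown, and all names are hypothetical.

```python
import math

def leading_direction(points, iters=200):
    """Leading principal direction of 2-D points via power iteration on the covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    v = (1.0, 0.5)  # arbitrary start vector, assumed not orthogonal to the answer
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

def principal_angle(u, v):
    """Angle between the lines spanned by unit vectors u and v (sign-invariant)."""
    return math.acos(min(1.0, abs(u[0] * v[0] + u[1] * v[1])))

source = [(x, 0) for x in range(-3, 4)]  # toy source domain: variance along the x-axis
target = [(x, x) for x in range(-3, 4)]  # toy target domain: variance along y = x
angle = principal_angle(leading_direction(source), leading_direction(target))
print(round(math.degrees(angle), 6))  # → 45.0
```

A zero angle would mean the two domains share the same principal subspace; larger angles indicate a bigger domain shift for the kernel to bridge.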
Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

KAUST Repository

Cannistraci, Carlo

2010-09-01

Motivation: Nonlinear small datasets, which are characterized by low numbers of samples and very high numbers of measures, occur frequently in computational biology and pose problems in their investigation. Unsupervised hybrid-two-phase (H2P) procedures, specifically dimension reduction (DR) coupled with clustering, provide valuable assistance not only for unsupervised data classification but also for visualization of the patterns hidden in high-dimensional feature space. Methods: 'Minimum Curvilinearity' (MC) is a principle that, for small datasets, suggests approximating curvilinear sample distances in the feature space by pairwise distances over their minimum spanning tree (MST), thus avoiding the introduction of any tuning parameter. MC is used to design two novel forms of nonlinear machine learning (NML): Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear affinity propagation (MCAP) for clustering. Results: Compared with several other unsupervised and supervised algorithms, MCE and MCAP, whether individually or combined in H2P, overcome the limits of classical approaches. High performance was attained in the visualization and classification of: (i) pain patients (proteomic measurements) in peripheral neuropathy; (ii) human organ tissues (genomic transcription factor measurements) on the basis of their embryological origin. Conclusion: MC provides a valuable framework to estimate nonlinear distances in small datasets. Its extension to large datasets is prefigured for novel NMLs. Classification of neuropathic pain by proteomic profiles offers new insights for future molecular and systems biology characterization of pain. Improvements in tissue embryological classification refine results obtained in an earlier study, and suggest a possible reinterpretation of skin attribution as mesodermal. © The Author(s) 2010. Published by Oxford University Press.
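The MC principle replaces curvilinear sample distances with path lengths over the minimum spanning tree. A minimal sketch, assuming Euclidean edge weights and Prim's O(n²) algorithm; it computes only the MC distance matrix, not the MCE embedding or MCAP clustering built on it, and the function name is hypothetical.

```python
import math

def minimum_curvilinear_distances(points):
    """Pairwise distances measured along the Euclidean minimum spanning tree."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    adj = [[] for _ in range(n)]  # tree adjacency list: (neighbour, edge length)
    # Prim's algorithm: repeatedly attach the closest point not yet in the tree
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            w = math.dist(points[u], points[parent[u]])
            adj[u].append((parent[u], w))
            adj[parent[u]].append((u, w))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Paths in a tree are unique, so a traversal from each source gives the distances
    dist = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack, seen = [(s, 0.0)], {s}
        while stack:
            u, du = stack.pop()
            dist[s][u] = du
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, du + w))
    return dist

# C-shaped toy data: the two ends are 2.0 apart in a straight line
c_shape = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
print(minimum_curvilinear_distances(c_shape)[0][6])  # → 6.0
```

The MC distance between the ends follows the curve (6.0) rather than the straight-line gap (2.0), and no neighbourhood size or other tuning parameter is needed.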
Remote photoplethysmography system for unsupervised monitoring regional anesthesia effectiveness

Science.gov (United States)

Rubins, U.; Miscuks, A.; Marcinkevics, Z.; Lange, M.

2017-12-01

Determining the level of regional anesthesia (RA) is vitally important to both the anesthesiologist and the surgeon; knowing the RA level can also protect the patient and reduce surgery time. The level of RA is usually assessed with simple subjective methods (sensitivity tests) or complicated quantitative methods (thermography, neuromyography, etc.), but there is not yet a standardized method for objective RA detection and evaluation. In this study, an advanced remote photoplethysmography imaging (rPPG) system for unsupervised monitoring of RA in the human palm is demonstrated. The rPPG system comprises a compact video camera with a green optical filter, a surgical lamp as the light source, and a computer with custom-developed software. The algorithm, implemented in MATLAB, recognizes the palm and two dermatomes (medial and ulnar innervation) and calculates the perfusion map and perfusion changes in real time to detect the effect of RA. Seven patients (aged 18-80 years) undergoing hand surgery received peripheral nerve brachial plexus blocks during the measurements. Clinical experiments showed that our rPPG system is able to perform unsupervised monitoring of RA.
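The abstract does not define how the perfusion map is computed. One common rPPG measure, assumed here, is a perfusion index: the pulsatile (AC) amplitude of the green-channel signal divided by its mean (DC) level, evaluated per region or pixel to form a map. The sketch omits the band-pass filtering and skin segmentation a real pipeline would need, and the region names are hypothetical.

```python
import math

def perfusion_index(green_trace):
    """Pulsatile (AC, peak-to-peak) over steady (DC, mean) level of a green-channel trace."""
    dc = sum(green_trace) / len(green_trace)
    ac = max(green_trace) - min(green_trace)
    return ac / dc

def perfusion_map(traces):
    """Perfusion index for each named region (e.g. each dermatome)."""
    return {region: perfusion_index(t) for region, t in traces.items()}

# Synthetic traces: a 2-unit pulse on a 100-unit baseline versus no pulse at all
pulsing = [100 + 2 * math.sin(2 * math.pi * t / 20) for t in range(20)]
flat = [100.0] * 20
print(perfusion_map({"medial": pulsing, "ulnar": flat}))
```

Tracking these values over time, rather than a single snapshot, is what would reveal perfusion changes after the nerve block.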
Unsupervised progressive elastic band exercises for frail geriatric inpatients objectively monitored by new exercise-integrated technology

DEFF Research Database (Denmark)

Rathleff, Camilla Rams; Bandholm, T.; Spaich, Erika Geraldina

2017-01-01

the amount of supervised training, and unsupervised training could possibly supplement supervised training thereby increasing the total exercise dose during admission. A new valid and reliable technology, the BandCizer, objectively measures the exact training dosage performed. The purpose was to investigate...... feasibility and acceptability of an unsupervised progressive strength training intervention monitored by BandCizer for frail geriatric inpatients. Methods: This feasibility trial included 15 frail inpatients at a geriatric ward. At hospitalization, the patients were prescribed two elastic band exercises...... of 2-min pauses and a time-under-tension of 8 s. The feasibility criterion for the unsupervised progressive exercises was that 33% of the recommended number of sets would be performed by at least 30% of patients. In addition, patients and staff were interviewed about their experiences...
Automated lesion detection on MRI scans using combined unsupervised and supervised methods

International Nuclear Information System (INIS)

Guo, Dazhou; Fridriksson, Julius; Fillmore, Paul; Rorden, Christopher; Yu, Hongkai; Zheng, Kang; Wang, Song

2015-01-01

Accurate and precise detection of brain lesions on MR images (MRI) is paramount for accurately relating lesion location to impaired behavior. In this paper, we present a novel method to automatically detect brain lesions from a T1-weighted 3D MRI. The proposed method combines the advantages of both unsupervised and supervised methods. First, unsupervised methods perform a unified segmentation normalization to warp images from the native space into a standard space and to generate probability maps for different tissue types, e.g., gray matter, white matter and fluid. This allows us to construct an initial lesion probability map by comparing the normalized MRI to healthy control subjects. Then, we perform non-rigid and reversible atlas-based registration to refine the probability maps of gray matter, white matter, external CSF, ventricle, and lesions. These probability maps are combined with the normalized MRI to construct three types of features, with which we use supervised methods to train three support vector machine (SVM) classifiers for a combined classifier. Finally, the combined classifier is used to accomplish lesion detection. We tested this method using T1-weighted MRIs from 60 in-house stroke patients. Using leave-one-out cross validation, the proposed method can achieve an average Dice coefficient of 73.1 % when compared to lesion maps hand-delineated by trained neurologists. Furthermore, we tested the proposed method on the T1-weighted MRIs in the MICCAI BRATS 2012 dataset. The proposed method can achieve an average Dice coefficient of 66.5 % in comparison to the expert annotated tumor maps provided in MICCAI BRATS 2012 dataset. In addition, on these two test datasets, the proposed method shows competitive performance to three state-of-the-art methods, including Stamatakis et al., Seghier et al., and Sanjuan et al. In this paper, we introduced a novel automated procedure for lesion detection from T1-weighted MRIs by combining both an unsupervised and a
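The Dice coefficient used to score the detected lesions against the hand-delineated and expert-annotated maps is 2|A∩B| / (|A| + |B|). A minimal sketch over flattened binary masks; returning 1.0 when both masks are empty is our own convention.

```python
def dice(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # → 0.5
```

A value of 1.0 means perfect overlap and 0.0 means none, so the reported averages of 73.1 % and 66.5 % correspond to Dice values of 0.731 and 0.665.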
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

Directory of Open Access Journals (Sweden)

Ashlock Daniel

2009-08-01

Background: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
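MULTI-K runs k-means for several numbers of clusters and looks for elements that are consistently grouped together. The sketch below builds the co-membership matrix for 1-D data: entry (i, j) is the fraction of runs in which samples i and j share a cluster. It uses deterministic first-k initialisation for reproducibility (real k-means runs would use random or repeated initialisations), omits the entropy-plot and the final cluster identification, and the names are hypothetical.

```python
def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on 1-D data; initial centers = first k points (for illustration)."""
    centers = list(xs[:k])
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each point to its nearest center
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Recompute centers; keep the old one if a cluster is empty
        new = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels

def co_membership(xs, k_values):
    """Fraction of k-means runs (one per k) in which each pair of samples shares a cluster."""
    runs = [kmeans_1d(xs, k) for k in k_values]
    n = len(xs)
    return [[sum(r[i] == r[j] for r in runs) / len(runs) for j in range(n)]
            for i in range(n)]

xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
co = co_membership(xs, k_values=[2, 3])
print(co[0][1], co[1][2], co[0][3])  # → 0.5 1.0 0.0
```

Pairs that stay together as k grows (such as samples 1 and 2) form the robust co-memberships the method looks for, while pairs split at larger k (samples 0 and 1) are less certain.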
MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.

Science.gov (United States)

Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

2009-08-22

Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
PosQ: Unsupervised Fingerprinting and Visualization of GPS Positioning Quality

DEFF Research Database (Denmark)

Kjærgaard, Mikkel Baun; Weckemann, Kay

This paper proposes PosQ, a system for unsupervised fingerprinting and visualization of GPS positioning quality. PosQ provides quality maps to position-based applications and visual overlays to users and managers to reveal the positioning quality in a local environment. The system reveals the quality both...
A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

Science.gov (United States)

Huang, Qi; Yang, Dapeng; Jiang, Li; Zhang, Huajie; Liu, Hong; Kotani, Kiyoshi

2017-01-01

In the long term, pattern recognition-based myoelectric control methods suffer performance degradation caused by a variety of interfering factors. This paper proposes an adaptive learning method with low computational cost to mitigate this effect in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), constructed from a particle adaptive learning strategy and a universal incremental least squares support vector classifier (LS-SVC). We compared PAC performance with an incremental support vector classifier (ISVC) and a non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results on realistic long-term EMG data showed that, in unsupervised adaptive learning scenarios, the PAC significantly decreased the performance degradation compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms vs. 50 ms per updating cycle). PMID:28608824
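PAC is built on an incremental least squares support vector classifier. As background, the sketch below trains a batch LS-SVC with a linear kernel by solving the linear system of Suykens' formulation, [[0, yᵀ], [y, Ω + I/γ]] [b; α] = [0; 1] with Ω_ij = y_i y_j K(x_i, x_j). It does not implement the incremental updates or the particle adaptive strategy; γ = 10, the toy data and the helper names are our own assumptions.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def train_lssvc(xs, ys, gamma=10.0):
    """LS-SVC with a linear kernel: returns bias b and support values alpha."""
    n = len(xs)
    a = [[0.0] + [float(y) for y in ys]]  # first row: [0, y^T]
    for i in range(n):
        row = [float(ys[i])]  # remaining rows: [y_i, Omega_i + I/gamma]
        for j in range(n):
            row.append(ys[i] * ys[j] * dot(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    return sol[0], sol[1:]

def predict(xs, ys, b, alphas, x):
    """Decision rule: sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    score = sum(al * y * dot(xi, x) for al, y, xi in zip(alphas, ys, xs)) + b
    return 1 if score >= 0 else -1

xs = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
ys = [-1, -1, 1, 1]
b, alphas = train_lssvc(xs, ys)
print([predict(xs, ys, b, alphas, x) for x in [(-0.5,), (0.5,)]])  # → [-1, 1]
```

Training reduces to one linear solve instead of a quadratic program, which is what makes incremental, low-cost retraining of LS-SVC variants attractive.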
A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

Directory of Open Access Journals (Sweden)

Qi Huang

2017-06-01

In the long term, pattern recognition-based myoelectric control methods suffer performance degradation caused by a variety of interfering factors. This paper proposes an adaptive learning method with low computational cost to mitigate this effect in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), constructed from a particle adaptive learning strategy and a universal incremental least squares support vector classifier (LS-SVC). We compared PAC performance with an incremental support vector classifier (ISVC) and a non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results on realistic long-term EMG data showed that, in unsupervised adaptive learning scenarios, the PAC significantly decreased the performance degradation compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms vs. 50 ms per updating cycle).
Towards Statistical Unsupervised Online Learning for Music Listening with Hearing Devices

DEFF Research Database (Denmark)

Purwins, Hendrik; Marchini, Marco; Marxer, Richard

of sounds into phonetic/instrument categories and learning of instrument event sequences is performed jointly using a Hierarchical Dirichlet Process Hidden Markov Model. Whereas machines often learn by processing a large database and subsequently updating parameters of the algorithm, humans learn...... and their respective transition counts. We propose to use online learning for the co-evolution of both CI user and machine in (re-)learning musical language. [1] Marco Marchini and Hendrik Purwins. Unsupervised analysis and generation of audio percussion sequences. In International Symposium on Computer Music Modeling...... categories) as well as the temporal context horizon (e.g. storing up to 2-note sequences or up to 10-note sequences) is adaptable. The framework in [1] is based on two cognitively plausible principles: unsupervised learning and statistical learning. As opposed to supervised learning in primary school children...
Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm

OpenAIRE

Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

2017-01-01

This paper describes a method for clustering data that are spread out over large regions and whose dimensions are on different scales of measurement. The algorithm was developed to implement a robotics application that sorts and stores objects in an unsupervised way. The toy dataset used to validate this application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions, together with the use of RGB color features, respectively involve data...
Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

National Research Council Canada - National Science Library

Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

2006-01-01

Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) added an optional unsupervised speaker adaptation track in which test files are processed sequentially and one may update the target model...

Agglomerative concentric hypersphere clustering applied to structural damage detection

Science.gov (United States)

Silva, Moisés; Santos, Adam; Santos, Reginaldo; Figueiredo, Eloi; Sales, Claudomiro; Costa, João C. W. A.

2017-08-01

The present paper proposes a novel cluster-based method, named agglomerative concentric hypersphere (ACH), to detect structural damage in engineering structures. Continuous structural monitoring systems often require unsupervised approaches to automatically infer the health condition of a structure. However, when a structure is subject to linear and nonlinear effects caused by environmental and operational variability, data normalization procedures are also required to overcome these effects. Through a straightforward clustering procedure, the proposed approach aims to automatically discover the optimal number of clusters, representing the main state conditions of a structural system. Three initialization procedures are introduced to evaluate the impact of deterministic and stochastic initializations on the performance of this approach. The ACH is compared with state-of-the-art approaches, based on Gaussian mixture models and the Mahalanobis squared distance, on standard data sets from a post-tensioned bridge located in Switzerland: the Z-24 Bridge. The proposed approach models the normal condition of the structure and its corresponding main clusters more efficiently than the alternatives. Furthermore, it achieves better classification performance than the alternatives in terms of false-positive and false-negative indications of damage, demonstrating promising applicability in real-world structural health monitoring scenarios.
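The ACH procedure itself is not described here in enough detail to reproduce, but the Mahalanobis squared distance (MSD) baseline it is compared against is standard: estimate the mean and covariance of damage-sensitive features under the normal condition, then flag new observations whose MSD exceeds a threshold. A minimal sketch, assuming 2-D feature vectors (so the covariance can be inverted in closed form) and a user-chosen threshold; the function names are hypothetical.

```python
def fit_msd(train):
    """Mean and inverse covariance of 2-D features from the normal (undamaged) condition."""
    n = len(train)
    mx = sum(p[0] for p in train) / n
    my = sum(p[1] for p in train) / n
    sxx = sum((p[0] - mx) ** 2 for p in train) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in train) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in train) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (features not perfectly collinear)
    inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]
    return (mx, my), inv

def msd(z, mean, inv):
    """Mahalanobis squared distance (z - mean)^T inv (z - mean)."""
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

def is_damaged(z, mean, inv, threshold):
    """Flag an observation as a potential damage indication when its MSD exceeds the threshold."""
    return msd(z, mean, inv) > threshold
```

In practice the threshold is often set from the distribution of MSD values on held-out normal-condition data; any operational or environmental effects not present in the training set will also raise the MSD, which is why data normalization matters in the setting described above.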
Deep supervised, but not unsupervised, models may explain IT cortical representation.

Directory of Open Access Journals (Sweden)

Seyed-Mahdi Khaligh-Razavi

2014-11-01

Inferior temporal (IT) cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total), testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet) along with several models from computer vision (e.g. SIFT, GIST, self-similarity features) and a deep convolutional neural network. We compared the representational dissimilarity matrices (RDMs) of the model representations with the RDMs obtained from human IT (measured with fMRI) and monkey IT (measured with cell recording) for the same set of stimuli (not used in training the models). Better-performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better-performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data. Overall, our results suggest that explaining
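An RDM records, for every pair of stimuli, how dissimilar a representation's response patterns are. A minimal sketch of the model-to-IT comparison described above, assuming two common choices: 1 − Pearson correlation as the dissimilarity, and a Spearman rank correlation between the RDMs' upper triangles. The study's exact comparison statistic may differ, and the function names are illustrative.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between the
    response patterns (one list of unit/voxel responses per stimulus)."""
    return [[1 - pearson(p, q) for q in patterns] for p in patterns]

def ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def compare_rdms(r1, r2):
    """Spearman correlation between the upper triangles of two RDMs."""
    n = len(r1)
    a = [r1[i][j] for i in range(n) for j in range(i + 1, n)]
    b = [r2[i][j] for i in range(n) for j in range(i + 1, n)]
    return pearson(ranks(a), ranks(b))
```

Because only the upper triangle is compared, the zero diagonal and the symmetric lower half do not inflate the correlation; using ranks makes the comparison insensitive to monotonic differences in how the two measurement modalities scale dissimilarity.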
Teacher and learner: Supervised and unsupervised learning in communities.

Science.gov (United States)

Shafto, Michael G; Seifert, Colleen M

2015-01-01

How far can teaching methods go to enhance learning? Optimal methods of teaching have been considered in research on supervised and unsupervised learning. Locally optimal methods are usually hybrids of teaching and self-directed approaches. The costs and benefits of specific methods have been shown to depend on the structure of the learning task, the learners, the teachers, and the environment.
Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction.

Science.gov (United States)

Nie, Feiping; Xu, Dong; Tsang, Ivor Wai-Hung; Zhang, Changshui

2010-07-01

We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X) and the regression residue F_0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F_0. Our semi-supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), allowing it to better cope with data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate significant improvements over existing dimension reduction algorithms.
Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading.

Science.gov (United States)

Inano, Rika; Oishi, Naoya; Kunieda, Takeharu; Arakawa, Yoshiki; Yamao, Yukihiro; Shibata, Sumiya; Kikuchi, Takayuki; Fukuyama, Hidenao; Miyamoto, Susumu

2014-01-01

Gliomas are the most common intra-axial primary brain tumour, and predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images to obtain a clustered image that could enable visual grading of gliomas. Fourteen patients with low-grade gliomas and 19 with high-grade gliomas underwent diffusion tensor imaging and three-dimensional T1-weighted magnetic resonance imaging before tumour resection. Seven features, namely diffusion-weighted imaging, fractional anisotropy, first eigenvalue, second eigenvalue, third eigenvalue, mean diffusivity and raw T2 signal with no diffusion weighting, were extracted as multiple parameters from diffusion tensor imaging. We developed a two-level clustering approach, a self-organizing map followed by the K-means algorithm, to enable unsupervised clustering of a large number of input vectors with the seven features for the whole brain. The self-organizing map grouped the vectors into protoclusters, which K-means then classified into a smaller number of clusters to make a voxel-based diffusion tensor-based clustered image. Furthermore, we determined whether the diffusion tensor-based clustered image was helpful for predicting pre-operative glioma grade in a supervised manner. The ratio of each class in the diffusion tensor-based clustered images was calculated from the regions of interest manually traced on the diffusion tensor imaging space, and the common logarithmic ratio scales were calculated. We then applied a support vector machine as a classifier for distinguishing between low- and high-grade gliomas. Consequently, the sensitivity, specificity, accuracy and area under the curve of receiver operating characteristic
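The two-level scheme above (SOM prototypes first, then K-means on the prototypes) can be sketched in a few lines. This is a toy version under stated assumptions: a 1-D chain of SOM units rather than a 2-D map, linear learning-rate and neighbourhood decay, farthest-first K-means initialisation, and tiny 2-D inputs standing in for the seven diffusion features per voxel. None of these settings, nor the function names, come from the paper.

```python
import math

def train_som(data, n_units=6, epochs=50, lr0=0.5):
    """1-D self-organizing map: a chain of prototypes initialised evenly
    between the coordinate-wise minimum and maximum of the data."""
    dims = range(len(data[0]))
    lo = [min(p[d] for p in data) for d in dims]
    hi = [max(p[d] for p in data) for d in dims]
    units = [[lo[d] + (hi[d] - lo[d]) * u / (n_units - 1) for d in dims]
             for u in range(n_units)]
    for t in range(epochs):
        frac = 1 - t / epochs
        lr = lr0 * frac                          # learning rate decays linearly
        radius = max(0.3, (n_units / 2) * frac)  # neighbourhood shrinks over time
        for x in data:
            bmu = min(range(n_units), key=lambda u: math.dist(units[u], x))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                units[u] = [w + lr * h * (xi - w) for w, xi in zip(units[u], x)]
    return units

def kmeans(points, k, iters=50):
    """Lloyd's k-means with farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: math.dist(centers[c], p))].append(p)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers

def two_level(data, n_units=6, k=2):
    """SOM prototypes ('protoclusters') -> k-means on prototypes -> label each input."""
    units = train_som(data, n_units)
    centers = kmeans(units, k)
    unit_label = [min(range(k), key=lambda c: math.dist(centers[c], u)) for u in units]
    return [unit_label[min(range(n_units), key=lambda u: math.dist(units[u], x))]
            for x in data]
```

Each input inherits the K-means label of its best-matching SOM unit, so the second stage runs on a handful of prototypes instead of on every voxel, which is the point of the two-level design for whole-brain data.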
Cluster analysis of polymers using laser-induced breakdown spectroscopy with K-means

Science.gov (United States)

Yangmin, GUO; Yun, TANG; Yu, DU; Shisong, TANG; Lianbo, GUO; Xiangyou, LI; Yongfeng, LU; Xiaoyan, ZENG

2018-06-01

Laser-induced breakdown spectroscopy (LIBS) combined with the K-means algorithm was employed to automatically differentiate industrial polymers under atmospheric conditions. The unsupervised learning algorithm K-means was used to cluster the LIBS dataset measured from twenty kinds of industrial polymers. To prevent interference from metallic elements, three atomic emission lines (C I 247.86 nm, H I 656.3 nm, and O I 777.3 nm) and one molecular line, C–N (0, 0) 388.3 nm, were used. The cluster analysis results were obtained through an iterative process. The Davies–Bouldin index was employed to determine the initial number of clusters. The average relative standard deviation values of characteristic spectral lines were used as the iterative criterion. With the proposed approach, the classification accuracy for twenty kinds of industrial polymers reached 99.6%. The results demonstrated that this approach has great potential for industrial polymer recycling by LIBS.
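Choosing the number of clusters with the Davies–Bouldin index, as described above, amounts to running K-means for several candidate values of k and keeping the one with the lowest index. A sketch under assumptions: plain Lloyd iterations with a deterministic farthest-first initialisation, Euclidean distances, and generic 2-D points in place of the four LIBS line intensities. The paper's iterative RSD criterion is not reproduced, and the function names are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[c])
        if new == centers:
            break
        centers = new
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels

def davies_bouldin(points, centers, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better.
    S_i is the average distance of cluster i's members to its centre."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(math.dist(p, centers[c]) for p in members) / max(len(members), 1))
    return sum(max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def choose_k(points, candidates=range(2, 6)):
    """Pick the number of clusters with the lowest Davies-Bouldin index."""
    scores = {k: davies_bouldin(points, *kmeans(points, k)) for k in candidates}
    return min(scores, key=scores.get), scores
```

With three compact, well-separated groups of points, `choose_k` returns 3: fewer clusters merge distant groups (large scatter), while more clusters split a group into halves whose centres are close together (small separation).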
WESTERN CARPATHIAN RURAL MOUNTAIN TOURISM MAPPING THROUGH CLUSTER METHODOLOGY

Directory of Open Access Journals (Sweden)

Elena TOMA

2013-10-01

Rural tourism in the Western Carpathian Mountains has been characterized in recent years by a low occupancy rate and a decline in tourist arrivals, due not only to the direct effects of the economic crisis but also to the remote location of mountain villages and the low quality of infrastructure. For this reason, we consider that implementing complex, integrated products based on thematic tour circuits represents a real opportunity to develop the local rural tourism industry. The aim of this paper is to identify the best networking solution based on cluster analysis. The Multidimensional Scaling Method and the Hierarchical Cluster Method allowed us to identify the best way of clustering and, in this way, the best route for a potential thematic tour circuit. Relative to the counties to which the villages belong, the identified cluster concentrates 57.7% of rural tourist accommodation and 65.0% of tourist arrivals, but has an occupancy rate of only 5.9%. We consider that implementing new complex tourism products can increase tourism activity in the cluster, and we propose more in-depth studies of the profile of potential customers.
Classification of high-resolution multi-swath hyperspectral data using Landsat 8 surface reflectance data as a calibration target and a novel histogram based unsupervised classification technique to determine natural classes from biophysically relevant fit parameters

Science.gov (United States)

McCann, C.; Repasky, K. S.; Morin, M.; Lawrence, R. L.; Powell, S. L.

2016-12-01

Compact, cost-effective, flight-based hyperspectral imaging systems can provide scientifically relevant data over large areas for a variety of applications such as ecosystem studies, precision agriculture, and land management. To fully realize this capability, unsupervised classification techniques are needed that operate on radiometrically-calibrated data and cluster by biophysical similarity rather than simply spectral similarity. An automated technique to produce high-resolution, large-area, radiometrically-calibrated hyperspectral data sets, using the Landsat surface reflectance data product as a calibration target, was developed and applied to three consecutive years of data covering approximately 1850 hectares. The radiometrically-calibrated data allow inter-comparison of the temporal series. Advantages of the radiometric calibration technique include the need for minimal site access, no ancillary instrumentation, and automated processing. Fitting the reflectance spectra of each pixel using a set of biophysically relevant basis functions reduces the data from 80 spectral bands to 9 parameters, providing noise reduction and data compression. Examining the histograms of these parameters reveals natural splits into biophysically similar clusters. This method creates clusters that are similar in terms of biophysical parameters, not simply spectral proximity. Furthermore, this method can be applied to other data sets, such as urban scenes, by developing other physically meaningful basis functions. Using hyperspectral imaging for a variety of important applications requires data processing techniques that can be automated. The radiometric calibration combined with the histogram-based unsupervised classification technique presented here provides one potential avenue for managing the big data associated with hyperspectral imaging.
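Reducing each pixel's spectrum to a few fit parameters is a linear least-squares problem once the basis functions are fixed. The paper's nine biophysically relevant basis functions are not given here, so the sketch below uses three hypothetical ones (an offset, a slope, and a Gaussian feature near 680 nm) over 80 bands and solves the normal equations directly; `BASIS` and `fit_spectrum` are illustrative names.

```python
import math

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_spectrum(wavelengths, reflectance, basis):
    """Least-squares coefficients c minimising sum_w (sum_i c_i f_i(w) - r(w))^2,
    solved via the normal equations (B^T B) c = B^T r."""
    B = [[f(w) for f in basis] for w in wavelengths]
    m = len(basis)
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(m)] for i in range(m)]
    Btr = [sum(row[i] * r for row, r in zip(B, reflectance)) for i in range(m)]
    return solve(BtB, Btr)

# Hypothetical basis (not the paper's): offset, linear slope, Gaussian feature near 680 nm.
BASIS = [
    lambda w: 1.0,
    lambda w: (w - 400.0) / 400.0,
    lambda w: math.exp(-((w - 680.0) / 30.0) ** 2),
]
```

Running `fit_spectrum` on every pixel and histogramming each coefficient across the scene yields the per-parameter histograms in which natural splits are then sought. With many more basis functions, a QR-based solver would be numerically safer than the normal equations.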
AHIMSA - Ad hoc histogram information measure sensing algorithm for feature selection in the context of histogram inspired clustering techniques

Science.gov (United States)

Dasarathy, B. V.

1976-01-01

An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
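One simple way to turn "hills and valleys" into a feature score is to measure the depth of the deepest valley in each feature's 1-D histogram: a unimodal feature scores near zero, while a feature that separates the data into groups shows a deep gap. This scoring rule, the bin count, and the function names are illustrative assumptions, not the published AHIMSA information measure.

```python
def histogram(values, bins=10):
    """Counts of values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant feature
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def valley_depth(counts):
    """Deepest valley: how far a bin sits below the lower of the highest
    bins to its left and right (0 for a monotone or unimodal histogram)."""
    best = 0
    for i in range(1, len(counts) - 1):
        left, right = max(counts[:i]), max(counts[i + 1:])
        best = max(best, min(left, right) - counts[i])
    return best

def rank_features(samples, bins=10):
    """Rank feature indices by the valley depth of their 1-D histograms, deepest first."""
    n_features = len(samples[0])
    scores = [valley_depth(histogram([s[f] for s in samples], bins))
              for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: -scores[f]), scores
```

Because each feature is examined on its own, the cost grows linearly with the number of features, in line with the "economical" nonparametric assessment described above; the price is that structure visible only in combinations of features is missed.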
AIDEN: A Density Conscious Artificial Immune System for Automatic Discovery of Arbitrary Shape Clusters in Spatial Patterns

Directory of Open Access Journals (Sweden)

Vishwambhar Pathak

2012-11-01

Recent efforts in modeling the dynamics of natural immune cells, leading to artificial immune systems (AIS), have ignited contemporary research interest in finding analogies to real-world problems. AIS models have been widely exploited to develop dependable, robust solutions to clustering. Most traditional clustering methods are limited in their ability to detect clusters of arbitrary shapes in a fully unsupervised manner. In this paper, the recognition and communication dynamics of T cell receptors, the recognizing elements in the innate immune system, have been modeled with a kernel density estimation method. The model has been shown to successfully discover non-spherical clusters in spatial patterns. Modeling the cohesion of the antibodies and pathogens with a ‘local influence’ measure induces a comprehensive extension of the antibody representation ball (ARB), which in turn corresponds to controlled expansion of clusters and prevents overfitting.
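AIDEN's immune-inspired model is not specified here in enough detail to reproduce. As a point of comparison, mean shift is a well-known clustering method that is also built on kernel density estimation: every point climbs the estimated density to a mode, and points reaching the same mode share a cluster, so the number of clusters need not be fixed in advance. The sketch below is mean shift, not AIDEN; the Gaussian kernel, `bandwidth`, and `merge_dist` values are assumptions.

```python
import math

def mean_shift(points, bandwidth=1.0, iters=100, tol=1e-6):
    """Move a copy of each point uphill on a Gaussian kernel density estimate."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, q)) / (2 * bandwidth ** 2))
                       for q in points]
            total = sum(weights)
            new = [sum(w * q[d] for w, q in zip(weights, points)) / total
                   for d in range(len(x))]
            shift = math.dist(new, x)
            x = new
            if shift < tol:
                break
        modes.append(x)
    return modes

def group_modes(modes, merge_dist=0.5):
    """Points whose converged modes lie within merge_dist share a cluster label."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < merge_dist:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

Clusters here are the basins of attraction of density modes, so their shape follows the data rather than a spherical template; the bandwidth plays a role loosely similar to the controlled cluster expansion described above, with too small a value fragmenting clusters and too large a value merging them.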
Unsupervised Learning of Spatiotemporal Features by Video Completion

OpenAIRE

Nallabolu, Adithya Reddy

2017-01-01

In this work, we present an unsupervised representation learning approach for learning rich spatiotemporal features from videos without supervision from semantic labels. We propose to learn the spatiotemporal features by training a 3D convolutional neural network (CNN) using video completion as a surrogate task. Using a large collection of unlabeled videos, we train the CNN to predict the missing pixels of a spatiotemporal hole given the remaining parts of the video through minimizing per...
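The surrogate task above needs two pieces besides the 3D CNN itself: a way to cut a spatiotemporal hole out of a clip, and a loss that scores the network only on the missing pixels. A minimal sketch with videos as nested lists indexed `video[t][y][x]`; the cuboid-shaped hole, the mean-squared error, and the function names are assumptions, and `pred` would come from the network.

```python
def make_hole(video, t_range, y_range, x_range, fill=0.0):
    """Blank out a spatiotemporal cuboid of a video indexed video[t][y][x].
    Returns (masked_video, mask); mask[t][y][x] is True for hidden pixels."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    mask = [[[t in t_range and y in y_range and x in x_range for x in range(W)]
             for y in range(H)] for t in range(T)]
    masked = [[[fill if mask[t][y][x] else video[t][y][x] for x in range(W)]
               for y in range(H)] for t in range(T)]
    return masked, mask

def hole_mse(pred, target, mask):
    """Mean squared error computed over the hidden pixels only."""
    errs = [(pred[t][y][x] - target[t][y][x]) ** 2
            for t in range(len(mask)) for y in range(len(mask[0]))
            for x in range(len(mask[0][0])) if mask[t][y][x]]
    return sum(errs) / len(errs)
```

Restricting the loss to the hole means the network gains nothing by copying the visible context; it must infer appearance and motion inside the gap, which is what pushes it toward useful spatiotemporal features.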
Supervised and unsupervised condition monitoring of non-stationary acoustic emission signals

DEFF Research Database (Denmark)

Sigurdsson, Sigurdur; Pontoppidan, Niels Henrik; Larsen, Jan

2005-01-01

condition changes across load changes. In this paper we approach this load interpolation problem with supervised and unsupervised learning, i.e. model with normal and fault examples and normal examples only, respectively. We apply non-linear methods for the learning of engine condition changes. Both...
Evaluating unsupervised methods to size and classify suspended particles using digital in-line holography

Science.gov (United States)

Davies, Emlyn J.; Buscombe, Daniel D.; Graham, George W.; Nimmo-Smith, W. Alex M.

2015-01-01

Substantial information can be gained from digital in-line holography of marine particles, eliminating depth-of-field and focusing errors associated with standard lens-based imaging methods. However, for the technique to reach its full potential in oceanographic research, fully unsupervised (automated) methods are required for focusing, segmentation, sizing and classification of particles. These computational challenges are the subject of this paper, in which we draw upon data collected using a variety of holographic systems developed at Plymouth University, UK, from a significant range of particle types, sizes and shapes. A new method for noise reduction in reconstructed planes is found to be successful in aiding particle segmentation and sizing. The performance of an automated routine for deriving particle characteristics (and subsequent size distributions) is evaluated against equivalent size metrics obtained by a trained operative measuring grain axes on screen. The unsupervised method is found to be reliable, despite some errors resulting from over-segmentation of particles. A simple unsupervised particle classification system is developed, and is capable of successfully differentiating sand grains, bubbles and diatoms from within the surf-zone. Avoiding miscounting bubbles and biological particles as sand grains enables more accurate estimates of sand concentrations, and is especially important in deployments of particle monitoring instrumentation in aerated water. Perhaps the greatest potential for further development in the computational aspects of particle holography is in the area of unsupervised particle classification. The simple method proposed here provides a foundation upon which further development could lead to reliable identification of more complex particle populations, such as those containing phytoplankton, zooplankton, flocculated cohesive sediments and oil droplets.
Unsupervised categorization with individuals diagnosed as having moderate traumatic brain injury: Over-selective responding.

Science.gov (United States)

Edwards, Darren J; Wood, Rodger

2016-01-01

This study explored over-selectivity (executive dysfunction) using a standard unsupervised categorization task. Over-selectivity has been demonstrated using supervised categorization procedures (where training is given); however, little has been done in the way of unsupervised categorization (without training). A standard unsupervised categorization task was used to assess levels of over-selectivity in a traumatic brain injury (TBI) population. Individuals with TBI were selected from the Tertiary Traumatic Brain Injury Clinic at Swansea University and were asked to categorize two-dimensional items (pictures on cards), into groups that they felt were most intuitive, and without any learning (feedback from experimenter). This was compared against categories made by a control group for the same task. The findings of this study demonstrate that individuals with TBI had deficits for both easy and difficult categorization sets, as indicated by a larger amount of one-dimensional sorting compared to control participants. Deficits were significantly greater for the easy condition. The implications of these findings are discussed in the context of over-selectivity, and the processes that underlie this deficit. Also, the implications for using this procedure as a screening measure for over-selectivity in TBI are discussed.
Similarity maps and hierarchical clustering for annotating FT-IR spectral images.

Science.gov (United States)

Zhong, Qiaoyong; Yang, Chen; Großerüschkamp, Frederik; Kallenbach-Thieltges, Angela; Serocka, Peter; Gerwert, Klaus; Mosig, Axel

2013-11-20

Unsupervised segmentation of multi-spectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been used and evaluated in order to achieve segmentations of Fourier Transform Infrared (FT-IR) microscopic images that agree with histopathological characterization. We introduce so-called interactive similarity maps as an alternative strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps are as accurate as segmentations obtained from conventionally used hierarchical clustering approaches. To perform this comparison on quantitative grounds, we provide a scheme for identifying non-horizontal cuts in dendrograms. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy. We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering-based approaches and thus are a viable alternative to hierarchical clustering, one that is attractive due to its interactive nature. Our validation scheme furthermore shows that the performance of hierarchical two-means is comparable to that of the traditionally used Ward's clustering. As the former is much more efficient in time and memory, our results suggest another, less resource-demanding alternative for annotating large spectral images.
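Hierarchical two-means recursively splits the spectra with k = 2 and never needs the full matrix of pairwise distances that agglomerative methods such as Ward's typically work from, which is consistent with the efficiency advantage reported above. A minimal sketch, assuming a fixed tree depth and a simple far-apart-pair seeding; leaves of the tree play the role of segments, and the function names are hypothetical.

```python
import math

def two_means(points, iters=50):
    """Split points into two groups with 2-means, seeded with a far-apart pair."""
    c0 = max(points, key=lambda p: math.dist(p, points[0]))
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centers = [list(c0), list(c1)]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            near = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[near].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. all points identical)
        centers = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    return groups

def bisect_tree(points, depth):
    """Hierarchical two-means: recursively bisect a list of points to a fixed depth.
    Internal nodes are (left, right) tuples; leaves are lists of points."""
    if depth == 0 or len(points) < 2:
        return points
    left, right = two_means(points)
    if not left or not right:
        return points
    return (bisect_tree(left, depth - 1), bisect_tree(right, depth - 1))

def leaves(tree):
    """Flatten the tree into its leaf segments."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]
```

Each split touches only the spectra in the node being split, so time and memory grow roughly linearly with image size per tree level; the trade-off is that an early bad split cannot be undone by later levels.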
Six weeks of unsupervised Nintendo Wii Fit gaming is effective at improving balance in independent older adults.

Science.gov (United States)

Nicholson, Vaughan Patrick; McKean, Mark; Lowe, John; Fawcett, Christine; Burkett, Brendan

2015-01-01

To determine the effectiveness of unsupervised Nintendo Wii Fit balance training in older adults. Forty-one older adults were recruited from local retirement villages and educational settings to participate in a six-week two-group repeated measures study. The Wii group (n = 19, 75 ± 6 years) undertook 30 min of unsupervised Wii balance gaming three times per week in their retirement village while the comparison group (n = 22, 74 ± 5 years) continued with their usual exercise program. Participants' balance abilities were assessed pre- and postintervention. The Wii Fit group demonstrated significant improvements (P balance, lateral reach (left and right), and gait speed compared with the comparison group. Reported levels of enjoyment following game play increased during the study. Six weeks of unsupervised Wii balance training is an effective modality for improving balance in independent older adults.
Function approximation using combined unsupervised and supervised learning.

Science.gov (United States)

Andras, Peter

2014-03-01

Function approximation is one of the core tasks that are solved using neural networks in the context of many engineering problems. However, good approximation results need good sampling of the data space, which usually requires exponentially increasing volume of data as the dimensionality of the data increases. At the same time, often the high-dimensional data is arranged around a much lower dimensional manifold. Here we propose the breaking of the function approximation task for high-dimensional data into two steps: (1) the mapping of the high-dimensional data onto a lower dimensional space corresponding to the manifold on which the data resides and (2) the approximation of the function using the mapped lower dimensional data. We use over-complete self-organizing maps (SOMs) for the mapping through unsupervised learning, and single hidden layer neural networks for the function approximation through supervised learning. We also extend the two-step procedure by considering support vector machines and Bayesian SOMs for the determination of the best parameters for the nonlinear neurons in the hidden layer of the neural networks used for the function approximation. We compare the approximation performance of the proposed neural networks using a set of functions and show that indeed the neural networks using combined unsupervised and supervised learning outperform in most cases the neural networks that learn the function approximation using the original high-dimensional data.
Klastery v institucional'noj proekcii: k teorii i metodologii lokal'nogo social'no-jekonomicheskogo razvitija [Clusters in the institutional perspective: on the theory and methodology of local socioeconomic development]

Directory of Open Access Journals (Sweden)

Gareev Timur

2012-01-01

This article addresses the problem of defining and identifying clusters as localized mesoeconomic systems with fuzzy boundaries that stimulate the development of these systems. The author analyses the influence of the inductive approach on the formation of cluster theory and juxtaposes different typologies of clusters and other types of localized economic systems. The article offers an overview of the existing methodological approaches to the problem of cluster identification and emphasises the major role of the institutional dimension in the identification (and functioning) of clusters, especially in comparison with cluster formation theory based on the technological connection of adjacent units. The author concludes that, without including institutional factors alongside localising and technological ones (demonstrated through different variables), it is virtually impossible to develop an independent cluster theory distinct from general agglomeration theory. For the first time, a hierarchy of institutions affecting the formation of local economic systems is considered against the background of identifying institutional levels, whose full development makes it possible to speak of the formation of clusters as the most successful mesoeconomic systems. At the same time, the author emphasizes that, in economies gravitating towards the market type of organisation, the development of mesoeconomic systems is closely connected to competition for innovative rent. The article outlines a methodology for cluster studies that makes it possible to consider phenomena relatively new to regional science, such as innovative and “transborder” clusters.
Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques

Science.gov (United States)

Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham

2018-01-01

Structural health monitoring consists of using sensors integrated within structures, together with algorithms, to perform load monitoring, damage detection, damage location, damage size and severity estimation, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before this approach can be implemented in real engineering structures, changes in the strain field due to conditions other than damage need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, novel techniques are needed to uncouple such changes from those produced by damage. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology uses hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with the Q and nonlinear-T² damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously subjected to variations in its pitch angle. The results demonstrated the capability of the methodology to cluster data according to 13 different load conditions (pitch angles), perform the OBS and detect six different damages induced cumulatively. The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% for a
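The Q and T² damage indices come from the classical PCA monitoring setup: T² measures how far an observation's scores lie from the baseline within the retained principal subspace, and Q (the squared prediction error) measures how much of the observation the model cannot reconstruct. The paper uses hierarchical nonlinear PCA; the sketch below swaps in ordinary linear PCA purely to show how the two indices are computed, with thresholds and the OBS clustering step left out. Function names are hypothetical.

```python
import math

def pca(X, k, iters=500):
    """Mean, top-k principal directions and variances of X (rows = samples),
    via power iteration on the sample covariance with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    C = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in X) / (n - 1)
          for j in range(d)] for i in range(d)]
    vecs, vals = [], []
    for _ in range(k):
        v = [1.0 + j for j in range(d)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        vals.append(lam)
        # Deflate so the next power iteration finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, vecs, vals

def t2_and_q(x, mean, vecs, vals):
    """Hotelling T^2 on the retained scores, and Q = squared reconstruction residual."""
    xc = [a - m for a, m in zip(x, mean)]
    scores = [sum(a * b for a, b in zip(xc, v)) for v in vecs]
    t2 = sum(t * t / lam for t, lam in zip(scores, vals))
    q = sum(a * a for a in xc) - sum(t * t for t in scores)
    return t2, q
```

An observation that follows the baseline correlation structure but with unusual magnitude raises T², while one that breaks the correlation structure (as local damage tends to do in a strain field) raises Q; this is why the two indices are usually reported together.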
Automatic segmentation of dynamic neuroreceptor single-photon emission tomography images using fuzzy clustering

International Nuclear Information System (INIS)

Acton, P.D.; Pilowsky, L.S.; Kung, H.F.; Ell, P.J.

1999-01-01

The segmentation of medical images is one of the most important steps in the analysis and quantification of imaging data. However, partial volume artefacts make accurate tissue boundary definition difficult, particularly for the lower-resolution images commonly used in nuclear medicine. In single-photon emission tomography (SPET) neuroreceptor studies, areas of specific binding are usually delineated by manually drawing regions of interest (ROIs), a time-consuming and subjective process. This paper applies the technique of fuzzy c-means clustering (FCM) to automatically segment dynamic neuroreceptor SPET images. Fuzzy clustering was tested using a realistic, computer-generated, dynamic SPET phantom derived from segmenting an MR image of an anthropomorphic brain phantom. The utility of applying FCM to real clinical data was also assessed by comparison against conventional ROI analysis of iodine-123 iodobenzamide (IBZM) binding to dopamine D2/D3 receptors in the human brain. In addition, the methodology was further tested by applying FCM segmentation to [123I]IDAM images (5-iodo-2-[[2-2-[(dimethylamino)methyl]phenyl]thio] benzyl alcohol) of serotonin transporters in non-human primates. In the simulated dynamic SPET phantom, over a wide range of counts and ratios of specific binding to background, FCM correlated very strongly with the true counts (correlation coefficient r² > 0.99, P … [123I]IBZM data comparable with manual ROI analysis, with the binding ratios derived from both methods significantly correlated (r² = 0.83, P < 0.0001). Fuzzy clustering is a powerful tool for the automatic, unsupervised segmentation of dynamic neuroreceptor SPET images. Where other automated techniques fail completely, and manual ROI definition would be highly subjective, FCM is capable of segmenting noisy images in a robust and repeatable manner. (orig.)
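Fuzzy c-means assigns each voxel a graded membership in every cluster instead of a hard label, which suits partial-volume voxels at tissue boundaries. A minimal sketch of the standard FCM updates (memberships from relative distances, centres as membership-weighted means) with fuzzifier m = 2; the feature vectors could be, for example, each voxel's time-activity values. The deterministic initialisation and the function name are assumptions.

```python
import math

def fcm(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means: returns (centers, u) where u[i][j] is the membership of point j in cluster i."""
    n = len(points)
    # Deterministic init: spread initial centres over points sorted by their first feature.
    order = sorted(range(n), key=lambda j: points[j][0])
    centers = [list(points[order[(i * (n - 1)) // max(c - 1, 1)]]) for i in range(c)]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        u = [[0.0] * n for _ in range(c)]
        for j, p in enumerate(points):
            d = [math.dist(p, cen) for cen in centers]
            if min(d) == 0.0:  # point sits exactly on a centre: crisp membership
                u[d.index(0.0)][j] = 1.0
                continue
            for i in range(c):
                # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c))
        new = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]  # membership weights raised to m
            new.append([sum(wj * p[dim] for wj, p in zip(w, points)) / sum(w)
                        for dim in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, u
```

For every voxel the memberships sum to 1 across clusters, so a boundary voxel can be split, for instance, 60/40 between specific-binding and background clusters rather than being forced into one; raising m makes the partition fuzzier.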

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

OpenAIRE

Aldarmaki, Hanan; Mohan, Mahesh; Diab, Mona

2017-01-01

Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector ...
Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

Science.gov (United States)

2014-01-01

Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrated that the impact on analysis was minimal when cluster merges were homogeneous, with the impact on study power being balanced by a change in the observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations
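The power loss from merging can be illustrated with a widely used design-effect approximation for unequal cluster sizes, DE = 1 + ((CV² + 1)·m̄ − 1)·ICC, where m̄ is the mean cluster size and CV its coefficient of variation; the effective sample size is the total number of participants divided by DE. The abstract does not say which power formula the authors used, so this formula, the ICC of 0.05 and the cluster sizes below are assumptions for illustration only.

```python
from statistics import mean, pstdev

def design_effect(sizes, icc):
    """Approximate design effect for unequal cluster sizes: 1 + ((cv^2 + 1) * mbar - 1) * icc."""
    mbar = mean(sizes)
    cv = pstdev(sizes) / mbar          # coefficient of variation of cluster size
    return 1 + ((cv ** 2 + 1) * mbar - 1) * icc

def effective_sample_size(sizes, icc):
    """Total participants divided by the design effect."""
    return sum(sizes) / design_effect(sizes, icc)

icc = 0.05                        # hypothetical intracluster correlation coefficient
before = [50] * 20                # hypothetical: 20 equal clusters of 50 participants
# Five homogeneous merges: ten clusters are merged pairwise into five clusters of 100.
after = [50] * 10 + [100] * 5

ess_before = effective_sample_size(before, icc)
ess_after = effective_sample_size(after, icc)
print(f"effective sample size before merging: {ess_before:.1f}")
print(f"effective sample size after merging:  {ess_after:.1f}")
```

With the same 1000 participants, the merged design has fewer, larger and more variable clusters, so its design effect rises from 3.45 to 4.7 under these assumptions, and the effective sample size, and hence power, falls.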
Landsat TM band 431 combine on clustering analysis for pattern recognition land use using idrisi 4.2 software

International Nuclear Information System (INIS)

Wiweka, Arief H.; Izzawati, Tjahyaningsih A.

1997-01-01

Patterns of Earth-surface objects recorded in remote sensing digital images can be recognised through a classification process that groups pixels by their spectral values. The spectral assessment of an area representing an object's characteristics can be carried out in a supervised or an unsupervised way. In some cases, no reference media (such as maps or aerial photographs) are available, and neither field observation nor knowledge of the objects' locations is possible; classification can then be done by clustering. Pixels are grouped according to the width of the whole value interval of the spectral image, and the classes are then grouped according to the desired accuracy. The clustering methods available in the IDRISI 4.2 software are the sequential, statistical, ISODATA and RGB methods. Clustering can thus serve as a pre-processing step for pattern recognition.
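Unsupervised classification of a band combination amounts to clustering pixels in spectral space and mapping the labels back onto the image grid. As a hedged illustration, the sketch below clusters a synthetic three-band composite (standing in for TM bands 4, 3 and 1) with plain k-means, i.e. ISODATA without its split-and-merge steps, using farthest-point initialisation. It is not one of the IDRISI 4.2 procedures, and the cover types and digital numbers are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k, rng):
    """Start from a random pixel, then repeatedly add the pixel farthest from all chosen centres."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = farthest_point_init(X, k, rng)
    for _ in range(n_iter):
        # Assign each pixel to the nearest centre in band space.
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Hypothetical 40 x 40 scene with three cover types (digital numbers in bands 4, 3, 1).
rng = np.random.default_rng(42)
means = np.array([[120, 40, 50],    # vegetation: bright in the near-infrared band
                  [30, 25, 60],     # water: dark in the near-infrared band
                  [90, 90, 95]])    # bare soil / built-up
truth = rng.integers(0, 3, size=(40, 40))
image = means[truth] + rng.normal(0, 5, size=(40, 40, 3))

pixels = image.reshape(-1, 3)            # one row per pixel, one column per band
centers, labels = kmeans(pixels, k=3)
class_map = labels.reshape(40, 40)       # cluster map in image geometry
```

The clusters are unlabelled spectral classes; assigning land-use names to them still requires interpretation, which is why clustering is described above as a pre-processing step.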
Validation of a free software for unsupervised assessment of abdominal fat in MRI.

Science.gov (United States)

Maddalo, Michele; Zorza, Ivan; Zubani, Stefano; Nocivelli, Giorgio; Calandra, Giulio; Soldini, Pierantonio; Mascaro, Lorella; Maroldi, Roberto

2017-05-01

To demonstrate the accuracy of an unsupervised (fully automated) software tool for fat segmentation in magnetic resonance imaging. The proposed software is a freeware solution developed in ImageJ that enables the quantification of metabolically different adipose tissues in large cohort studies. The lumbar part of the abdomen (19 cm in the craniocaudal direction, centered on L3) of eleven healthy volunteers (age range: 21-46 years, BMI range: 21.7-31.6 kg/m²) was examined in a breath hold on expiration with a GE T1 Dixon sequence. Single-slice and volumetric data were considered for each subject. The results of the visceral and subcutaneous adipose tissue assessments obtained by the unsupervised software were compared to supervised reference segmentations. The associated statistical analysis included Pearson correlations, Bland-Altman plots and volumetric differences (VD%). Values calculated by the unsupervised software correlated significantly with the corresponding supervised reference segmentations for both subcutaneous adipose tissue - SAT (R=0.9996, p … software is capable of segmenting the metabolically different adipose tissues with a high degree of accuracy. This free add-on software for ImageJ can easily achieve widespread use and enable large-scale population studies regarding adipose tissue and its related diseases. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
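The agreement statistics listed above (Pearson correlation, Bland-Altman bias with 95% limits of agreement, and volumetric difference) are simple to compute from paired measurements. A minimal sketch, assuming paired adipose-tissue volumes from the automatic and reference segmentations; the VD% definition used here (signed difference relative to the reference) and the five volumes are illustrative assumptions, not the study's data.

```python
import numpy as np

def agreement(test, reference):
    """Pearson r, Bland-Altman bias and 95% limits of agreement, and mean VD%."""
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    r = np.corrcoef(test, reference)[0, 1]
    diff = test - reference
    bias = diff.mean()
    sd = diff.std(ddof=1)                    # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Assumed definition: signed volumetric difference relative to the reference, in percent.
    vd_percent = 100.0 * diff / reference
    return r, bias, limits, vd_percent.mean()

# Hypothetical SAT volumes (cm^3) for five subjects.
reference = [1510.0, 1320.0, 2105.0, 980.0, 1760.0]
automatic = [1502.0, 1331.0, 2098.0, 985.0, 1771.0]

r, bias, (lo, hi), vd = agreement(automatic, reference)
print(f"r = {r:.4f}, bias = {bias:.1f} cm^3, LoA = [{lo:.1f}, {hi:.1f}], mean VD = {vd:.2f}%")
```

A high correlation alone does not show agreement; the Bland-Altman bias and limits express the typical size of the disagreement in the units of the measurement.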
Modeling electronic defects in La2CuO4 and LiCl using embedded quantum cluster methodology

International Nuclear Information System (INIS)

Grimes, R.W.; Shluger, A.L.; Baetzold, R.; Catlow, C.R.A.

1991-01-01

By exploiting recent developments in computer simulation methods, the authors modeled the behavior of hole states in La2CuO4 and excited-state defects such as the exciton in LiCl. The authors' methodology employs a Hartree-Fock cluster embedded in a classical surround. Although the method is discussed with respect to the hole and exciton defects in particular, the scope of the talk includes other material problems currently being investigated by this method. Thus, the types of problems for which the method is appropriate are illustrated, and its present limitations are discussed.
Approximate fuzzy C-means (AFCM) cluster analysis of medical magnetic resonance image (MRI) data

International Nuclear Information System (INIS)

DelaPaz, R.L.; Chang, P.J.; Bernstein, R.; Dave, J.V.

1987-01-01

The authors describe the application of an approximate fuzzy C-means (AFCM) clustering algorithm as a data dimension reduction approach to medical magnetic resonance images (MRI). Image data consisted of one T1-weighted, two T2-weighted, and one T2*-weighted (magnetic susceptibility) image for each cranial study and a matrix of 10 images generated from 10 combinations of TE and TR for each body lymphoma study. All images were obtained with a 1.5 Tesla imaging system (GE Signa). Analyses were performed on over 100 MR image sets with a variety of pathologies. The cluster analysis was operated in an unsupervised mode and computational overhead was minimized by utilizing a table look-up approach without adversely affecting accuracy. Image data were first segmented into 2 coarse clusters, each of which was then subdivided into 16 fine clusters. The final tissue classifications were presented as color-coded anatomically-mapped images and as two and three dimensional displays of cluster center data in selected feature space (minimum spanning tree). Fuzzy cluster analysis appears to be a clinically useful dimension reduction technique which results in improved diagnostic specificity of medical magnetic resonance images
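One way to realise a table look-up speed-up for 8-bit data: all pixels sharing a grey level receive identical memberships, so FCM can be run over the histogram (each level weighted by its pixel count) and the resulting membership table indexed by pixel value. This is a hedged sketch of that general idea on a single synthetic channel. It is not the authors' AFCM algorithm, whose details the abstract does not give, and the three intensity classes are hypothetical.

```python
import numpy as np

def histogram_fcm(image, c, m=2.0, n_iter=200, tol=1e-8):
    """FCM over the 256 grey levels of an 8-bit image, weighted by pixel counts."""
    levels = np.arange(256, dtype=float)
    counts = np.bincount(image.ravel(), minlength=256).astype(float)
    occupied = levels[counts > 0]
    centers = np.linspace(occupied.min(), occupied.max(), c)   # spread initial centres
    for _ in range(n_iter):
        d = np.fmax(np.abs(levels[:, None] - centers[None, :]), 1e-12)
        # Membership table: one row per grey level, u = d^(-2/(m-1)) normalised per row.
        inv = d ** (-2.0 / (m - 1.0))
        table = inv / inv.sum(axis=1, keepdims=True)
        w = counts[:, None] * table ** m                       # empty levels get zero weight
        new_centers = (w * levels[:, None]).sum(axis=0) / w.sum(axis=0)
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    return centers, table

rng = np.random.default_rng(7)
# Hypothetical tissue classes with mean intensities 40, 110 and 200.
values = np.concatenate([rng.normal(mu, 8, 3000) for mu in (40, 110, 200)])
image = np.clip(np.round(values), 0, 255).astype(np.uint8).reshape(90, 100)

centers, table = histogram_fcm(image, c=3)
memberships = table[image]              # look-up by pixel value: shape (90, 100, 3)
labels = memberships.argmax(axis=-1)
```

Each iteration costs O(256 · c) regardless of image size, which is the point of working on the histogram; extending the idea to several co-registered channels requires a joint (multi-dimensional) histogram.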
Unsupervised Language Acquisition

Science.gov (United States)

de Marcken, Carl

1996-11-01

This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the "content" of linguistic parameters from their representation. Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning vocabularies and grammars from unsegmented text and continuous speech, and mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same structure to utterances as humans do. This work has application to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.
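One small ingredient of learning vocabularies from unsegmented text under a stochastic model can be shown concretely: given a probabilistic lexicon, the most probable segmentation of a string under a unigram word model is found by dynamic programming (Viterbi). The lexicon and its probabilities below are hypothetical and fixed by hand; the thesis learns its lexicon from data, which this sketch does not attempt.

```python
import math

def segment(text, lexicon):
    """Most probable segmentation of `text` under a unigram model {word: probability}."""
    max_len = max(len(w) for w in lexicon)
    best = [0.0] + [-math.inf] * len(text)   # best log-probability of each prefix
    back = [0] * (len(text) + 1)             # start index of the last word in that prefix
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon and best[start] > -math.inf:
                score = best[start] + math.log(lexicon[word])
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[-1] == -math.inf:
        return None                          # no segmentation covers the whole string
    words, end = [], len(text)
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]

# Hypothetical lexicon; probabilities sum to 1.
lexicon = {"the": 0.2, "dog": 0.1, "do": 0.05, "g": 0.01, "saw": 0.1, "a": 0.2,
           "cat": 0.1, "at": 0.2, "s": 0.01, "aw": 0.01, "th": 0.01, "e": 0.01}
print(segment("thedogsawacat", lexicon))   # ['the', 'dog', 'saw', 'a', 'cat']
```

In an unsupervised learner, segmentations like this would be used to re-estimate the lexicon (for example in an EM-style loop), so that the vocabulary and the segmentation improve together.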
Time series clustering in large data sets

Directory of Open Access Journals (Sweden)

Jiří Fejfar

2011-01-01

The clustering of time series is a widely researched area, and there are many methods for dealing with this task. We are currently using the self-organizing map (SOM) with its unsupervised learning algorithm for clustering time series. After the first experiment (Fejfar, Weinlichová, Šťastný, 2009), it seems that the overall concept of the clustering algorithm is correct, but that we have to perform time series clustering on a much larger dataset to obtain more accurate results and to find the correlation between configured parameters and results more precisely. The second requirement arose from the need for a well-defined evaluation of results. It seems useful to use sound recordings as instances of time series again: many recordings are available in digital libraries, and many interesting features and patterns can be found in this area. In this experiment, we are searching for recordings with a similar development of information density. This can be used for musical form investigation, cover song detection and many other applications. The objective of the presented paper is to compare clustering results obtained with different parameters of the feature vectors and of the SOM itself. We describe time series in a simplified way, by evaluating standard deviations for separate parts of the recordings. The resulting feature vectors are clustered with the SOM in batch training mode with different topologies, varying from a few neurons to large maps. Other algorithms usable for finding similarities between time series are discussed, and finally conclusions for further research are presented. We also present an overview of the related current literature and projects.
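The feature extraction and batch SOM training described above can be sketched as follows. Assumptions: each recording is split into a fixed number of near-equal segments whose standard deviations form the feature vector, the map is a small rectangular grid with a Gaussian neighbourhood that shrinks geometrically over epochs, and the "steady" and "crescendo" signals are synthetic stand-ins for recordings. The segment count, topology and schedule are hypothetical, not the paper's settings.

```python
import numpy as np

def segment_std_features(signal, n_segments):
    """Standard deviation of each of n_segments near-equal parts of a 1-D signal."""
    return np.array([part.std() for part in np.array_split(signal, n_segments)])

def best_matching_units(X, W):
    """Index of the closest unit (Euclidean) for every sample."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

def batch_som(X, rows, cols, n_epochs=30, seed=0):
    """Batch-mode SOM on a rows x cols grid with a shrinking Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), size=rows * cols, replace=True)].astype(float)  # init from samples
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(n_epochs):
        # Neighbourhood width decays geometrically from sigma0 to 0.5 grid units.
        sigma = sigma0 * (0.5 / sigma0) ** (epoch / max(n_epochs - 1, 1))
        bmu = best_matching_units(X, W)
        # Squared grid distance between every unit and every sample's BMU: (units, samples).
        g2 = ((grid[:, None, :] - grid[bmu][None, :, :]) ** 2).sum(axis=2)
        h = np.exp(-g2 / (2.0 * sigma ** 2))
        # Batch update: each unit becomes the neighbourhood-weighted mean of all samples.
        W = (h @ X) / h.sum(axis=1, keepdims=True)
    return W

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 4000)
steady = [rng.normal(0.0, 0.3, t.size) for _ in range(10)]          # constant level
crescendo = [rng.normal(0.0, 1.0, t.size) * t for _ in range(10)]  # level grows over time
X = np.array([segment_std_features(s, 8) for s in steady + crescendo])

W = batch_som(X, rows=3, cols=3)
units = best_matching_units(X, W)
```

Batch mode replaces the sample-by-sample learning rate with a weighted average per epoch, which makes runs reproducible for a given initialisation and convenient for comparing topologies.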
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm

Science.gov (United States)

Mitra, Sunanda; Pemmaraju, Surya

1992-01-01

Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture with embedded fuzzy learning rules is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions or familiarity with the behavior of the Tethered Satellite System.
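As its name indicates, AFLC is a leader-style, on-line clustering method: each incoming pattern either updates the nearest existing cluster or, if it is too dissimilar, founds a new one. The sketch below shows only that basic leader-clustering skeleton, with a fixed distance threshold and running-mean centroids. The ART-style vigilance test and fuzzy centroid updates of the actual AFLC algorithm are not reproduced, and the threshold and input stream are hypothetical.

```python
import numpy as np

class LeaderClustering:
    """On-line leader clustering with running-mean centroids (simplified, non-fuzzy)."""

    def __init__(self, threshold):
        self.threshold = threshold     # hypothetical maximum distance for joining a cluster
        self.centroids = []
        self.counts = []

    def update(self, x):
        """Assign one pattern, updating or creating a cluster; return its cluster index."""
        x = np.asarray(x, dtype=float)
        if self.centroids:
            d = [np.linalg.norm(x - c) for c in self.centroids]
            j = int(np.argmin(d))
            if d[j] <= self.threshold:
                # Incremental mean: c <- c + (x - c) / n
                self.counts[j] += 1
                self.centroids[j] += (x - self.centroids[j]) / self.counts[j]
                return j
        self.centroids.append(x.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

# Hypothetical stream of (length error, length-rate error) pairs.
stream = [(0.1, 0.0), (0.12, 0.02), (2.0, 1.5), (0.09, -0.01), (2.1, 1.4), (-1.8, 0.3)]
model = LeaderClustering(threshold=0.5)
labels = [model.update(p) for p in stream]
print(labels)   # [0, 0, 1, 0, 1, 2]
```

Because the number of clusters is not fixed in advance, new system states can be recognised as they appear in the stream, which is the property that makes leader-style methods attractive for on-line identification.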
Unsupervised Learning of Action Primitives

DEFF Research Database (Denmark)

Baby, Sanmohan; Krüger, Volker; Kragic, Danica

2010-01-01

Action representation is a key issue in imitation learning for humanoids. With the recent finding of mirror neurons there has been a growing interest in expressing actions as a combination of meaningful subparts called primitives. Primitives could be thought of as an alphabet for human actions. In this paper we observe that human actions and objects can be seen as being intertwined: we can interpret actions from the way the body parts are moving, but also from their effect on the involved object. While human movements can look vastly different even under minor changes in location, orientation and scale, the use of the object can provide a strong invariant for the detection of motion primitives. In this paper we propose an unsupervised learning approach for action primitives that makes use of the human movements as well as the object state changes. We group actions according to the changes...
Analyzing Dynamic Probabilistic Risk Assessment Data through Topology-Based Clustering

Energy Technology Data Exchange (ETDEWEB)

Diego Mandelli; Dan Maljovec; Bei Wang; Valerio Pascucci; Peer-Timo Bremer

2013-09-01

We investigate the use of a topology-based clustering technique on the data generated by dynamic event tree methodologies. The clustering technique we utilize focuses on a domain-partitioning algorithm based on topological structures known as the Morse-Smale complex, which partitions the data points into clusters based on their uniform gradient flow behavior. We perform both end state analysis and transient analysis to classify the set of nuclear scenarios. We demonstrate our methodology on a dataset generated for a sodium-cooled fast reactor during an aircraft crash scenario. The simulation tracks the temperature of the reactor as well as the time for a recovery team to fix the passive cooling system. Combined with clustering results obtained previously through the mean shift methodology, we present the user with complementary views of the data that help illuminate key features that may otherwise be hidden when using a single methodology. By clustering the data, the number of relevant test cases to be selected for further analysis can be drastically reduced by selecting a representative from each cluster. Identifying the similarities of simulations within a cluster can also aid in drawing important conclusions with respect to safety analysis.
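On scattered simulation data, a Morse-Smale partition can be approximated by following steepest-ascent and steepest-descent paths on a k-nearest-neighbour graph: points whose paths end at the same (minimum, maximum) pair share a cell. A minimal sketch under that assumption, using a brute-force neighbour search and a synthetic two-parameter response with two peaks in place of the reactor simulation outputs; it is not the authors' implementation.

```python
import numpy as np

def knn_graph(X, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, 1:k + 1]          # column 0 is the point itself

def steepest_flow(X, f, nbrs, ascend=True):
    """Index of the local extremum reached by steepest ascent (or descent) from each point."""
    n = len(X)
    step = np.arange(n)
    for i in range(n):
        best_slope = 0.0
        for j in nbrs[i]:
            slope = (f[j] - f[i]) / np.linalg.norm(X[j] - X[i])
            if not ascend:
                slope = -slope
            if slope > best_slope:
                step[i], best_slope = j, slope         # stays at i if i is an extremum
    target = step.copy()
    for _ in range(n):                                 # follow pointers to a fixed point
        target = step[target]
    return target

def morse_smale_cells(X, f, k=8):
    """(minimum, maximum) pair for every point; equal pairs form one cell."""
    nbrs = knn_graph(X, k)
    maxima = steepest_flow(X, f, nbrs, ascend=True)
    minima = steepest_flow(X, f, nbrs, ascend=False)
    return list(zip(minima.tolist(), maxima.tolist()))

# Hypothetical response over two input parameters, with peaks near (-1, 0) and (1, 0).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
f = np.exp(-((X[:, 0] + 1) ** 2 + X[:, 1] ** 2)) + np.exp(-((X[:, 0] - 1) ** 2 + X[:, 1] ** 2))
cells = morse_smale_cells(X, f)
peaks = {mx for _, mx in cells}
```

On a neighbourhood graph, noise or sparse sampling can create spurious extrema, so practical tools usually simplify the complex by merging low-persistence extrema before reporting clusters.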
An Intelligent Clustering Based Methodology for Confusable ...

African Journals Online (AJOL)

Journal of the Nigerian Association of Mathematical Physics ... The system assigns patients with severity levels in all the clusters. ... The system compares favorably with diagnosis arrived at by experienced physicians and also provides patients' level of severity in each confusable disease and the degree of confusability of ...
Hierarchical Multiple Markov Chain Model for Unsupervised Texture Segmentation

Czech Academy of Sciences Publication Activity Database

Scarpa, G.; Gaetano, R.; Haindl, Michal; Zerubia, J.

2009-01-01

Vol. 18, No. 8 (2009), pp. 1830-1843 ISSN 1057-7149 R&D Projects: GA ČR GA102/08/0593 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : Classification * texture analysis * segmentation * hierarchical image models * Markov process Subject RIV: BD - Theory of Information Impact factor: 2.848, year: 2009 http://library.utia.cas.cz/separaty/2009/RO/haindl-hierarchical multiple markov chain model for unsupervised texture segmentation.pdf
Automated and unsupervised detection of malarial parasites in microscopic images

Directory of Open Access Journals (Sweden)

Purwar Yashasvi

2011-12-01

Background Malaria is a serious infectious disease. According to the World Health Organization, it is responsible for nearly one million deaths each year. There are various techniques to diagnose malaria, of which manual microscopy is considered to be the gold standard. However, due to the number of steps required in manual assessment, this diagnostic method is time consuming (leading to late diagnosis) and prone to human error (leading to erroneous diagnosis), even in experienced hands. The focus of this study is to develop a robust, unsupervised and sensitive malaria screening technique with low material cost, and one that has an advantage over other techniques in that it minimizes human reliance and is, therefore, more consistent in applying diagnostic criteria. Method A method based on digital image processing of Giemsa-stained thin smear images is developed to facilitate the diagnostic process. The diagnosis procedure is divided into two parts: enumeration and identification. The image-based method presented here is designed to automate the process of enumeration and identification, with its main advantage being its ability to carry out the diagnosis in an unsupervised manner and yet have high sensitivity, thus reducing cases of false negatives. Results The image-based method was tested on more than 500 images from two independent laboratories. The aim is to distinguish between positive and negative cases of malaria using thin smear blood slide images. Due to the unsupervised nature of the method, it requires minimal human intervention, thus speeding up the whole process of diagnosis. Overall sensitivity to capture cases of malaria is 100%, and specificity ranges from 50-88% for all species of malaria parasites. Conclusion The image-based screening method will speed up the whole process of diagnosis and is advantageous over laboratory procedures that are prone to errors and where pathological expertise is minimal. Further this method
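An enumeration step of this kind typically begins by separating stained objects from the background and counting connected regions. The following is a minimal sketch of that generic step, assuming a grey-scale image in which cells are darker than the background, Otsu's threshold and 4-connectivity. The synthetic field of three dark discs and the function names are hypothetical; this is not the published method.

```python
import numpy as np
from collections import deque

def otsu_threshold(gray):
    """Threshold t maximising the between-class variance; class 0 is values <= t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability for each t
    mu = np.cumsum(p * np.arange(256))        # class-0 cumulative mean
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                     # skip thresholds with an empty class
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def count_components(mask):
    """Number of 4-connected foreground regions, via breadth-first search."""
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                count += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count

# Hypothetical 100 x 100 field: bright background with three dark, roughly circular cells.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:100, 0:100]
field = np.full((100, 100), 200.0)
for cy, cx in [(25, 25), (30, 70), (75, 50)]:
    field[(yy - cy) ** 2 + (xx - cx) ** 2 <= 10 ** 2] = 80.0
gray = np.clip(field + rng.normal(0, 10, field.shape), 0, 255).astype(np.uint8)

t = otsu_threshold(gray)
cells = gray <= t                             # cells are darker than the background
n_cells = count_components(cells)
```

Real smear images add touching or overlapping cells, uneven illumination and staining artefacts, so a practical pipeline would need further steps (for example, splitting merged cells) before counts become reliable.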
Unsupervised Symbolization of Signal Time Series for Extraction of the Embedded Information

Directory of Open Access Journals (Sweden)

Yue Li

2017-03-01

This paper formulates an unsupervised algorithm for symbolization of signal time series to capture the embedded dynamic behavior. The key idea is to convert a time series of the digital signal into a string of (spatially discrete) symbols from which the embedded dynamic information can be extracted in an unsupervised manner (i.e., with no requirement for labeling of the time series). The main challenges here are: (1) definition of the symbol assignment for the time series; (2) identification of the partitioning segment locations in the signal space of the time series; and (3) construction of probabilistic finite-state automata (PFSA) from the symbol strings that contain temporal patterns. The reported work addresses these challenges by maximizing the mutual information measures between symbol strings and PFSA states. The proposed symbolization method has been validated by numerical simulation as well as by experimentation in a laboratory environment. The performance of the proposed algorithm has been compared to that of two commonly used algorithms of time series partitioning.
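The symbolization pipeline can be illustrated with a simple baseline: maximum-entropy (equal-frequency) partitioning maps the signal onto an alphabet, and a depth-1 PFSA (one state per most recent symbol) is estimated by counting symbol transitions. The mutual-information criterion that the paper optimises is not implemented here; the alphabet size, the noisy sine signal and the function names are hypothetical.

```python
import numpy as np

def max_entropy_partition(signal, alphabet_size):
    """Boundaries at equal-probability quantiles, so each symbol is about equally frequent."""
    qs = np.linspace(0.0, 1.0, alphabet_size + 1)[1:-1]
    return np.quantile(signal, qs)

def symbolize(signal, boundaries):
    """Symbol of each sample: the number of boundaries that are <= the sample."""
    return np.searchsorted(boundaries, signal, side="right")

def pfsa_transition_matrix(symbols, alphabet_size):
    """Row-stochastic P[i, j] = Pr(next symbol j | current state i) for a depth-1 PFSA."""
    counts = np.zeros((alphabet_size, alphabet_size))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    counts += 1e-12                             # avoid division by zero for unseen states
    return counts / counts.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
t = np.arange(5000) * 0.01
signal = np.sin(2 * np.pi * 0.5 * t) + 0.05 * rng.normal(size=t.size)

boundaries = max_entropy_partition(signal, alphabet_size=4)
symbols = symbolize(signal, boundaries)
P = pfsa_transition_matrix(symbols, alphabet_size=4)
symbol_freq = np.bincount(symbols, minlength=4) / symbols.size
```

Equal-frequency partitioning ignores the temporal order of the samples when placing boundaries; a criterion that scores boundaries by how informative the resulting symbol string is about the automaton state, as the paper proposes, uses that order explicitly.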
Improved Anomaly Detection using Integrated Supervised and Unsupervised Processing

Science.gov (United States)

Hunt, B.; Sheppard, D. G.; Wetterer, C. J.

There are two broad technologies of signal processing applicable to space object feature identification using nonresolved imagery: supervised processing analyzes a large set of data for common characteristics that can be then used to identify, transform, and extract information from new data taken of the same given class (e.g. support vector machine); unsupervised processing utilizes detailed physics-based models that generate comparison data that can then be used to estimate parameters presumed to be governed by the same models (e.g. estimation filters). Both processes have been used in non-resolved space object identification and yield similar results yet arrived at using vastly different processes. The goal of integrating the results of the two is to seek to achieve an even greater performance by building on the process diversity. Specifically, both supervised processing and unsupervised processing will jointly operate on the analysis of brightness (radiometric flux intensity) measurements reflected by space objects and observed by a ground station to determine whether a particular day conforms to a nominal operating mode (as determined from a training set) or exhibits anomalous behavior where a particular parameter (e.g. attitude, solar panel articulation angle) has changed in some way. It is demonstrated in a variety of different scenarios that the integrated process achieves a greater performance than each of the separate processes alone.
UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

International Nuclear Information System (INIS)

Sanders, N. E.; Soderberg, A. M.; Betancourt, M.

2015-01-01

Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST.
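The gain from population-level modelling already appears in the simplest hierarchical case. Assume each object j has a true parameter θ_j drawn from N(μ, τ²) and a noisy individual estimate y_j with known standard error σ_j. With μ and τ held fixed, the posterior for θ_j is normal with precision 1/σ_j² + 1/τ² and a mean that shrinks y_j toward μ. This toy normal-normal model and its numbers are assumptions for illustration only: the paper's light-curve model, its inferred hyperparameters and its Hamiltonian Monte Carlo fit are not reproduced.

```python
import numpy as np

def partial_pooling(y, sigma, mu, tau):
    """Posterior mean and sd of each theta_j in a normal-normal model with fixed mu and tau."""
    y, sigma = np.asarray(y, dtype=float), np.asarray(sigma, dtype=float)
    precision = 1.0 / sigma ** 2 + 1.0 / tau ** 2
    post_mean = (y / sigma ** 2 + mu / tau ** 2) / precision
    return post_mean, np.sqrt(1.0 / precision)

# Hypothetical per-object estimates of one light-curve parameter (e.g. a plateau duration, days).
y = np.array([95.0, 130.0, 60.0, 110.0])
sigma = np.array([5.0, 40.0, 30.0, 10.0])     # sparsely observed objects have large errors
mean, sd = partial_pooling(y, sigma, mu=100.0, tau=15.0)
for yj, sj, m, s in zip(y, sigma, mean, sd):
    print(f"individual {yj:6.1f} +/- {sj:4.1f}  ->  pooled {m:6.1f} +/- {s:4.1f}")
```

Poorly constrained objects borrow the most strength from the population, while well-observed objects barely move. In a full hierarchical model, μ and τ are themselves inferred from all objects at once, which is what makes a sampler such as HMC necessary.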
UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

Energy Technology Data Exchange (ETDEWEB)

Sanders, N. E.; Soderberg, A. M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Betancourt, M., E-mail: nsanders@cfa.harvard.edu [Department of Statistics, University of Warwick, Coventry CV4 7AL (United Kingdom)

2015-02-10

Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks

OpenAIRE

Zhelezniak, Vitalii; Busbridge, Dan; Shen, April; Smith, Samuel L.; Hammerla, Nils Y.

2018-01-01

Experimental evidence indicates that simple models outperform complex deep networks on many unsupervised similarity tasks. We provide a simple yet rigorous explanation for this behaviour by introducing the concept of an optimal representation space, in which semantically close symbols are mapped to representations that are close under a similarity measure induced by the model's objective function. In addition, we present a straightforward procedure that, without any retraining or architectura...
The Development Of Red Chili Agribusiness Cluster With Soft System Methodology (SSM) Approach In Garut, West Java

Directory of Open Access Journals (Sweden)

Sri Ayu Andayani

2016-12-01

Red chili is one of the commodities with high price fluctuation and influences inflation. This happens due to the unsustainable supply of red chili from the central production centers to the market. Bank Indonesia (the central bank) initiates a cluster system to support price control and regional economic growth. In this regard, the study was conducted in Garut regency, which is one of the centers of red chili plantation in West Java and is used for cluster development, yet there are still many obstacles along the way. This paper aims to describe the problems which cause unsustainable production and systemically affect industrial supplies, and also to analyze the existing partnerships in order to maintain the continuity of supply as an alternative solution. This study was designed qualitatively as a case study through a systems approach, namely soft system methodology (SSM). The results show that the problems in the red chili cluster, ranging from production planning to the delay of the sales payment process, are systemically interlinked, and that collaboration among executors has not been optimally implemented. This study offers solutions for those problems in accordance with the change formulation of SSM and an industrial emphasis on fairness, transparency and integrated optimization, with the principle of production sustainability for all stakeholders through participative collaboration to maintain continuity of production.

The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

Science.gov (United States)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
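For comparison with established declustering methods, the simplest window-based approach assigns any event that falls within a magnitude-dependent distance and time window after a larger event to that event's cluster. The sketch below assumes the Gardner-Knopoff window fits commonly quoted in the declustering literature and flat x/y coordinates in kilometres. The catalogue is hypothetical, and the Smart Cluster Method's adaptive, density-based search is not reproduced.

```python
import math

def gk_window(magnitude):
    """Distance (km) and time (days) windows; commonly quoted Gardner-Knopoff fits."""
    distance_km = 10 ** (0.1238 * magnitude + 0.983)
    if magnitude >= 6.5:
        time_days = 10 ** (0.032 * magnitude + 2.7389)
    else:
        time_days = 10 ** (0.5409 * magnitude - 0.547)
    return distance_km, time_days

def decluster(catalog):
    """Parent index for each event: itself if it is a mainshock, else the larger event it follows.

    catalog: list of (time_days, x_km, y_km, magnitude) tuples.
    """
    order = sorted(range(len(catalog)), key=lambda i: -catalog[i][3])   # largest first
    parent = [None] * len(catalog)
    for i in order:
        if parent[i] is not None:
            continue                        # already inside a larger event's window
        parent[i] = i
        t0, x0, y0, m0 = catalog[i]
        d_win, t_win = gk_window(m0)
        for j, (t, x, y, _) in enumerate(catalog):
            if parent[j] is None and 0 <= t - t0 <= t_win and math.hypot(x - x0, y - y0) <= d_win:
                parent[j] = i
    return parent

catalog = [
    (0.0, 0.0, 0.0, 6.0),       # mainshock
    (2.0, 5.0, 3.0, 4.5),       # nearby, two days later
    (40.0, -10.0, 8.0, 3.9),    # nearby, forty days later
    (15.0, 400.0, 0.0, 4.2),    # far away: independent event
    (900.0, 1.0, 1.0, 3.5),     # same place, but after the window has closed
]
parent = decluster(catalog)
mainshocks = [i for i, p in enumerate(parent) if p == i]
print(parent, mainshocks)       # [0, 0, 0, 3, 4] [0, 3, 4]
```

Fixed windows like these ignore local seismicity density and rupture geometry; adapting the search to those properties is the gap the SCM is designed to fill.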
Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

Science.gov (United States)

Muraoka, Masae; Okuda, Hiroshi

With the rapid growth of WAN infrastructure and the development of Grid middleware, it has become a realistic and attractive methodology to connect cluster machines over a wide-area network for the execution of computation-demanding applications. However, many existing parallel finite element (FE) applications have been designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. So far, few FE applications can exploit the distributed environment. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA), based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Through these experiments, by comparing the performance and communication cost in each case, we evaluate the feasibility of FE analysis (FEA) on the cluster-of-clusters.
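The TCA/LCA distinction can be made concrete with a simple latency-bandwidth (alpha-beta) cost model, in which each message costs a fixed latency plus its size divided by bandwidth. All numbers below (latencies, bandwidths, message counts, compute times) are hypothetical. They were chosen only to show why frequent synchronisation is penalised on a wide-area cluster-of-clusters, and they are not measurements from the study.

```python
from dataclasses import dataclass

@dataclass
class Network:
    latency_s: float           # per-message latency (alpha)
    bandwidth_Bps: float       # bytes per second (1 / beta)

@dataclass
class Workload:
    compute_s: float           # computation time per process
    messages: int              # messages exchanged over the run
    bytes_per_message: float

def run_time(work: Workload, net: Network) -> float:
    """Compute time plus alpha-beta communication cost (no overlap of the two assumed)."""
    comm = work.messages * (net.latency_s + work.bytes_per_message / net.bandwidth_Bps)
    return work.compute_s + comm

lan = Network(latency_s=50e-6, bandwidth_Bps=1e9)    # hypothetical cluster interconnect
wan = Network(latency_s=20e-3, bandwidth_Bps=1e8)    # hypothetical wide-area link

tca = Workload(compute_s=100.0, messages=20_000, bytes_per_message=8_000)   # frequent exchanges
lca = Workload(compute_s=100.0, messages=20, bytes_per_message=1_000_000)   # rare, bulk exchanges

for name, work in [("TCA", tca), ("LCA", lca)]:
    ratio = run_time(work, wan) / run_time(work, lan)
    print(f"{name}: WAN/LAN run-time ratio = {ratio:.2f}")
```

Under these assumptions the tightly coupled workload runs about five times slower across the WAN, dominated by per-message latency, while the loosely coupled one is almost unaffected. This is the intuition behind classifying FE applications by communication pattern before deciding where to run them.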
Supervised and Unsupervised Self-Testing for HIV in High- and Low-Risk Populations: A Systematic Review

Science.gov (United States)

Pant Pai, Nitika; Sharma, Jigyasa; Shivkumar, Sushmita; Pillay, Sabrina; Vadnais, Caroline; Joseph, Lawrence; Dheda, Keertan; Peeling, Rosanna W.

2013-01-01

Background Stigma, discrimination, lack of privacy, and long waiting times partly explain why six out of ten individuals living with HIV do not access facility-based testing. By circumventing these barriers, self-testing offers potential for more people to know their sero-status. Recent approval of an in-home HIV self test in the US has sparked self-testing initiatives, yet data on acceptability, feasibility, and linkages to care are limited. We systematically reviewed evidence on supervised (self-testing and counselling aided by a health care professional) and unsupervised (performed by self-tester with access to phone/internet counselling) self-testing strategies. Methods and Findings Seven databases (Medline [via PubMed], Biosis, PsycINFO, Cinahl, African Medicus, LILACS, and EMBASE) and conference abstracts of six major HIV/sexually transmitted infections conferences were searched from 1st January 2000–30th October 2012. 1,221 citations were identified and 21 studies included for review. Seven studies evaluated an unsupervised strategy and 14 evaluated a supervised strategy. For both strategies, data on acceptability (range: 74%–96%), preference (range: 61%–91%), and partner self-testing (range: 80%–97%) were high. A high specificity (range: 99.8%–100%) was observed for both strategies, while a lower sensitivity was reported in the unsupervised (range: 92.9%–100%; one study) versus supervised (range: 97.4%–97.9%; three studies) strategy. Regarding feasibility of linkage to counselling and care, 96% (n = 102/106) of individuals testing positive for HIV stated they would seek post-test counselling (unsupervised strategy, one study). No extreme adverse events were noted. The majority of data (n = 11,019/12,402 individuals, 89%) were from high-income settings and 71% (n = 15/21) of studies were cross-sectional in design, thus limiting our analysis. Conclusions Both supervised and unsupervised testing strategies were highly acceptable
The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data

Science.gov (United States)

Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.

1998-12-01

We present a new, objectively selected sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that searched for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average bulge-to-total ratio of the galaxies within the system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z ≳ 0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and to help more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near Cl 0016+16 and appears to strengthen the claims by Connolly et al. of superclustering at high redshift.
Unsupervised sub-categorization for object detection: Finding cars from a driving vehicle

NARCIS (Netherlands)

Wijnhoven, R.G.J.; With, de P.H.N.

2011-01-01

We present a novel algorithm for unsupervised subcategorization of an object class, in the context of object detection. Dividing the detection problem into smaller subproblems simplifies the object vs. background classification. The algorithm uses an iterative split-and-merge procedure and uses both
Unsupervised learning via self-organization: a dynamic approach

CERN Document Server

Kyan, Matthew; Jarrah, Kambiz; Guan, Ling

2014-01-01

To aid intelligent data mining, this book introduces a new family of unsupervised algorithms that are based on self-organization yet are free from many of the constraints typical of other well-known self-organizing architectures. It then moves through a series of pertinent real-world applications in the processing of multimedia data, from its role in generic image processing techniques, such as the automated modeling and removal of impulse noise in digital images, to problems in digital asset management, and its various roles in feature extraction, visual enhancement, segmentation, and analysis of microbiological image data.
Evaluating unsupervised thesaurus-based labeling of audiovisual content in an archive production environment

NARCIS (Netherlands)

de Boer, V.; Ordelman, Roeland J.; Schuurman, Josefien

2016-01-01

In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external
Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment

NARCIS (Netherlands)

de Boer, Victor; Ordelman, Roeland J.F.; Schuurman, Josefien

In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external
An intelligent clustering based methodology for confusable diseases ...

African Journals Online (AJOL)

Journal of Computer Science and Its Application ... In this paper, an intelligent system driven by fuzzy clustering algorithm and Adaptive Neuro-Fuzzy Inference System for ... Data on patients diagnosed and confirmed by laboratory tests of viral ...
Wavelet-based unsupervised learning method for electrocardiogram suppression in surface electromyograms.

Science.gov (United States)

Niegowski, Maciej; Zivanovic, Miroslav

2016-03-01

We present a novel approach aimed at removing electrocardiogram (ECG) perturbation from single-channel surface electromyogram (EMG) recordings by means of unsupervised learning of wavelet-based intensity images. The general idea is to combine the suitability of certain wavelet decomposition bases which provide sparse electrocardiogram time-frequency representations, with the capacity of non-negative matrix factorization (NMF) for extracting patterns from images. In order to overcome convergence problems which often arise in NMF-related applications, we design a novel robust initialization strategy which ensures proper signal decomposition in a wide range of ECG contamination levels. Moreover, the method can be readily used because no a priori knowledge or parameter adjustment is needed. The proposed method was evaluated on real surface EMG signals against two state-of-the-art unsupervised learning algorithms and a singular spectrum analysis based method. The results, expressed in terms of high-to-low energy ratio, normalized median frequency, spectral power difference and normalized average rectified value, suggest that the proposed method enables better ECG-EMG separation quality than the reference methods. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.
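The abstract names non-negative matrix factorization (NMF) as the pattern-extraction step applied to a wavelet-based intensity image. As a hedged illustration only, the sketch below implements the textbook Lee–Seung multiplicative updates for the squared Frobenius error in plain Python. The matrix `v` merely stands in for an intensity image, and the seeded random initialization is an assumption: it is not the authors' robust initialization strategy, which the abstract does not describe.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (n x m) as W (n x rank) times H (rank x m).

    Lee-Seung multiplicative updates for the squared Frobenius error. The
    seeded random start is a placeholder, not the paper's initialization.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5
```

For a rank-1 test matrix such as `[[1, 1, 2], [2, 2, 4], [3, 3, 6]]`, `nmf(v, rank=1)` reconstructs the input almost exactly: with a single component each multiplicative update reduces to an exact least-squares step, so the factors settle after very few iterations. Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative throughout.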
Individualized unsupervised exercise programs and chest physiotherapy in children with cystic fibrosis

Directory of Open Access Journals (Sweden)

Bogdan ALMĂJAN-GUȚĂ

2013-12-01

Traditionally, physiotherapy for cystic fibrosis focused mainly on airway clearance (clearing mucus from the lungs). This still makes up a large part of daily treatment, but the role of the physiotherapist in cystic fibrosis has expanded to include daily exercise, inhalation therapy, posture awareness and, for some, the management of urinary incontinence. The purpose of this study is to demonstrate the necessity and efficiency of various methods of chest physiotherapy and of an individualized unsupervised exercise program in improving body composition and physical performance. This study included 12 children with cystic fibrosis, aged 8–13 years. Each subject was evaluated in terms of body composition, effort capacity and lower-body muscular performance at the beginning of the study and after 12 months. The intervention consisted of classic respiratory clearance and physiotherapy techniques (5 times a week) and an individualized unsupervised exercise program (3 times a week). After 12 months we noticed a significant improvement in the measured parameters: body weight increased from 32.25±5.5 to 33.53±5.4 kg (p < 0.001), skeletal muscle mass increased from a mean of 16.04±4.1 to 17.01±4.2 (p < 0.001), the fitness score increased from a mean of 71±3.8 points to 73±3.8 (p < 0.001), and power and force also improved (from 19.3±2.68 to 21.65±2.4 W/kg and from 19.68±2.689 to 20.81±2.98 N/kg, respectively). The association between physiotherapy procedures and an individualized (after a proper clinical assessment) unsupervised exercise program proved to be an effective, relatively simple and accessible (regardless of social class) intervention.
Unsupervised behaviour-specific dictionary learning for abnormal event detection

DEFF Research Database (Denmark)

Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor

2015-01-01

the training data is only a small proportion of the surveillance data. Therefore, we propose behavior-specific dictionaries (BSD) through unsupervised learning, pursuing atoms from the same type of behavior to represent one behavior dictionary. To further improve the dictionary by introducing information from...... potential infrequent normal patterns, we refine the dictionary by searching for ‘missed atoms’ that have compact coefficients. Experimental results show that our BSD algorithm outperforms state-of-the-art dictionaries in abnormal event detection on the public UCSD dataset. Moreover, BSD produces fewer false alarms...
The threshold bootstrap clustering: a new approach to find families or transmission clusters within molecular quasispecies.

Directory of Open Access Journals (Sweden)

Mattia C F Prosperi

2010-10-01

Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context.
The threshold bootstrap clustering: a new approach to find families or transmission clusters within molecular quasispecies.

Science.gov (United States)

Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco

2010-10-25

Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context.
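The abstract does not spell out TBC's bootstrap resampling, evolution models or Chinese-restaurant-style assignment probabilities, so the following is only a simplified, deterministic sketch of one ingredient: incremental, threshold-based partitioning of aligned sequences in which the number of clusters emerges from the data. The p-distance (share of differing non-gap sites), the mean-distance joining rule and the `threshold` value are all assumptions chosen for illustration.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in sites) / len(sites)


def threshold_partition(seqs, threshold):
    """Incrementally partition aligned sequences using a distance threshold.

    Each sequence, in input order, joins the existing cluster with the smallest
    mean p-distance if that distance is <= threshold; otherwise it starts a new
    cluster, so the number of clusters is not fixed in advance.
    Returns a list of clusters, each a list of indices into seqs.
    """
    clusters = []
    for i, s in enumerate(seqs):
        best, best_d = None, None
        for members in clusters:
            d = sum(p_distance(s, seqs[j]) for j in members) / len(members)
            if best_d is None or d < best_d:
                best, best_d = members, d
        if best is not None and best_d <= threshold:
            best.append(i)
        else:
            clusters.append([i])
    return clusters
```

With `seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCA"]` and `threshold=0.2`, the sketch returns `[[0, 1], [2, 3]]`. Because the result depends on input order, shuffling the sequences can change the partition, loosely echoing the abstract's remark that different initial conditions yield different partitions.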
How effective is the comprehensive approach to rehabilitation (CARe) methodology? A cluster randomized controlled trial.

Science.gov (United States)

Bitter, Neis; Roeg, Diana; van Assen, Marcel; van Nieuwenhuizen, Chijs; van Weeghel, Jaap

2017-12-11

The CARe methodology aims to improve the quality of life of people with severe mental illness by supporting them in realizing their goals, handling their vulnerability and improving the quality of their social environment. This study aims to investigate the effectiveness of the CARe methodology for people with severe mental illness on their quality of life, personal recovery, participation, hope, empowerment, self-efficacy beliefs and unmet needs. A cluster randomized controlled trial (RCT) was conducted in 14 teams of three organizations for sheltered and supported housing in the Netherlands. Teams in the intervention group received training in the CARe methodology. Teams in the control group continued working according to care as usual. Questionnaires were filled out at baseline, after 10 months and after 20 months. A total of 263 clients participated in the study. Quality of life increased in both groups; however, no differences between the intervention and control groups were found. Recovery and social functioning did not change over time. Regarding the secondary outcomes, the number of unmet needs decreased in both groups. All intervention teams received the complete training program. Model fidelity at T1 was 53.4% for the intervention group and 33.4% for the control group; at T2 it was 50.6% and 37.2%, respectively. All clients improved in quality of life. However, we did not find significant differences between the clients of the two conditions on any outcome measure. Possible explanations for these results are the difficulty of implementing rehabilitation-supporting practice, the content of the methodology, and the difficulty of improving the lives of a group of people with longstanding and severe impairments in a relatively short period. More research is needed on how to improve the effects of rehabilitation trainings in practice and at the outcome level. ISRCTN77355880, retrospectively registered (05/07/2013).
Multiple-Features-Based Semisupervised Clustering DDoS Detection Method

Directory of Open Access Journals (Sweden)

Yonghao Gu

2017-01-01

DDoS attack streams from different agent hosts converge at the victim host and can become very large, leading to system halts or network congestion. Therefore, an effective method is needed to detect DDoS attack behavior in massive data streams. To address the lack of large amounts of labeled data required by supervised learning methods, as well as the relatively low detection accuracy and slow convergence of the unsupervised k-means algorithm, this paper presents a semisupervised clustering detection method using multiple features. In this detection method, we first select three features according to the characteristics of DDoS attacks to form the detection feature vector. Then, the Multiple-Features-Based Constrained-K-Means (MF-CKM) algorithm is proposed based on semisupervised clustering. Finally, using the MIT Laboratory Scenario (DDoS) 1.0 data set, we verify that the proposed method can improve the convergence speed and accuracy of the algorithm when only a small amount of labeled data is used.
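MF-CKM builds on constrained k-means, a semisupervised variant in which a few labeled samples both initialize one centroid per known class and keep their labels fixed while the unlabeled samples are reassigned. The sketch below shows only that generic constrained k-means loop on arbitrary numeric feature tuples; the paper's three DDoS features are not reproduced, and the class names in the usage note are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def constrained_kmeans(seeds, unlabeled, iters=100):
    """Semi-supervised k-means in which labeled seed points keep their labels.

    seeds: dict mapping a class label to a non-empty list of feature tuples.
    unlabeled: list of feature tuples.
    Returns (centroids, labels) where labels[i] is the class given to unlabeled[i].
    """
    classes = list(seeds)
    centroids = {c: centroid(seeds[c]) for c in classes}  # one cluster per known class
    labels = None
    for _ in range(iters):
        new = [min(classes, key=lambda c: sq_dist(x, centroids[c])) for x in unlabeled]
        if new == labels:
            break
        labels = new
        for c in classes:
            # Seeds always stay in their own cluster; only unlabeled points move.
            members = seeds[c] + [x for x, lab in zip(unlabeled, labels) if lab == c]
            centroids[c] = centroid(members)
    return centroids, labels
```

For example, with hypothetical seeds `{"normal": [(0.0, 0.0)], "attack": [(10.0, 10.0)]}` and unlabeled points `[(1.0, 0.5), (9.0, 9.5), (0.5, 1.0), (10.5, 9.0)]`, the labels come out as `["normal", "attack", "normal", "attack"]`. Seeding from labeled data is what removes plain k-means' sensitivity to random starting centroids, which is the convergence issue the abstract raises.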
An automatic taxonomy of galaxy morphology using unsupervised machine learning

Science.gov (United States)

Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

2018-01-01

We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Unlike previous unsupervised machine learning approaches used in astronomy, we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early- and late-type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and a galaxy similarity search. We compare the CANDELS machine-based classifications to human classifications from the Galaxy Zoo: CANDELS project. Although there is no direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.
A semi-supervised method to detect seismic random noise with fuzzy GK clustering

International Nuclear Information System (INIS)

Hashemi, Hosein; Javaherian, Abdolrahim; Babuska, Robert

2008-01-01

We present a new method to detect random noise in seismic data using fuzzy Gustafson–Kessel (GK) clustering. First, using an adaptive distance norm, a matrix is constructed from the observed seismic amplitudes. The next step is to find the centres of ellipsoidal clusters and construct a partition matrix that determines the soft decision boundaries between seismic events and random noise. The GK algorithm updates the cluster centres to iteratively minimize the cluster variance. Multiplying the fuzzy membership function by the values of each sample yields new sections, which we call 'clustered sections'. The seismic amplitude values of the clustered sections are assigned so as to decrease the level of noise in the original noisy seismic input. In pre-stack data, it is essential to study the clustered sections in the f–k domain; finding the quantitative index for weighting the post-stack data requires a similar approach. Because it combines the knowledge of a human specialist with fuzzy unsupervised clustering, the method is a semi-supervised random noise detection technique. Its efficiency is investigated on synthetic and real seismic data, for both pre- and post-stack data. The results show a significant improvement of the noisy input sections without harming the important amplitude and phase information of the original data. The final weights of each clustered section must be chosen carefully in order to keep almost all the evident seismic amplitudes in the output section. The method interactively uses the knowledge of the seismic specialist in detecting the noise
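To make the "adaptive distance norm" and "ellipsoidal clusters" concrete, here is a hedged sketch of the standard Gustafson–Kessel iteration restricted to 2-D points, so that the 2×2 inverse and determinant can be written in closed form. It is not the authors' seismic pipeline: the feature construction, the f–k analysis and the specialist-driven weighting are omitted, and the unit cluster volume (ρ = 1), the fuzzifier `m = 2`, the explicit `init_centers` and the small `ridge` term are assumptions.

```python
import math


def gustafson_kessel_2d(points, init_centers, m=2.0, iters=100, tol=1e-6, ridge=1e-9):
    """Fuzzy Gustafson-Kessel clustering of 2-D points.

    Each cluster i has its own norm matrix A_i = sqrt(det F_i) * inv(F_i),
    where F_i is the fuzzy covariance (cluster volume fixed to 1), so the
    clusters may be ellipsoids with different shapes and orientations.
    Returns (centers, u) with u[i][k] the membership of point k in cluster i.
    """
    c, n = len(init_centers), len(points)
    centers = [tuple(p) for p in init_centers]
    norms = [(1.0, 0.0, 1.0)] * c  # (a11, a12, a22); Euclidean on the first pass
    u = None
    for _ in range(iters):
        # Squared distance of every point to every centre under that cluster's norm.
        d2 = []
        for (cx, cy), (a11, a12, a22) in zip(centers, norms):
            d2.append([a11 * (x - cx) ** 2 + 2 * a12 * (x - cx) * (y - cy) + a22 * (y - cy) ** 2
                       for x, y in points])
        # Fuzzy membership update; a point lying exactly on centres shares them equally.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            zeros = [i for i in range(c) if d2[i][k] == 0.0]
            for i in range(c):
                if zeros:
                    new_u[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    new_u[i][k] = 1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1.0))
                                            for j in range(c))
        done = u is not None and max(abs(a - b) for ra, rb in zip(u, new_u)
                                     for a, b in zip(ra, rb)) < tol
        u = new_u
        if done:
            break
        # Weighted centres, fuzzy covariances and volume-normalised norm matrices.
        for i in range(c):
            wts = [u[i][k] ** m for k in range(n)]
            s = sum(wts)
            cx = sum(w * x for w, (x, _) in zip(wts, points)) / s
            cy = sum(w * y for w, (_, y) in zip(wts, points)) / s
            fxx = sum(w * (x - cx) ** 2 for w, (x, _) in zip(wts, points)) / s + ridge
            fyy = sum(w * (y - cy) ** 2 for w, (_, y) in zip(wts, points)) / s + ridge
            fxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(wts, points)) / s
            det = fxx * fyy - fxy * fxy
            scale = math.sqrt(det) / det  # det(F)^(1/2) times the 1/det of the 2x2 inverse
            centers[i] = (cx, cy)
            norms[i] = (scale * fyy, -scale * fxy, scale * fxx)
    return centers, u
```

On two horizontal streaks of points (y = 0 and y = 5, x = 0..9) started from centres `(4.5, 0.0)` and `(4.5, 5.0)`, each cluster's norm stretches along x, so the memberships separate the streaks cleanly. The determinant normalisation is what distinguishes GK from plain fuzzy c-means: every cluster keeps the same volume while its shape adapts.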
Macroeconomic Dimensions in the Clusterization Processes: Lithuanian Biomass Cluster Case

Directory of Open Access Journals (Sweden)

Navickas Valentinas

2017-03-01

The growing significance of future production systems will require work on a collaborative rather than a competitive basis, with concentrated resources and expertise that can help reach the overall goal. One form of collaboration among medium-sized business organizations is work in clusters. Clusterization as a phenomenon has been known for quite a long time, but research has mainly examined its benefits at the micro and medium levels; the evaluation of clusterization processes in macroeconomic dimensions has been comparatively little investigated. This article therefore analyses clusterization processes with a focus on macroeconomic factors. The authors analyse clusterization's influence on a country's macroeconomic growth, apply a structural research methodology to evaluate clusterization's macroeconomic influence, and propose that clusterization processes benefit macroeconomic analysis. The theoretical model of clusterization processes was validated on a biomass cluster case. Because the biomass cluster is a new phenomenon, no other scientific approaches to it currently exist. The authors' research shows that clusterization can produce a large positive shift in the macroeconomy, leading to the creation of high added value, faster national economic growth, and an improved social situation.
Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

Directory of Open Access Journals (Sweden)

Martinez Fernando J

2010-03-01

Background: Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods: We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results: We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: (1) post-bronchodilator FEV1 percent predicted, (2) percent bronchodilator responsiveness, and quantitative CT measurements of (3) apical emphysema and (4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: (1) emphysema predominant; (2) bronchodilator responsive, with higher FEV1; (3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness; and (4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema predominant) was associated with TGFB1 SNP rs1800470. Conclusions: Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.
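The study clusters four phenotypic variables measured on very different scales (percent predicted, percent change, CT measurements), so some form of scaling typically precedes k-means. The abstract does not say how the authors scaled their data; the sketch below assumes z-scoring, followed by Lloyd's k-means with a few random restarts, keeping the lowest-inertia run. The factor-analysis step is not reproduced.

```python
import random


def standardize(rows):
    """Z-score every column (population SD); constant columns become all zeros."""
    stats = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = (sum((v - mu) ** 2 for v in col) / len(col)) ** 0.5
        stats.append((mu, sd if sd > 0 else 1.0))
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(row, stats)) for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, restarts=10, iters=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, inertia) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centers = rng.sample(points, k)
        labels = None
        for _ in range(iters):
            new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            if new == labels:
                break
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centre
                    centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best
```

Without standardization, a variable measured in the hundreds would dominate the squared Euclidean distance and effectively decide the clusters on its own. The study used `k = 4`; the number of clusters is an input here, not something the sketch chooses.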

Classification of multispectral or hyperspectral satellite imagery using clustering of sparse approximations on sparse representations in learned dictionaries obtained using efficient convolutional sparse coding

Science.gov (United States)

Moody, Daniela; Wohlberg, Brendt

2018-01-02

An approach for land cover classification, seasonal and yearly change detection and monitoring, and identification of changes in man-made features may use a clustering of sparse approximations (CoSA) on sparse representations in learned dictionaries. The learned dictionaries may be derived using efficient convolutional sparse coding to build multispectral or hyperspectral, multiresolution dictionaries that are adapted to regional satellite image data. Sparse representations of the images over the learned dictionaries may be used to perform unsupervised k-means clustering into land cover categories. The clustering process behaves as a classifier in detecting real variability. This approach may combine spectral and spatial textural characteristics to detect geologic, vegetative, hydrologic, and man-made features, as well as changes in these features over time.
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimer’s Disease

Science.gov (United States)

2015-12-01

group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For...differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimer’s Disease

Science.gov (United States)

2013-10-01

correct group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...centering of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity met...A) The 108 differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with
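Both report snippets describe UPGMA (average-linkage) hierarchical clustering with cosine correlation as the similarity measure. The sketch below is a naive, small-data illustration of that combination, not the reports' actual software. It assumes that "cosine correlation" means cosine similarity, which equals the uncentered correlation, and it leaves any centering of the log2 values to the caller.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def upgma(vectors):
    """Average-linkage (UPGMA) agglomerative clustering on cosine distance.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    with clusters given as frozensets of input indices.
    """
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]

    def linkage(c1, c2):
        # Unweighted mean of all member-to-member distances.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [frozenset([i]) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        d, p, q = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] | clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return merges
```

The merge history is the information a dendrogram draws. Recomputing every pairwise linkage at each step is slow, roughly cubic or worse in the number of items, so for thousands of probe sets a library routine that updates linkages incrementally would be the practical choice.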
Discriminative clustering on manifold for adaptive transductive classification.

Science.gov (United States)

Zhang, Zhao; Jia, Lei; Zhang, Min; Li, Bing; Zhang, Li; Li, Fanzhang

2017-10-01

In this paper, we propose a novel adaptive transductive label propagation approach based on joint discriminative clustering on manifolds for representing and classifying high-dimensional data. Our framework seamlessly combines unsupervised manifold learning, discriminative clustering and adaptive classification into a unified model. Our method also incorporates adaptive graph weight construction into label propagation. Specifically, our method propagates label information using adaptive weights over low-dimensional manifold features, unlike most existing studies, which usually predict the labels and construct the weights in the original Euclidean space. For transductive classification, we first perform joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifolds. Then, we construct the adaptive weights over the learnt manifold features by jointly minimizing the reconstruction errors over features and soft labels, so that the graph weights are jointly optimal for data representation and classification. Using the adaptive weights, the unknown labels of samples can easily be estimated. After that, our method returns the updated weights for further updating the manifold features. Extensive simulations on image classification and segmentation show that our proposed algorithm delivers state-of-the-art performance on several public datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.
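For readers unfamiliar with transductive label propagation, the sketch below shows the classic label-spreading iteration F ← αSF + (1 − α)Y on a fixed, Gaussian-weighted graph. This is a deliberately simpler, swapped-in technique: it uses fixed weights on the raw features, whereas the paper learns adaptive weights over manifold features jointly with discriminative clustering. The kernel width `sigma` and the trade-off `alpha` are assumed values.

```python
import math


def label_spreading(points, labels, sigma=1.0, alpha=0.9, iters=200):
    """Graph-based label spreading with fixed Gaussian edge weights.

    labels[i] is a class index for labeled points and None otherwise.
    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2 and
    returns the predicted class for every point.
    """
    n = len(points)
    classes = sorted({lab for lab in labels if lab is not None})
    w = [[0.0 if i == j else math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                                      / (2.0 * sigma ** 2)) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]  # must be positive: choose sigma so no point is isolated
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    y = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][j] * f[j][k] for j in range(n)) + (1.0 - alpha) * y[i][k]
              for k in range(len(classes))] for i in range(n)]
    return [classes[max(range(len(classes)), key=lambda k: f[i][k])] for i in range(n)]
```

With one labeled point in each of two well-separated 1-D groups, for example points `(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)` and labels `[0, None, None, 1, None, None]`, every unlabeled point inherits the label of its own group. The quality of the result rests entirely on the graph weights, which is the part the paper's adaptive construction targets.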
Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing

Science.gov (United States)

2017-01-01

In the current era of big data, the amount of data available is continuously increasing. Both the number and types of samples, or features, are on the rise. Mixing distinct features often makes interpretation more difficult. However, analysing each type separately requires subsequent integration. A tensor is a useful framework for dealing with distinct types of features in an integrated manner without mixing them. On the other hand, tensor data are not easy to obtain, since they require measurements of huge numbers of combinations of distinct features; if there are m kinds of features, each of which has N dimensions, as many as N^m measurements are needed, which is often too many to measure. In this paper, I propose a new method in which a tensor is generated from individual features without combinatorial measurements, and the generated tensor is decomposed back into matrices, by which unsupervised feature extraction is performed. To demonstrate the usefulness of the proposed strategy, it was applied to synthetic data as well as three omics datasets, where it outperformed other matrix-based methodologies. PMID:28841719
Semi-supervised weighted kernel clustering based on gravitational search for fault diagnosis.

Science.gov (United States)

Li, Chaoshun; Zhou, Jianzhong

2014-09-01

Supervised learning methods, such as support vector machines (SVMs), have been widely applied to diagnosing known faults; however, such methods fail to work correctly when a new or unknown fault occurs. Traditional unsupervised kernel clustering can be used for unknown fault diagnosis, but it cannot make use of historical classification information to improve diagnosis accuracy. In this paper, a semi-supervised kernel clustering model is designed to diagnose known and unknown faults. First, a novel semi-supervised weighted kernel clustering algorithm based on gravitational search (SWKC-GS) is proposed for clustering a dataset composed of labeled and unlabeled fault samples. The clustering model of SWKC-GS is defined based on the misclassification rate of labeled samples and a fuzzy clustering index on the whole dataset. The gravitational search algorithm (GSA) is used to solve the clustering model, with the cluster centers, feature weights and kernel function parameter selected as optimization variables. Then, new fault samples are identified and diagnosed by calculating the weighted kernel distance between them and the fault cluster centers. If the fault samples are unknown, they are added to the historical dataset, and SWKC-GS is used to partition the mixed dataset and update the clustering results for diagnosing the new fault. In experiments, the proposed method was applied to fault diagnosis of rotatory bearings, and SWKC-GS was compared not only with traditional clustering methods but also with SVM and neural networks for known fault diagnosis. In addition, the proposed method was applied to unknown fault diagnosis. The results show that the proposed method achieves the expected diagnosis accuracy for both known and unknown faults of rotatory bearings. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
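The diagnosis step compares a new sample with the fault cluster centres using a "weighted kernel distance". The abstract gives neither the kernel nor the rule for declaring a sample unknown, so the sketch below assumes a feature-weighted Gaussian kernel, for which K(x, x) = 1 and therefore ‖φ(x) − φ(c)‖² = 2 − 2K(x, c). It also uses a hypothetical `unknown_threshold` to flag samples that match no known centre. In the paper, centres, weights and σ would come from the GSA optimization; here they are simply inputs.

```python
import math


def weighted_kernel_distance(x, center, weights, sigma):
    """Squared feature-space distance under a feature-weighted Gaussian kernel.

    For a Gaussian kernel K(x, x) = 1, so ||phi(x) - phi(c)||^2 = 2 - 2 K(x, c).
    """
    s = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))
    return 2.0 - 2.0 * math.exp(-s / (2.0 * sigma ** 2))


def diagnose(x, centers, weights, sigma, unknown_threshold):
    """Return the label of the nearest fault centre, or None if all are farther than the threshold."""
    label, d = min(((k, weighted_kernel_distance(x, c, weights, sigma)) for k, c in centers.items()),
                   key=lambda t: t[1])
    return label if d <= unknown_threshold else None
```

The kernel distance is bounded above by 2, so a sample far from every centre gets a distance close to 2 for all of them; a fixed threshold below 2 then marks it as a candidate unknown fault. A feature weight of zero removes that feature from the comparison entirely. Class names such as `"inner_race"` used in testing are hypothetical.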
An Improved EMD-Based Dissimilarity Metric for Unsupervised Linear Subspace Learning

Directory of Open Access Journals (Sweden)

Xiangchun Yu

2018-01-01

We investigate a novel way of robust face image feature extraction by adopting methods based on Unsupervised Linear Subspace Learning to extract a small number of good features. First, the face image is divided into blocks of a specified size, and we propose and extract a pooled Histogram of Oriented Gradients (pHOG) over each block. Second, an improved Earth Mover’s Distance (EMD) metric is adopted to measure the dissimilarity between blocks of one face image and the corresponding blocks of the remaining face images. Third, considering the limitations of the original Locality Preserving Projections (LPP), we propose Block Structure LPP (BSLPP), which effectively preserves the structural information of face images. Finally, an adjacency graph is constructed, and a small number of good features of a face image are obtained by methods based on Unsupervised Linear Subspace Learning. A series of experiments on several well-known face databases evaluates the effectiveness of the proposed algorithm. In addition, we construct noisy, geometrically distorted, slightly translated and slightly rotated versions of the AR and Extended Yale B face databases, and verify the robustness of the proposed algorithm to a certain degree of these disturbances.
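The block comparison rests on the Earth Mover's Distance between histograms. The abstract does not define the authors' "improved" EMD, so the sketch below shows only the standard 1-D case, assuming both histograms share the same bins, are normalized to unit mass, and use |i − j| between bin indices as the ground distance. Under those assumptions the EMD reduces to the L1 distance between the two cumulative distributions, which avoids solving a transport problem.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms over the same bins.

    Both histograms are normalized to unit mass and the ground distance is
    |i - j| between bin indices; in 1-D the EMD then equals the L1 distance
    between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive mass")
    total, cum = 0.0, 0.0
    for a, b in zip(p, q):
        cum += a / sp - b / sq  # running difference of the two CDFs
        total += abs(cum)
    return total
```

For example, moving all mass from the first of three bins to the last costs 2.0 (two bin steps), while `emd_1d([2, 2], [1, 1])` is 0.0 because the normalized histograms coincide. Orientation histograms such as HOG bins are circular; this linear version ignores wrap-around between the first and last bins.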
Practice-Oriented Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production Environment

NARCIS (Netherlands)

de Boer, Victor; Ordelman, Roeland J.F.; Schuurman, Josefien

In this paper we report on an evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users. We
A method for unsupervised change detection and automatic radiometric normalization in multispectral data

DEFF Research Database (Denmark)

Nielsen, Allan Aasbjerg; Canty, Morton John

2011-01-01

Based on canonical correlation analysis, the iteratively re-weighted multivariate alteration detection (MAD) method is used to successfully perform unsupervised change detection in bi-temporal Landsat ETM+ images covering an area with villages, woods, agricultural fields and open pit mines in North...... to carry out the analyses is available from the authors' websites....
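The core MAD step can be sketched in NumPy as follows, assuming two co-registered images flattened to (pixels × bands) arrays. It performs a single unweighted pass: the iterative re-weighting and the radiometric normalization described in the paper are omitted, and the per-pixel statistic is only approximately chi-square distributed for unchanged pixels under a Gaussian assumption.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite covariance."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def mad_transform(x: np.ndarray, y: np.ndarray):
    """One (non-reweighted) MAD pass.

    x, y: (n_pixels, n_bands) arrays for the two acquisition dates.
    Returns MAD variates, canonical correlations (ascending) and a per-pixel chi-square statistic.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    s11 = xc.T @ xc / (n - 1)
    s22 = yc.T @ yc / (n - 1)
    s12 = xc.T @ yc / (n - 1)
    w1, w2 = _inv_sqrt(s11), _inv_sqrt(s22)
    # CCA via SVD of the whitened cross-covariance; singular values are the canonical correlations
    u, rho, vt = np.linalg.svd(w1 @ s12 @ w2, full_matrices=False)
    a = w1 @ u          # canonical weights, date 1 (unit-variance variates)
    b = w2 @ vt.T       # canonical weights, date 2
    # Put the least correlated pair (most change information) first
    a, b, rho = a[:, ::-1], b[:, ::-1], rho[::-1]
    mad = xc @ a - yc @ b
    mad_var = 2.0 * (1.0 - rho)                 # Var(U_i - V_i) = 1 + 1 - 2 * rho_i
    chi2 = np.sum(mad ** 2 / mad_var, axis=1)   # large values suggest change
    return mad, rho, chi2
```

Thresholding `chi2` against a chi-square quantile with as many degrees of freedom as bands would give a change mask. In the iterated version, pixels with a high no-change probability receive larger weights when the covariances are re-estimated.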
Modeling Language and Cognition with Deep Unsupervised Learning: A Tutorial Overview

OpenAIRE

Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin Peev

2013-01-01

Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cog...
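Fitting such a hierarchical generative model is commonly done by stacking restricted Boltzmann machines (RBMs), each trained with contrastive divergence on the hidden activations of the layer below. The sketch assumes binary units, full-batch one-step contrastive divergence (CD-1) and no momentum or weight decay. It illustrates the general technique rather than the tutorial's own code, and the layer sizes, learning rate and toy patterns are arbitrary.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class BernoulliRBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.w = 0.01 * self.rng.normal(size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.w + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.w.T + self.b_v)

    def cd1_step(self, v0, lr=0.1) -> float:
        """One full-batch CD-1 update; returns the mean squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                               # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)     # sample hidden states
        pv1 = self.visible_probs(h0)                              # one Gibbs step back
        ph1 = self.hidden_probs(pv1)                              # negative phase
        n = v0.shape[0]
        self.w += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += lr * (v0 - pv1).mean(axis=0)
        self.b_h += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))


def train_stack(data, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise training: each RBM models the hidden probabilities of the one below."""
    layers, x = [], data
    for size in layer_sizes:
        rbm = BernoulliRBM(x.shape[1], size)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        layers.append(rbm)
        x = rbm.hidden_probs(x)
    return layers, x


patterns = np.array([[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20, dtype=float)
layers, top = train_stack(patterns, [4, 2], epochs=200, lr=0.5)
print(top[[0, -1]].round(2))   # top-layer codes for one example of each pattern
```

The top-layer activations are the kind of distributed representation that can later be read out, for example by a linear classifier.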
Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

Science.gov (United States)

Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

2018-05-03

Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users who can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, to support the identification of influential users. MUOD is scalable and can hence be used in large OSNs. Moreover, it labels the outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows the outlier users to be classified into multiple classes, which are likely to include different types of influential users. Applying MUOD to a subset of roughly 400 million Google+ users made it possible to automatically identify and discriminate sets of outlier users whose features are associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or high infection capacity.
Structure-related clustering of gene expression fingerprints of THP-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

Science.gov (United States)

Wan, B; Yarbrough, J W; Schultz, T W

2008-01-01

This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 μM for 24 hours, and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of the gene expression profiles grouped the 12 tested chemicals into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to those outside the cluster. 1-Methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprints dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, significance analysis of microarrays (SAM) identified 598 genes modulated by the tested chemicals, with a variety of biological processes (such as cell cycle, metabolism and protein binding) and KEGG pathways significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.
Metabolic Heterogeneity Evidenced by MRS among Patient-Derived Glioblastoma Multiforme Stem-Like Cells Accounts for Cell Clustering and Different Responses to Drugs

Directory of Open Access Journals (Sweden)

Sveva Grande

2018-01-01

Clustering of patient-derived glioma stem-like cells (GSCs) through unsupervised analysis of metabolites detected by magnetic resonance spectroscopy (MRS) revealed three subgroups: clusters 1a and 1b, with high intergroup similarity and neural fingerprints, and cluster 2, with a metabolism typical of commercial tumor lines. In addition, subclones generated from the same GSC line showed different metabolic phenotypes. Aerobic glycolysis prevailed in cluster 2 cells, as demonstrated by higher lactate production compared with cluster 1 cells. Oligomycin, a mitochondrial ATPase inhibitor, induced high lactate extrusion only in cluster 1 cells, where it produced neutral lipid accumulation detected as mobile lipid signals by MRS and as lipid droplets by confocal microscopy. These results indicate a relevant role of mitochondrial fatty acid oxidation (FAO) in energy production in GSCs. On the other hand, further metabolic differences, likely accounting for the different therapy responsiveness observed after etomoxir treatment, suggest that caution must be used when considering treating patients with mitochondrial FAO blockers. Metabolomics and metabolic profiling may contribute to the discovery of new diagnostic or prognostic biomarkers for personalized therapies.
A critical cluster analysis of 44 indicators of author-level performance

DEFF Research Database (Denmark)

Wildgaard, Lorna Elizabeth

2016-01-01

This paper explores a 7-stage cluster methodology as a process to identify appropriate indicators for evaluating individual researchers at a disciplinary and seniority level. Publication and citation data for 741 researchers from 4 disciplines were collected in Web of Science. Forty-four indicators of individual researcher performance were computed using the data. The clustering solution was supported by continued reference to each researcher's curriculum vitae, an effect analysis and a risk analysis. Discipline-appropriate indicators were identified and used to divide the researchers...... of statistics in research evaluation. The strength of the 7-stage cluster methodology is that it makes clear that, in the evaluation of individual researchers, statistics cannot stand alone. The methodology relies on contextual information to verify the bibliometric values and the cluster solution...
Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data.

Science.gov (United States)

Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert

2018-04-01

Using online consumer reviews as electronic word of mouth to assist purchase decision-making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. A subtask to be performed by such a framework is to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method is unsupervised: it applies association rule mining to co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an F1-score of 67%. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.
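Association rule mining over sentence-level co-occurrence can be sketched as below. This is a generic support/confidence miner, not the paper's exact procedure: the rule form (single word → category word), the category word list, both thresholds and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def mine_rules(sentences: Iterable[List[str]], min_support: int = 2,
               min_confidence: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Rules lhs -> rhs with confidence = count(lhs, rhs) / count(lhs), counted per sentence."""
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for tokens in sentences:
        items = sorted(set(tokens))              # presence per sentence, not frequency
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n_ab / item_count[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = confidence
    return rules


def detect_categories(tokens: List[str], rules: Dict[Tuple[str, str], float],
                      category_words: Set[str], threshold: float = 0.5) -> Set[str]:
    """A sentence gets a category if it mentions it or a token fires a rule pointing to it."""
    found = set()
    for tok in set(tokens):
        if tok in category_words:
            found.add(tok)
            continue
        for cat in category_words:
            if rules.get((tok, cat), 0.0) >= threshold:
                found.add(cat)
    return found


corpus = [["food", "tasty", "pizza"], ["food", "pizza"], ["service", "waiter", "slow"],
          ["service", "waiter"], ["pizza", "cold"]]          # made-up review sentences
rules = mine_rules(corpus)
print(detect_categories(["the", "pizza", "was", "cold"], rules, {"food", "service"}))
```

Here "pizza" implies the "food" category without the word "food" appearing in the sentence, which is the kind of implicit aspect a co-occurrence rule can capture.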
Content-Based High-Resolution Remote Sensing Image Retrieval via Unsupervised Feature Learning and Collaborative Affinity Metric Fusion

Directory of Open Access Journals (Sweden)

Yansheng Li

2016-08-01

With the urgent demand for automatic management of large numbers of high-resolution remote sensing images, content-based high-resolution remote sensing image retrieval (CB-HRRS-IR) has attracted much research interest. Accordingly, this paper proposes a novel high-resolution remote sensing image retrieval approach via multiple feature representation and collaborative affinity metric fusion (IRMFRCAMF). In IRMFRCAMF, we design four unsupervised convolutional neural networks with different numbers of layers to generate four types of unsupervised features, from the fine level to the coarse level. In addition to these four types of unsupervised features, we also implement four traditional feature descriptors: local binary pattern (LBP), gray-level co-occurrence matrix (GLCM), maximal response 8 (MR8) and scale-invariant feature transform (SIFT). To fully incorporate the complementary information among the multiple features of one image and the mutual information across auxiliary images in the image dataset, this paper advocates collaborative affinity metric fusion to measure the similarity between images. Retrieval performance is evaluated on two public datasets, the UC Merced (UCM) dataset and the Wuhan University (WH) dataset. Extensive experiments show that the proposed IRMFRCAMF can significantly outperform state-of-the-art approaches.
Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning.

Science.gov (United States)

Sadeghi, Zahra; Testolin, Alberto

2017-08-01

In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, where simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, where increasingly more complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, high-level internal representations emerging from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many common visual features: a generative model that captures the statistical structure of the letter distribution should therefore also support the recognition of written digits. To this end, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.
An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data

Directory of Open Access Journals (Sweden)

A. Alizade Naeini

2014-10-01

K-means is the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, because of its local, descent-based nature, this algorithm is highly sensitive to the initial placement of the cluster centers. The problem becomes worse for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle it, in this paper the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman–Pearson detection-theory-based eigen-thresholding method (i.e., the HFC method) is employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Finally, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and random initialization) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods' performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied to unsupervised classification of hyperspectral imagery for land-cover mapping.
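Once the endmember signatures are available, the clustering stage is ordinary Lloyd's k-means started from those signatures instead of from random pixels. In the sketch below, `init_centers` stands in for the MVES endmember spectra (the HFC and MVES steps are not implemented), and the toy 3-band pixels are made up.

```python
import numpy as np


def kmeans_from_centers(pixels: np.ndarray, init_centers: np.ndarray,
                        max_iter: int = 100, tol: float = 1e-6):
    """Lloyd's k-means on (n_pixels, n_bands) data, starting from given centers."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assign every pixel to its nearest center (squared Euclidean distance)
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = pixels[labels == k]
            if len(members) > 0:          # an empty cluster keeps its previous center
                new_centers[k] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    # Final assignment against the final centers
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


pixels = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.2, 0],
                   [5, 5, 5], [5.1, 5, 5], [5, 4.9, 5]], dtype=float)
endmembers = np.array([[1.0, 1.0, 1.0], [4.0, 4.0, 4.0]])   # placeholder "signatures"
labels, centers = kmeans_from_centers(pixels, endmembers)
print(labels)
```

Seeding with physically meaningful spectra fixes both the number of clusters (the estimated number of endmembers) and their starting positions, which removes the run-to-run variability of random seeding.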
Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients.

Science.gov (United States)

Chen, Jinying; Yu, Hong

2017-04-01

Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<0.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. FIT can automatically identify EHR terms important to patients. It may help develop future interventions
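A biased random walk over candidate terms can be written as a personalized-PageRank-style power iteration: with probability `damping` the walker follows a weighted edge, otherwise it restarts according to a bias distribution. How FIT builds the term graph and the bias from its four rankers is not described in this abstract, so `adj`, `bias` and `damping = 0.85` are assumptions.

```python
from typing import Dict


def biased_random_walk(adj: Dict[str, Dict[str, float]], bias: Dict[str, float],
                       damping: float = 0.85, max_iter: int = 200,
                       tol: float = 1e-12) -> Dict[str, float]:
    """Stationary scores of a random walk with restarts drawn from `bias` (normalized)."""
    nodes = sorted(set(adj) | {v for nbrs in adj.values() for v in nbrs} | set(bias))
    total = sum(bias.get(n, 0.0) for n in nodes)
    prior = {n: bias.get(n, 0.0) / total for n in nodes}   # restart distribution
    score = dict(prior)
    for _ in range(max_iter):
        new = {n: (1.0 - damping) * prior[n] for n in nodes}
        for u in nodes:
            out = adj.get(u, {})
            w_sum = sum(out.values())
            if w_sum == 0:
                # Dangling node: redistribute its mass according to the prior
                for n in nodes:
                    new[n] += damping * score[u] * prior[n]
            else:
                for v, w in out.items():
                    new[v] += damping * score[u] * w / w_sum
        diff = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if diff < tol:
            break
    return score


# Hypothetical term graph: edges could come from co-occurrence, the bias from ranker scores
adj = {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
print(biased_random_walk(adj, {"c": 1.0}))
```

The bias lets prior evidence (for example, unfamiliarity of a term) pull probability mass toward some nodes, while the edges spread importance to related terms.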

Prediction of rat behavior outcomes in memory tasks using functional connections among neurons.

Directory of Open Access Journals (Sweden)

Hu Lu

BACKGROUND: Analyzing neuronal organizational structures and studying changes in the behavior of the organism are key to understanding the cognitive functions of the brain. Although some studies have indicated that spatiotemporal firing patterns of neuronal populations are related to behavioral responses, it remains unresolved whether the functional networks comprised of these cortical neurons are related to behavioral tasks, and whether these networks can be used to predict correct and incorrect outcomes of single trials. METHODOLOGY/PRINCIPAL FINDINGS: This paper presents a new method for analyzing the structures of whole-recorded neuronal functional networks (WNFNs) and local neuronal circuit groups (LNCGs). The activity of these neurons was recorded in several rats, which performed two different behavioral tasks: the Y-maze task and the U-maze task. Using the assessment of the WNFNs and LNCGs, this paper describes a procedure for predicting the behavioral outcomes of single trials. The methodology consists of four main parts: construction of WNFNs from recorded neuronal spike trains, partitioning of the WNFNs into optimal LNCGs using social community analysis, unsupervised clustering of all trials from each dataset into two clusters, and prediction of the behavioral outcomes of single trials. The results show that WNFNs and LNCGs correlate with the behavior of the animal. The U-maze datasets yield higher unsupervised clustering accuracy than the Y-maze datasets and can be used to predict behavioral responses effectively. CONCLUSIONS/SIGNIFICANCE: The results of the present study suggest that the proposed methodology is suitable for analyzing the characteristics of neuronal functional networks and predicting rat behavior. These types of structures in cortical
Conjunctive Conceptual Clustering: A Methodology and Experimentation.

Science.gov (United States)

1987-09-01

observing a typical restaurant table on which there are such objects as food on a plate, a salad, utensils, salt and pepper, napkins, a vase with flowers, a... colored graph has nodes and links that match only if they have corresponding link-color and node-color labels... Input file for attribute-based clustering The
Node clustering for wireless sensor networks

International Nuclear Information System (INIS)

Bhatti, S.; Qureshi, I.A.; Memon, S.

2012-01-01

Recent years have witnessed considerable growth in the development and deployment of clustering methods, which are used not only to maintain network resources but also to increase the reliability of WSNs (wireless sensor networks), as the wide range of clustering solutions shows. Node clustering that selects key parameters to cope with the dynamic behaviour of resource-constrained WSNs is a challenging issue. This paper highlights recent progress in the development of clustering solutions for WSNs. The paper presents a classification of node clustering methods and compares them based on their objectives, clustering criteria and methodology. In addition, potential open issues that need to be considered in future work are highlighted. Keywords: Clustering, Sensor Network, Static, Dynamic
Information and methodological support of the cluster of children's wellness recreation

Directory of Open Access Journals (Sweden)

Ivanova Svetlana

2016-02-01

The article considers a tool for transferring the pedagogical experience of staff at children's health camps, which can serve as one of the components of the cluster that ensures the functioning of the health-and-education system of children's health camps. The article shows how the notion of a "cluster" has been transformed from an economic category into a social and pedagogical one.
Automated assessment and tracking of human body thermal variations using unsupervised clustering.

Science.gov (United States)

Yousefi, Bardia; Fleuret, Julien; Zhang, Hai; Maldague, Xavier P V; Watt, Raymond; Klein, Matthieu

2016-12-01

The presented work reviews the overheating that occurs during radiological examinations, such as magnetic resonance imaging, and reports a series of thermal experiments to determine a thermally suitable fabric material for radiological gowns. Moreover, an automatic system for detecting and tracking thermal fluctuations is presented. It applies hue-saturation-value (HSV)-based kernelized k-means clustering, which initializes and controls the points that lie on the region-of-interest (ROI) boundary. Afterward, a particle filter tracks the targeted ROI during the video sequence independently of previous locations of overheating spots. The proposed approach was tested in experiments under conditions very similar to those of real radiology exams. Six subjects voluntarily participated in these experiments. To simulate the hot spots occurring during radiology, a controllable heat source was placed near the subject's body. The results indicate promising accuracy of the proposed approach in tracking hot spots. Some approximations were made regarding atmospheric transmittance, and the emissivity of the fabric could be neglected because the proposed approach is independent of these parameters. The approach can track the heating spots continuously and correctly, even for moving subjects, and provides considerable robustness against motion artifacts, which occur during most medical radiology procedures.
Unsupervised heart-rate estimation in wearables with Liquid states and a probabilistic readout.

Science.gov (United States)

Das, Anup; Pradhapan, Paruthi; Groenendaal, Willemijn; Adiraju, Prathyusha; Rajan, Raj Thilak; Catthoor, Francky; Schaafsma, Siebren; Krichmar, Jeffrey L; Dutt, Nikil; Van Hoof, Chris

2018-03-01

Heart-rate estimation is a fundamental feature of modern wearable devices. In this paper we propose a machine learning technique to estimate heart rate from electrocardiogram (ECG) data collected using wearable devices. The novelty of our approach lies in (1) encoding spatio-temporal properties of ECG signals directly into spike trains and using these to excite recurrently connected spiking neurons in a Liquid State Machine computation model; (2) a novel learning algorithm; and (3) an intelligently designed unsupervised readout based on Fuzzy c-Means clustering of spike responses from a subset of neurons (Liquid states), selected using particle swarm optimization. Our approach differs from existing works by learning directly from ECG signals (allowing personalization), without requiring costly data annotations. Additionally, our approach can be easily implemented on state-of-the-art spiking-based neuromorphic systems, offering high accuracy with a significantly lower energy footprint, leading to extended battery life of wearable devices. We validated our approach with CARLsim, a GPU-accelerated spiking neural network simulator modeling Izhikevich spiking neurons with Spike Timing Dependent Plasticity (STDP) and homeostatic scaling. Subjects from in-house clinical trials and public ECG databases are considered. Results show high accuracy and low energy footprint in heart-rate estimation across subjects with and without cardiac irregularities, signifying the strong potential of this approach to be integrated into future wearable devices. Copyright © 2018 Elsevier Ltd. All rights reserved.
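The fuzzy c-means algorithm at the core of the readout alternates between membership-weighted center updates and membership updates. The sketch clusters generic feature vectors. In the paper these would be spike responses of PSO-selected neurons, which is not modeled here, and the fuzzifier `m = 2`, the random initialization and the toy data are assumptions.

```python
import numpy as np


def fuzzy_c_means(x: np.ndarray, c: int, m: float = 2.0, max_iter: int = 300,
                  tol: float = 1e-6, seed: int = 0):
    """Fuzzy c-means on (n_samples, n_features) data; returns centers and memberships."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)     # each row is a membership distribution
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)             # avoid division by zero when a point sits on a center
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u


responses = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])   # made-up 1-D features
centers, u = fuzzy_c_means(responses, 2)
print(centers.ravel().round(3), u.argmax(axis=1))
```

Soft memberships, rather than hard labels, let a readout express how confidently a response belongs to each cluster.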
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

Science.gov (United States)

Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

2015-10-01

Clustering is a set of statistical learning techniques aimed at finding structure in heterogeneous data by partitioning it into groups of homogeneous observations, called clusters. Clustering has been successfully applied in several fields, such as medicine, biology, finance and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, often involve heterogeneous, mixed qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Moreover, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that allow quantitative and qualitative data to be handled simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
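A minimal sketch of the two ingredients, assuming ordinary least squares for the projection step: binary-coded qualitative variables are replaced by their fitted values on the centred quantitative variables (an orthogonal projection onto the subspace they span), and PCA is then run via SVD. The paper's exact projection and its "optimal model" allocation are not specified in the abstract, and standardizing variables before PCA, usually advisable for mixed units, is omitted.

```python
import numpy as np


def project_onto_quantitative(quant: np.ndarray, qual: np.ndarray) -> np.ndarray:
    """Least-squares projection of qualitative (binary-coded) columns onto span of the centred quantitative columns."""
    q = quant - quant.mean(axis=0)
    b = qual - qual.mean(axis=0)
    coef, *_ = np.linalg.lstsq(q, b, rcond=None)
    return q @ coef      # fitted values = orthogonal projection of each column of b


def pca(data: np.ndarray, n_components: int):
    """PCA via SVD of the centred data: returns scores, loadings and explained-variance ratios."""
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]


rng = np.random.default_rng(0)
quant = rng.normal(size=(100, 3))                                        # made-up measurements
qual = (quant[:, :1] + 0.5 * rng.normal(size=(100, 1)) > 0).astype(float)   # made-up binary variable
combined = np.hstack([quant, project_onto_quantitative(quant, qual)])
scores, loadings, explained = pca(combined, 2)
print(explained.round(3))
```

After projection every column is quantitative and lies in the same space, so a single PCA can reduce the mixed data set before clustering.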
Model-Free Visualization of Suspicious Lesions in Breast MRI Based on Supervised and Unsupervised Learning

NARCIS (Netherlands)

Twellmann, T.; Meyer-Bäse, A.; Lange, O.; Foo, S.; Nattkemper, T.W.

2008-01-01

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has become an important tool in breast cancer diagnosis, but evaluation of multitemporal 3D image data poses new challenges for human observers. To aid the image analysis process, we apply supervised and unsupervised pattern recognition
Software usage in unsupervised digital doorway computing environments in disadvantaged South African communities: Focusing on youthful users

CSIR Research Space (South Africa)

Gush, K

2011-01-01

Digital Doorways provide computing infrastructure in low-income communities in South Africa. The unsupervised DD terminals offer various software applications, from entertainment through educational resources to research material, encouraging...
Current trends in Bayesian methodology with applications

CERN Document Server

Upadhyay, Satyanshu K; Dey, Dipak K; Loganathan, Appaia

2015-01-01

Collecting Bayesian material scattered throughout the literature, Current Trends in Bayesian Methodology with Applications examines the latest methodological and applied aspects of Bayesian statistics. The book covers biostatistics, econometrics, reliability and risk analysis, spatial statistics, image analysis, shape analysis, Bayesian computation, clustering, uncertainty assessment, high-energy astrophysics, neural networking, fuzzy information, objective Bayesian methodologies, empirical Bayes methods, small area estimation, and many more topics. Each chapter is self-contained and focuses on
Unsupervised Tensor Mining for Big Data Practitioners.

Science.gov (United States)

Papalexakis, Evangelos E; Faloutsos, Christos

2016-09-01

Multiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated with each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data. In this article, our goal is to popularize tensors and tensor decompositions to Big Data practitioners by demonstrating their effectiveness, outlining challenges that pertain to their application in Big Data scenarios, and presenting our recent work that tackles those challenges. We view this work as a step toward a fully automated, unsupervised tensor mining tool that can be easily and broadly adopted by practitioners in academia and industry.
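As a concrete example of a tensor decomposition, a rank-R CP (CANDECOMP/PARAFAC) model of a 3-way tensor can be fitted by alternating least squares (ALS). This NumPy sketch is a generic textbook version, not the authors' tools; it assumes a small dense tensor, random initialization and a fixed iteration count.

```python
import numpy as np


def cp_als(t: np.ndarray, rank: int, n_iter: int = 200, seed: int = 0):
    """Fit t[i,j,k] ~ sum_r a[i,r] * b[j,r] * c[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    i, j, k = t.shape
    a = rng.normal(size=(i, rank))
    b = rng.normal(size=(j, rank))
    c = rng.normal(size=(k, rank))
    errors = []
    for _ in range(n_iter):
        # Each update is the exact least-squares solution with the other two factors fixed:
        # (tensor times Khatri-Rao product) @ pinv(Hadamard product of the Gram matrices)
        a = np.einsum("ijk,jr,kr->ir", t, b, c) @ np.linalg.pinv((b.T @ b) * (c.T @ c))
        b = np.einsum("ijk,ir,kr->jr", t, a, c) @ np.linalg.pinv((a.T @ a) * (c.T @ c))
        c = np.einsum("ijk,ir,jr->kr", t, a, b) @ np.linalg.pinv((a.T @ a) * (b.T @ b))
        approx = np.einsum("ir,jr,kr->ijk", a, b, c)
        errors.append(float(np.linalg.norm(t - approx) / np.linalg.norm(t)))
    return (a, b, c), errors


rng = np.random.default_rng(1)
# Made-up "people x people x time" style tensor with an exact rank-2 structure
t = np.einsum("ir,jr,kr->ijk", rng.normal(size=(4, 2)), rng.normal(size=(5, 2)),
              rng.normal(size=(6, 2)))
factors, errors = cp_als(t, 2, n_iter=100)
print(errors[0], errors[-1])
```

Because every factor update minimizes the same squared error, the reconstruction error never increases from one sweep to the next, although ALS can still stall in a local minimum.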
Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

OpenAIRE

Huang, Furong; Anandkumar, Animashree

2016-01-01

Unsupervised extraction of text embeddings is crucial for text understanding in machine learning. Word2Vec and its variants have achieved substantial success in mapping words with similar syntactic or semantic meaning to vectors close to each other. However, extracting context-aware word-sequence embeddings remains a challenging task. Training over a large corpus is difficult because labels are hard to obtain. More importantly, it is challenging for pre-trained models to obtain word-...
Unsupervised progressive elastic band exercises for frail geriatric inpatients objectively monitored by new exercise-integrated technology-a feasibility trial with an embedded qualitative study.

Science.gov (United States)

Rathleff, C R; Bandholm, T; Spaich, E G; Jorgensen, M; Andreasen, J

2017-01-01

Frailty is a serious condition frequently present in geriatric inpatients that potentially causes serious adverse events. Strength training is acknowledged as a means of preventing or delaying frailty and loss of function in these patients. However, limited hospital resources restrict the amount of supervised training, and unsupervised training could supplement supervised training, thereby increasing the total exercise dose during admission. A new valid and reliable technology, the BandCizer, objectively measures the exact training dosage performed. The purpose was to investigate the feasibility and acceptability of an unsupervised progressive strength training intervention monitored by BandCizer for frail geriatric inpatients. This feasibility trial included 15 frail inpatients at a geriatric ward. At hospitalization, the patients were prescribed two elastic band exercises to be performed unsupervised once daily. A BandCizer Datalogger, enabling measurement of the number of sets, repetitions and time-under-tension, was attached to the elastic band. The patients were instructed to perform strength training: 3 sets of 10 repetitions (10-12 repetition maximum (RM)) separated by 2-min pauses, with a time-under-tension of 8 s. The feasibility criterion for the unsupervised progressive exercises was that 33% of the recommended number of sets would be performed by at least 30% of patients. In addition, patients and staff were interviewed about their experiences with the intervention. Four (27%) of 15 patients completed 33% of the recommended number of sets. For the total sample, the average percentage of performed sets was 23%, and for those who actually trained (n = 12) it was 26%. Patients and staff expressed a generally positive attitude towards the unsupervised training as an addition to the supervised training sessions. However, barriers were also described, especially constant interruptions. Based on the predefined criterion for feasibility, the
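The feasibility criterion can be made explicit as a small check. The per-patient fractions below are hypothetical; only the count of 4 out of 15 patients reaching 33% of the recommended sets comes from the abstract. Since 4/15 ≈ 26.7% (reported as 27%) is below the 30% patient threshold, the criterion as stated is not met, whereas a fifth patient would have given 5/15 ≈ 33.3%.

```python
from typing import Sequence, Tuple


def feasibility_met(fraction_of_sets: Sequence[float], set_threshold: float = 0.33,
                    patient_threshold: float = 0.30) -> Tuple[bool, int, float]:
    """Criterion: at least `patient_threshold` of patients perform >= `set_threshold` of recommended sets."""
    reached = sum(1 for f in fraction_of_sets if f >= set_threshold)
    share = reached / len(fraction_of_sets)
    return share >= patient_threshold, reached, share


# Hypothetical per-patient fractions reproducing the reported count: 4 of 15 reach 33%
print(feasibility_met([0.40] * 4 + [0.10] * 11))
```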
Designing ordering and inventory management methodologies for purchased parts

NARCIS (Netherlands)

de Boer, L.; Looman, Arnold; Ruffini, F.A.J.

2002-01-01

This article presents a method for redesigning the ordering and inventory management methodologies for purchased parts in a manufacturing firm. The method takes the perspective of the purchasing and logistics manager, defines clusters of purchased items, and subsequently assigns each cluster to a
Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM).

Science.gov (United States)

Dipnall, J F; Pasco, J A; Berk, M; Williams, L J; Dodd, S; Jacka, F N; Meyer, D

2017-01-01

Key lifestyle-environ risk factors are operative for depression, but it is unclear how risk factors cluster. Machine-learning (ML) algorithms exist that learn, extract, identify and map underlying patterns to identify groupings of depressed individuals without constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through "Graphing lifestyle-environs using machine-learning methods" (GLUMM). Two ML algorithms were implemented: unsupervised Self-organised mapping (SOM) to create GLUMM clusters and a supervised boosted regression algorithm to describe clusters. Ninety-six "lifestyle-environ" variables were used from the National Health and Nutrition Examination Survey (2009-2010). Multivariate logistic regression validated clusters and controlled for possible sociodemographic confounders. The SOM identified two GLUMM cluster solutions. These solutions contained one dominant depressed cluster (GLUMM5-1, GLUMM7-1). Equal proportions of members in each cluster were rated as highly depressed (17%). Alcohol consumption and demographics validated clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Members were more likely to have problems sleeping, eat unhealthily, have lived ≤2 years in their home, live in an old home, perceive themselves as underweight, be exposed to work fumes, have experienced sex at ≤14 years, and not perform moderate recreational activities. A positive relationship between GLUMM5-1 (OR: 7.50, P …) and depression was found, with significant interactions with those married/living with a partner (P=0.001). Using ML-based GLUMM to form ordered depressive clusters from multitudinous lifestyle-environ variables enabled a deeper exploration of the heterogeneous data and a better understanding of the relationships between complex mental health factors. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
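The SOM step above can be illustrated with a minimal sketch of online self-organising map training in Python. Everything here is an assumption for illustration, not the GLUMM configuration: a 1-D grid of nodes, weights initialised from random samples, a Gaussian neighbourhood, linear decay of the learning rate and radius, and the hypothetical helper names `train_som` and `best_matching_unit`. Each sample is then assigned to its best-matching node, which plays the role of a cluster.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is nearest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_nodes, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organising map online; return the node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node with a copy of a randomly chosen sample.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    if radius0 is None:
        radius0 = max(n_nodes / 2.0, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            radius = radius0 * (1.0 - frac) + 1e-9   # neighbourhood shrinks linearly
            bmu = best_matching_unit(weights, x)
            for j in range(n_nodes):
                # Gaussian neighbourhood measured along the 1-D grid of node indices.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights
```

With two tight, well-separated groups of 2-D points and `n_nodes=2`, the points of each group should end up mapped to the same node. A real application with many variables (such as the 96 used here) would typically use a 2-D grid and standardised features.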
The impact of initialization procedures on unsupervised unmixing of hyperspectral imagery using the constrained positive matrix factorization

Science.gov (United States)

Masalmah, Yahya M.; Vélez-Reyes, Miguel

2007-04-01

In previous papers, the authors proposed the use of the constrained Positive Matrix Factorization (cPMF) to perform unsupervised unmixing of hyperspectral imagery. Two iterative algorithms were proposed to compute the cPMF, based on the Gauss-Seidel and penalty approaches to solving optimization problems. Results presented in previous papers have shown the potential of the proposed method to perform unsupervised unmixing of HYPERION and AVIRIS imagery. The performance of iterative methods is highly dependent on the initialization scheme. Good initialization schemes can improve convergence speed and influence whether a global minimum is found and whether spectra with physical relevance are retrieved as endmembers. In this paper, different initializations using random selection, longest-norm pixels, and standard endmember selection routines are studied and compared using simulated and real data.
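Two of the initialization schemes compared above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: it assumes that "longest norm pixels" means the pixels whose spectra have the largest Euclidean norm, represents the image as a plain list of pixel spectra, and uses hypothetical function names. The selected spectra would serve as the starting endmember matrix for an iterative cPMF solver, which is not shown.

```python
import math
import random


def longest_norm_init(pixels, n_endmembers):
    """Return the n pixel spectra with the largest Euclidean norm, plus their indices."""
    if not 0 < n_endmembers <= len(pixels):
        raise ValueError("n_endmembers must be between 1 and the number of pixels")
    ranked = sorted(range(len(pixels)),
                    key=lambda i: math.sqrt(sum(v * v for v in pixels[i])),
                    reverse=True)
    chosen = ranked[:n_endmembers]
    # Copy the spectra so later solver updates do not modify the image data.
    return [list(pixels[i]) for i in chosen], chosen


def random_init(pixels, n_endmembers, seed=0):
    """Return n distinct randomly selected pixel spectra, plus their indices."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(pixels)), n_endmembers)
    return [list(pixels[i]) for i in chosen], chosen
```

For the four hypothetical two-band pixels `[[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.3, 1.0]]`, `longest_norm_init(pixels, 2)` selects indices `[1, 3]`, the two spectra with the largest norms.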
A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

DEFF Research Database (Denmark)

Fraccaro, Marco; Kamronn, Simon Due; Paquet, Ulrich

2017-01-01

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework ... for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate...
Mathematical classification and clustering

CERN Document Server

Mirkin, Boris

1996-01-01

I am very happy to have this opportunity to present the work of Boris Mirkin, a distinguished Russian scholar in the areas of data analysis and decision making methodologies. The monograph is devoted entirely to clustering, a discipline dispersed through many theoretical and application areas, from mathematical statistics and combinatorial optimization to biology, sociology and organizational structures. It compiles an immense amount of research done to date, including many original Russian developments never presented to the international community before (for instance, cluster-by-cluster versions of the K-Means method in Chapter 4 or uniform partitioning in Chapter 5). The author's approach, approximation clustering, allows him both to systematize a great part of the discipline and to develop many innovative methods in the framework of optimization problems. The optimization methods considered are proved to be meaningful in the contexts of data analysis and clustering. The material presented in ...
Clustering analysis of line indices for LAMOST spectra with AstroStat

Science.gov (United States)

Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

2018-06-01

The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyzing large amounts of complex survey data. Unsupervised clustering can help astronomers find associations and outliers in a large data set. In this paper, we employ the k-means method to cluster the line indices of LAMOST spectra using the AstroStat software. The line index approach is an effective way to extract spectral features from low-resolution spectra, and these features represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars are analyzed by calculating the intra- and inter-class distances between pairs of stars. For the intra-class distance, we use the Mahalanobis distance to explore the degree of clustering of each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
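The abstract says the authors define their own local outlier factor for each spectrum; its exact form is not given here. As a hedged point of reference, the Python sketch below implements the standard local outlier factor of Breunig et al. (2000) with Euclidean distance (not the Mahalanobis distance used for the intra-class analysis), treating each spectrum's line indices as a feature vector. It simplifies the neighbourhood to exactly k nearest neighbours, and `lof_scores` is a hypothetical name. Scores near 1 indicate a point about as dense as its neighbours; scores well above 1 flag candidate outliers.

```python
import math


def lof_scores(points, k=3):
    """Local outlier factor with Euclidean distance.

    Simplification: each neighbourhood is exactly the k nearest neighbours
    (ties at the k-distance are not expanded).
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, o):
        # Reachability distance of point i from neighbour o.
        return max(k_distance[o], dist[i][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, o) for o in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]
```

For four points on a unit square plus one point at (10, 10) and `k=2`, the square's points score exactly 1.0 while the distant point scores about 13.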
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

Science.gov (United States)

Bourobou, Serge Thomas Mickala; Yoo, Younghwan

2015-05-21

This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things)-based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, to classify highly varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space using an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment such as an IoT-based smart home.
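The temporal ANN step relies on Allen's temporal relations between activity intervals. How the authors encode these relations as network inputs is not described in the abstract, so the Python sketch below shows only the relation itself: it classifies two intervals, given as hypothetical (start, end) timestamps of sensor-detected activities, into one of Allen's 13 relations. The function name `allen_relation` is illustrative.

```python
def allen_relation(a, b):
    """Return Allen's interval relation of interval a with respect to interval b."""
    s1, e1 = a
    s2, e2 = b
    if not (s1 < e1 and s2 < e2):
        raise ValueError("each interval must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: partial overlap with distinct start and end points.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

For example, an activity spanning minutes 0 to 4 and another spanning minutes 2 to 6 yield `allen_relation((0, 4), (2, 6)) == "overlaps"`, while `allen_relation((2, 3), (1, 5))` is `"during"`.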

The effects of an unsupervised water exercise program on low back pain and sick leave among healthy pregnant women - A randomised controlled trial

DEFF Research Database (Denmark)

Backhausen, Mette G; Tabor, Ann; Albert, Hanne

2017-01-01

BACKGROUND: Low back pain is highly prevalent among pregnant women, but evidence of an effective treatment is still lacking. Supervised exercise, either land- or water-based, has shown benefits for low back pain, but no trial has investigated the evidence of an unsupervised water exercise program ... on low back pain. We aimed to assess the effect of an unsupervised water exercise program on low back pain intensity and days spent on sick leave among healthy pregnant women. METHODS: In this randomised, controlled, parallel-group trial, 516 healthy pregnant women were randomly assigned to either ... unsupervised water exercise twice a week for a period of 12 weeks or standard prenatal care. Healthy pregnant women aged 18 years or older, with a single fetus and between 16 and 17 weeks of gestation, were eligible. The primary outcome was low back pain intensity measured by the Low Back Pain Rating scale at 32...
Widespread Micropollutant Monitoring in the Hudson River Estuary Reveals Spatiotemporal Micropollutant Clusters and Their Sources.

Science.gov (United States)

Carpenter, Corey M G; Helbling, Damian E

2018-06-05

The objective of this study was to identify sources of micropollutants in the Hudson River Estuary (HRE). We collected 127 grab samples at 17 sites along the HRE over 2 years and screened for up to 200 micropollutants. We quantified 168 of the micropollutants in at least one of the samples. Atrazine, gabapentin, metolachlor, and sucralose were measured in every sample. We used data-driven unsupervised methods to cluster the micropollutants on the basis of their spatiotemporal occurrence and normalized-concentration patterns. Three major clusters of micropollutants were identified: ubiquitous and mixed-use (core micropollutants), sourced from sewage treatment plant outfalls (STP micropollutants), and derived from diffuse upstream sources (diffuse micropollutants). Each of these clusters was further refined into subclusters that were linked to specific sources on the basis of relationships identified through geospatial analysis of watershed features. Evaluation of cumulative loadings of each subcluster revealed that the Mohawk River and Rondout Creek are major contributors of most core micropollutants and STP micropollutants and the upper HRE is a major contributor of diffuse micropollutants. These data provide the first comprehensive evaluation of micropollutants in the HRE and define distinct spatiotemporal micropollutant clusters that are linked to sources and conserved across surface water systems around the world.
An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

Science.gov (United States)

LIU, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

2016-12-01

Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessment and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal patterns or known anomalies). As such, it works for diverse data sets and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on user-defined "interestingness" (e.g., rareness or total number of outliers), and (4) enable interactive querying, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable to a diverse set of satellite data and will be made publicly available for scientists in early 2017.
Data Clustering

Science.gov (United States)

Wagstaff, Kiri L.

2012-03-01

On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained
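As a concrete instance of the kind of algorithm surveyed here, the following Python sketch implements Lloyd's k-means. It is a minimal illustration under stated assumptions (k data points chosen at random without replacement as seeds, Euclidean distance, stopping when assignments no longer change, an emptied cluster keeping its previous centroid), not code from the chapter; `kmeans` is a hypothetical name.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    if not 0 < k <= len(points):
        raise ValueError("k must satisfy 0 < k <= len(points)")
    rng = random.Random(seed)
    # Seed with k data points chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are also fixed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because the result depends on the seeds, practical use usually runs several restarts and keeps the solution with the lowest within-cluster sum of squares; the number of clusters k must still be chosen separately.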
Hanging out with Which Friends? Friendship-Level Predictors of Unstructured and Unsupervised Socializing in Adolescence

Science.gov (United States)

Siennick, Sonja E.; Osgood, D. Wayne

2012-01-01

Companions are central to explanations of the risky nature of unstructured and unsupervised socializing, yet we know little about whom adolescents are with when hanging out. We examine predictors of how often friendship dyads hang out via multilevel analyses of longitudinal friendship-level data on over 5,000 middle schoolers. Adolescents hang out…
Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging

Energy Technology Data Exchange (ETDEWEB)

Dunlavy, Daniel; Chew, Peter A.

2016-05-01

This report describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single-domain, two-way DEDICOM model was computed on a matrix of bigram frequencies of tokens in a corpus and used to identify parts of speech as an unsupervised approach to that problem. An open problem identified in that work was the computation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), specifically through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to fit those models. The approach taken here is to incorporate the model requirements as bound and linear constraints into the DEDICOM model directly and solve the data-fitting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is fit using unconstrained optimization approaches, and model requirements are satisfied as a post-processing step.
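For orientation, two-way DEDICOM approximates an n × n asymmetric data matrix X (here, token bigram counts) with a loading matrix A and a small asymmetric core matrix R. The constrained fitting problem below is a generic sketch: the bound matrices L and U and the linear system C, d are placeholders standing in for the report's actual HMM-motivated constraints, which the abstract does not spell out.

```latex
X \approx A R A^{\top}, \qquad
X \in \mathbb{R}^{n \times n},\; A \in \mathbb{R}^{n \times p},\; R \in \mathbb{R}^{p \times p},\; p \ll n

\min_{A,\,R}\ \lVert X - A R A^{\top} \rVert_F^{2}
\quad \text{subject to} \quad
L_A \le A \le U_A,\quad
L_R \le R \le U_R,\quad
C \begin{bmatrix} \operatorname{vec}(A) \\ \operatorname{vec}(R) \end{bmatrix} = d
```

Because R need not be symmetric, r_{ij} and r_{ji} can differ, which is what lets the model express directional patterns such as one word class tending to precede another.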
Unsupervised Object Modeling and Segmentation with Symmetry Detection for Human Activity Recognition

Directory of Open Access Journals (Sweden)

Jui-Yuan Su

2015-04-01

In this paper we present a novel unsupervised approach to detecting and segmenting objects, as well as their constituent symmetric parts, in an image. Traditional unsupervised image segmentation is limited by two obvious deficiencies: object detection accuracy degrades when the boundaries between the segmented regions are misaligned with the target, and pre-learned models are required to group regions into meaningful objects. To tackle these difficulties, the proposed approach incorporates the pair-wise detection of symmetric patches to segment images into symmetric parts. The skeletons of these symmetric parts then provide estimates of the bounding boxes that locate the target objects. Finally, for each detected object, a graph-cut-based segmentation algorithm is applied to find its contour. The proposed approach has significant advantages: no a priori object models are used, and multiple objects are detected. To verify the effectiveness of the approach, human objects are extracted from among the detected objects using the cues that a face contains an oval shape and skin colors. The detected human objects and their parts are finally tracked across video frames to capture the movements of object parts for learning human activity models from video clips. Experimental results show that the proposed method performs well on publicly available datasets.
Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

Directory of Open Access Journals (Sweden)

Gabjo Kim

2016-12-01

This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors' levels of sustainable research and development (R&D) and establish a sustainable strategy for entering an industry. To achieve this, we first grouped patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective for grouping large amounts of data. Next, to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which ranks terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
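The TF-IDF labelling step can be sketched in Python as follows. The abstract does not say how documents are grouped for weighting, so this sketch assumes each AP cluster's abstracts are pooled into one document and uses IDF = log(N / df) over the N clusters, so terms shared by every cluster score zero. Tokenisation is naive whitespace splitting, the function name is hypothetical, and the example abstracts are invented; the AP clustering step itself is not shown.

```python
import math
from collections import Counter


def top_terms_per_cluster(docs, labels, top_n=3):
    """Rank each cluster's terms by TF-IDF, pooling a cluster's documents into one."""
    pooled = {}
    for doc, label in zip(docs, labels):
        pooled.setdefault(label, []).extend(doc.lower().split())
    n_clusters = len(pooled)
    # Document frequency: in how many pooled cluster documents each term occurs.
    df = Counter()
    for tokens in pooled.values():
        df.update(set(tokens))
    result = {}
    for label, tokens in pooled.items():
        tf = Counter(tokens)
        scores = {term: (count / len(tokens)) * math.log(n_clusters / df[term])
                  for term, count in tf.items()}
        # Highest score first; ties broken alphabetically for reproducibility.
        result[label] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return result
```

With the invented abstracts `"battery cell charging"` and `"battery charging station"` in cluster 0 and `"motor torque control"` and `"motor inverter control"` in cluster 1, cluster 0 is labelled `["battery", "charging", "cell"]` and cluster 1 `["control", "motor", "inverter"]`.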
Defining functioning levels in patients with schizophrenia: A combination of a novel clustering method and brain SPECT analysis.

Science.gov (United States)

Faget-Agius, Catherine; Vincenti, Aurélie; Guedj, Eric; Michel, Pierre; Richieri, Raphaëlle; Alessandrini, Marine; Auquier, Pascal; Lançon, Christophe; Boyer, Laurent

2017-12-30

This study aims to define functioning levels of patients with schizophrenia by using a method of interpretable clustering based on a specific functioning scale, the Functional Remission Of General Schizophrenia (FROGS) scale, and to test their validity with respect to clinical and neuroimaging characterization. In this observational study, patients with schizophrenia were classified using a hierarchical top-down method called clustering using unsupervised binary trees (CUBT). Socio-demographic, clinical, and SPECT perfusion neuroimaging data were compared between the different clusters to ensure their clinical relevance. A total of 242 patients were analyzed. A four-group functioning level structure was identified: 54 patients were classified as "minimal", 81 as "low", 64 as "moderate", and 43 as "high". The clustering shows satisfactory statistical properties, including reproducibility and discriminancy. The 4 clusters consistently differentiate patients. "High" functioning level patients had significantly the lowest scores on the PANSS and the CDSS, and the highest scores on the GAF, the MARS and the S-QoL 18. Functioning levels were significantly associated with cerebral perfusion in two relevant areas: the left inferior parietal cortex and the anterior cingulate. Our study provides relevant functioning levels in schizophrenia and may enhance the use of functioning scales. Copyright © 2017 Elsevier B.V. All rights reserved.
Unsupervised online classifier in sleep scoring for sleep deprivation studies.

Science.gov (United States)

Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien

2015-05-01

This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). The study was conducted in controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD, using ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. The performance of the algorithm was evaluated by comparison with the scores obtained by a human expert reader. Online detection of paradoxical sleep (PS) was then validated with a 72-hour PSD protocol. Our algorithm showed high concordance with human scoring, with an average κ coefficient > 0.70. Notably, the specificity for detecting PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only the brief PS bouts needed to detect PS in the EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% at baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Our fully unsupervised, data-driven algorithm overcomes some limitations of other automated methods, such as the need to select representative descriptors or set thresholds. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods such as the tedious gentle-handling or platform methods. © 2015 Associated Professional Sleep Societies, LLC.
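Agreement with the human scorer is summarised by Cohen's κ, which corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o - p_e) / (1 - p_e). The Python sketch below computes it for two label sequences; the epoch labels (W = wake, SWS = slow-wave sleep, PS = paradoxical sleep) are hypothetical and do not come from the study, and `cohens_kappa` is an illustrative name.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of epochs given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        # Convention: both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For the ten hypothetical epochs `["W", "W", "SWS", "SWS", "PS", "PS", "W", "SWS", "SWS", "W"]` (human) and `["W", "W", "SWS", "SWS", "PS", "SWS", "W", "SWS", "W", "W"]` (algorithm), p_o = 0.8 and p_e = 0.38, giving κ ≈ 0.68.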
Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis

Directory of Open Access Journals (Sweden)

Mani D R

2011-08-01

Background: Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there is a large variety of clustering algorithms, very few of them can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications, ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography-based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics. Results: We present MEDEA (M-Estimator with DEterministic Annealing), a new M-estimator-based unsupervised algorithm designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching" (identifying the common LC-MS peaks across multiple samples) in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also results in an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets. Conclusions: MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information, is available from the author website at http://www.hephy.at/user/fru/medea/.
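MEDEA's estimator and annealing schedule are not specified in this abstract, so the Python sketch below illustrates only what "redescending" means, using Tukey's biweight as an assumed example: an observation's weight falls to exactly zero once its scaled residual exceeds a cutoff c, so a distant peak cannot pull a cluster centre toward it. The location is found by iteratively reweighted averaging with a fixed MAD scale; c = 4.685 is the conventional tuning constant, and the function names are hypothetical.

```python
import statistics


def tukey_weight(r, c=4.685):
    """Tukey biweight: weight shrinks with |r| and is exactly zero beyond c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def biweight_location(xs, c=4.685, max_iter=50, tol=1e-9):
    """Robust location estimate by iteratively reweighted averaging."""
    est = statistics.median(xs)
    # Median absolute deviation, scaled to match the SD under normality.
    scale = 1.4826 * statistics.median([abs(x - est) for x in xs])
    if scale == 0.0:
        return est  # at least half the values equal the median
    for _ in range(max_iter):
        weights = [tukey_weight((x - est) / scale, c) for x in xs]
        total = sum(weights)
        if total == 0.0:
            break
        new_est = sum(w * x for w, x in zip(weights, xs)) / total
        if abs(new_est - est) < tol:
            return new_est
        est = new_est
    return est
```

For the hypothetical retention times `[10.0, 10.1, 9.9, 10.05, 30.0]`, the plain mean is about 14.0, whereas `biweight_location` stays close to 10 because the value 30.0 receives zero weight.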
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

Science.gov (United States)

Liu, Wuying; Wang, Lin

2018-03-01

Large-scale bilingual parallel resources are important for statistical learning and deep learning in natural language processing. This paper addresses the automatic construction of a Korean-Chinese domain dictionary and presents a novel unsupervised construction method based on natural annotations in the raw corpus. We first extract all Korean-Chinese word pairs from Korean texts according to natural annotations, then convert the traditional Chinese characters into simplified ones, and finally distill a bilingual domain dictionary after retrieving the simplified Chinese words in an additional Chinese domain dictionary. The experimental results show that our method can efficiently build multiple Korean-Chinese domain dictionaries automatically.
Performance of some supervised and unsupervised multivariate techniques for grouping authentic and unauthentic Viagra and Cialis

Directory of Open Access Journals (Sweden)

Michel J. Anzanello

2014-09-01

A typical application of multivariate techniques in forensic analysis consists of discriminating between authentic and unauthentic samples of seized drugs, in addition to finding similar properties in the unauthentic samples. In this paper, the performance of several methods belonging to two different classes of multivariate techniques, supervised and unsupervised, was compared. The supervised techniques (ST) are k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Probabilistic Neural Networks (PNN), and Linear Discriminant Analysis (LDA); the unsupervised techniques (UT) are k-Means CA and Fuzzy C-Means (FCM). The methods are applied to Fourier-transform infrared spectroscopy (FTIR) data from authentic and unauthentic Cialis and Viagra. The FTIR data are also transformed by Principal Components Analysis (PCA) and kernel functions aimed at improving the grouping performance. ST proved to be a more reasonable choice when the analysis is conducted on the original data, while UT led to better results when applied to transformed data.
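In Fuzzy C-Means each sample receives a graded membership in every cluster rather than a single label. The Python sketch below implements the standard alternating updates with fuzzifier m = 2 (a common default, assumed here), random initial memberships, and a stopping rule on the largest membership change. The 2-D toy points merely stand in for FTIR spectra or their PCA scores, and `fuzzy_c_means` is a hypothetical name, not the authors' implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means with alternating centre and membership updates."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centres = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            sw = sum(w)
            centres.append([sum(w[j] * points[j][d] for j in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        new_u = []
        expo = 2.0 / (m - 1.0)
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centres]
            if min(dists) == 0.0:
                # The point coincides with a centre: share membership among such centres.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[i] / dists[k]) ** expo for k in range(c))
                              for i in range(c)])
        shift = max(abs(a - b) for row_new, row_old in zip(new_u, u)
                    for a, b in zip(row_new, row_old))
        u = new_u
        if shift < tol:
            break
    return u, centres
```

A hard grouping, comparable to k-means output, can be read off by assigning each sample to its highest-membership cluster; the graded memberships themselves indicate how ambiguous a sample is.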
Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

Science.gov (United States)

Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

2014-12-30

Understanding neural functions requires knowledge gained from analysing electrophysiological data. The process of assigning spikes of a multichannel signal to clusters, called spike sorting, is one of the important problems in such analysis. Various automated spike sorting techniques exist, each with advantages and disadvantages regarding accuracy and computational cost. Developing spike sorting methods that are both highly accurate and computationally inexpensive therefore remains a challenge in biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features then serve as inputs to the landmark-based spectral clustering (LSC) method. The gap statistic (GS) is employed to estimate the number of clusters before LSC is performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative and thereby boost the performance of clustering methods. Furthermore, the LSC method proves efficient when integrated with the GS cluster evaluator. The proposed method's accuracy is approximately 13% higher than that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC's computing time is one-sixth that of WT-SPC. LPP-LSC thus offers a spike sorting solution that meets both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce the computational burden, so their combination can be applied to real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
Single-Trial Classification of Bistable Perception by Integrating Empirical Mode Decomposition, Clustering, and Support Vector Machine

Directory of Open Access Journals (Sweden)

Hualou Liang

2008-04-01

We propose an empirical mode decomposition (EMD)-based method to extract features from multichannel recordings of local field potential (LFP), collected from the middle temporal (MT) visual cortex of a macaque monkey, for decoding its bistable structure-from-motion (SFM) perception. The feature extraction approach consists of three stages. First, we employ EMD to decompose nonstationary single-trial time series into narrowband components called intrinsic mode functions (IMFs), with time scales dependent on the data. Second, we adopt unsupervised K-means clustering to group the IMFs and residues into several clusters across all trials and channels. Third, we use the supervised common spatial patterns (CSP) approach to design spatial filters for the clustered spatiotemporal signals. We apply a support vector machine (SVM) classifier to the extracted features to decode the reported perception on a single-trial basis. We demonstrate that the CSP feature of the cluster in the gamma frequency band outperforms the features in other frequency bands and leads to the best decoding performance. We also show that EMD-based feature extraction can be useful for evoked potential estimation. Our proposed feature extraction approach may have potential for many applications involving nonstationary multivariable time series, such as brain-computer interfaces (BCI).
Unsupervised classification of lidar-based vegetation structure metrics at Jean Lafitte National Historical Park and Preserve

Science.gov (United States)

Kranenburg, Christine J.; Palaseanu-Lovejoy, Monica; Nayegandhi, Amar; Brock, John; Woodman, Robert

2012-01-01

Traditional vegetation maps capture the horizontal distribution of various vegetation properties, for example, type, species and age/senescence, across a landscape. Ecologists have long known, however, that many important forest properties, for example, interior microclimate, carbon capacity, biomass and habitat suitability, also depend on the vertical arrangement of branches and leaves within tree canopies. The objective of this study was to use a digital elevation model (DEM) along with tree canopy-structure metrics derived from a lidar survey conducted using the Experimental Advanced Airborne Research Lidar (EAARL) to capture a three-dimensional view of vegetation communities in the Barataria Preserve unit of Jean Lafitte National Historical Park and Preserve, Louisiana. The EAARL instrument is a raster-scanning, full waveform-resolving, small-footprint, green-wavelength (532-nanometer) lidar system designed to map coastal bathymetry, topography and vegetation structure simultaneously. An unsupervised clustering procedure was then applied to the three-dimensional structure metrics and the DEM to produce a vegetation map based on the vertical structure of the park's vegetation, which includes a flotant marsh, scrub-shrub wetland, bottomland hardwood forest, and baldcypress-tupelo swamp forest. This study was completed in collaboration with the National Park Service Inventory and Monitoring Program's Gulf Coast Network. The methods presented herein are intended to be used as part of a cost-effective monitoring tool to capture change in park resources.
Uncertainty of a detected spatial cluster in 1D: quantification and visualization

KAUST Repository

Lee, Junho; Gangnon, Ronald E.; Zhu, Jun; Liang, Jingjing

2017-01-01

Spatial cluster detection is an important problem in a variety of scientific disciplines such as environmental sciences, epidemiology and sociology. However, there appears to be little statistical methodology for quantifying the uncertainty of a detected cluster. In this paper, we develop a new method for the quantification and visualization of uncertainty associated with a detected cluster. Our approach defines a confidence set for the true cluster, based on maximum likelihood, and visualizes that confidence set in time or in one-dimensional space. We evaluate the pivotal property of the statistic used to construct the confidence set and the coverage rate for the true cluster via empirical distributions. For illustration, our methodology is applied to both simulated data and an Alaska boreal forest dataset. Copyright © 2017 John Wiley & Sons, Ltd.
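The abstract does not spell out its statistic, so the following is only a sketch under an assumption: a Kulldorff-style Poisson likelihood-ratio scan over contiguous intervals of a 1D grid, which finds the single most likely cluster that a confidence set would then be built around. It does not construct the authors' confidence set; `poisson_llr`, `most_likely_interval` and the toy counts are hypothetical.

```python
import math

def poisson_llr(c, e, total_c, total_e):
    """Kulldorff-style log-likelihood ratio for an interval with observed
    count c and population e (assumed statistic; high-rate clusters only)."""
    if e <= 0 or total_e - e <= 0:
        return 0.0
    # expected count inside the interval under a uniform rate
    e_adj = e * total_c / total_e
    if c <= e_adj:
        return 0.0
    inside = c * math.log(c / e_adj)
    out_c = total_c - c
    out_e = total_c - e_adj
    outside = out_c * math.log(out_c / out_e) if out_c > 0 else 0.0
    return inside + outside

def most_likely_interval(counts, population):
    """Scan every contiguous interval [i, j]; return (llr, i, j) of the best.
    Returns (0.0, None, None) when no interval has an elevated rate."""
    total_c, total_e = sum(counts), sum(population)
    best = (0.0, None, None)
    n = len(counts)
    for i in range(n):
        c = e = 0
        for j in range(i, n):
            c += counts[j]
            e += population[j]
            llr = poisson_llr(c, e, total_c, total_e)
            if llr > best[0]:
                best = (llr, i, j)
    return best
```

For example, with counts `[1, 1, 1, 8, 9, 1, 1, 1]` over equal populations, the scan returns the interval covering indices 3 to 4.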
Uncertainty of a detected spatial cluster in 1D: quantification and visualization

KAUST Repository

Lee, Junho

2017-10-19

Spatial cluster detection is an important problem in a variety of scientific disciplines such as environmental sciences, epidemiology and sociology. However, there appears to be little statistical methodology for quantifying the uncertainty of a detected cluster. In this paper, we develop a new method for the quantification and visualization of uncertainty associated with a detected cluster. Our approach defines a confidence set for the true cluster, based on maximum likelihood, and visualizes that confidence set in time or in one-dimensional space. We evaluate the pivotal property of the statistic used to construct the confidence set and the coverage rate for the true cluster via empirical distributions. For illustration, our methodology is applied to both simulated data and an Alaska boreal forest dataset. Copyright © 2017 John Wiley & Sons, Ltd.
Sensitization trajectories in childhood revealed by using a cluster analysis

DEFF Research Database (Denmark)

Schoos, Ann-Marie M.; Chawes, Bo L.; Melen, Erik

2017-01-01

BACKGROUND: Assessment of sensitization at a single time point during childhood provides limited clinical information. We hypothesized that sensitization develops as specific patterns with respect to age at debut, development over time, and involved allergens and that such patterns might be more … biologically and clinically relevant. OBJECTIVE: We sought to explore latent patterns of sensitization during the first 6 years of life and investigate whether such patterns associate with the development of asthma, rhinitis, and eczema. METHODS: We investigated 398 children from the at-risk Copenhagen Prospective Studies on Asthma in Childhood 2000 (COPSAC2000) birth cohort with specific IgE against 13 common food and inhalant allergens at the ages of ½, 1½, 4, and 6 years. An unsupervised cluster analysis for 3-dimensional data (nonnegative sparse parallel factor analysis) was used to extract latent …

An Introduction to Topic Modeling as an Unsupervised Machine Learning Way to Organize Text Information

Science.gov (United States)

Snyder, Robin M.

2015-01-01

The field of topic modeling has become increasingly important over the past few years. Topic modeling is an unsupervised machine learning approach to organizing text (or image, DNA, etc.) information so that related pieces of text can be identified. This paper/session presents and discusses the current state of topic modeling, why it is important, and…
ADAPTIVE WEB SITE WITH THE FUZZY CLUSTERING METHOD

Directory of Open Access Journals (Sweden)

Muchammad Husni

2004-01-01

The explosive growth and development of information in cyberspace makes information personalization an important issue. The personalized information a website provides strongly influences users' patterns and behaviour when searching for information, especially in electronic commerce (e-commerce). One possible approach to web personalization is to discover user profiles from the very large historical data in web log files. Unsupervised classification, i.e. clustering methods, is well suited to analysing semi-structured user access log data. In this method, a "user session" is defined, together with a dissimilarity measure between two web sessions that reflects the organization of a web site. To obtain user access profiles, user sessions are partitioned according to their pairwise dissimilarities using a Fuzzy Clustering algorithm. Keywords: Adaptive Website, Fuzzy Clustering, information personalization.
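The abstract does not give the session dissimilarity it uses. As a hypothetical stand-in, the sketch below treats each session as a set of visited URLs and uses the Jaccard distance; unlike the paper's measure, it ignores how pages are organized on the site. The resulting pairwise matrix is the kind of input a relational fuzzy clustering algorithm would consume. Function names and URLs are illustrative only.

```python
def session_dissimilarity(a, b):
    """Jaccard distance between two sessions given as collections of URLs.
    Hypothetical stand-in: ignores the site's link/page hierarchy."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def dissimilarity_matrix(sessions):
    """Symmetric matrix of pairwise session dissimilarities."""
    n = len(sessions)
    return [[session_dissimilarity(sessions[i], sessions[j]) for j in range(n)]
            for i in range(n)]

# usage with made-up sessions
s1 = ["/", "/shop", "/shop/cart"]
s2 = ["/", "/shop"]
s3 = ["/blog"]
matrix = dissimilarity_matrix([s1, s2, s3])
```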
A heuristic approach to possibilistic clustering algorithms and applications

CERN Document Server

Viattchenin, Dmitri A

2013-01-01

The present book outlines a new approach to possibilistic clustering in which the sought clustering structure of the set of objects is based directly on the formal definition of a fuzzy cluster, and the possibilistic memberships are determined directly from the values of the pairwise similarity of objects. The proposed approach can be used for solving different classification problems. Some techniques that might be useful for this purpose are outlined here, including a methodology for constructing a set of labeled objects for a semi-supervised clustering algorithm, a methodology for reducing the dimensionality of the analyzed attribute space, and a method for asymmetric data processing. Moreover, a technique for constructing a subset of the most appropriate alternatives for a set of weak fuzzy preference relations defined on a universe of alternatives is described in detail, and a method for rapidly prototyping Mamdani fuzzy inference systems is introduced. This book addresses engineers, scientist...
Land cover classification using reformed fuzzy C-means

Indian Academy of Sciences (India)

This paper uses segmentation based on unsupervised clustering techniques for classification of land cover. … and unsupervised classification can be solved by FCM. … They also act as input to the development and monitoring of a range of …
Intelligent Fault Diagnosis of Rotary Machinery Based on Unsupervised Multiscale Representation Learning

Science.gov (United States)

Jiang, Guo-Qian; Xie, Ping; Wang, Xiao; Chen, Meng; He, Qun

2017-11-01

The performance of traditional vibration-based fault diagnosis methods depends greatly on handcrafted features extracted using signal processing algorithms, which require significant amounts of domain knowledge and human labor and do not generalize well to new diagnosis domains. Recently, unsupervised representation learning has offered a promising alternative for feature extraction in fault diagnosis, owing to its superior ability to learn from unlabeled data. Given that vibration signals usually contain multiple temporal structures, this paper proposes a multiscale representation learning (MSRL) framework to learn useful features directly from raw vibration signals, with the aim of capturing rich and complementary fault pattern information at different scales. In our proposed approach, a coarse-graining procedure is first employed to obtain multiple scale signals from an original vibration signal. Then, sparse filtering, a newly developed unsupervised learning algorithm, is applied to automatically learn useful features from each scale signal, and the features learned at each scale are concatenated one by one to obtain multiscale representations. Finally, the multiscale representations are fed into a supervised classifier to obtain diagnosis results. Our proposed approach is evaluated in two case studies: motor bearing and wind turbine gearbox fault diagnosis. Experimental results show that the proposed MSRL approach can take full advantage of the availability of unlabeled data to learn discriminative features and achieves better performance, with higher accuracy and stability, than traditional approaches.
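The abstract does not define its coarse-graining procedure. A common choice, borrowed here as an assumption from multiscale entropy analysis, averages consecutive non-overlapping windows of length `scale`; the helper names below are hypothetical.

```python
def coarse_grain(signal, scale):
    """Average consecutive non-overlapping windows of length `scale`
    (assumed procedure). Trailing samples that do not fill a window are dropped."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    n = len(signal) // scale
    return [sum(signal[k * scale:(k + 1) * scale]) / scale for k in range(n)]

def multiscale(signal, max_scale):
    """Return {scale: coarse-grained signal} for scales 1..max_scale."""
    return {s: coarse_grain(signal, s) for s in range(1, max_scale + 1)}
```

Each scale signal would then go to a feature learner (sparse filtering in the paper), and the per-scale features would be concatenated.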
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

Directory of Open Access Journals (Sweden)

Yoonsik Shim

2016-10-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

Science.gov (United States)

Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

2016-10-01

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.
Chemical modeling of groundwater in the Banat Plain, southwestern Romania, with elevated As content and co-occurring species by combining diagrams and unsupervised multivariate statistical approaches.

Science.gov (United States)

Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu

2017-04-01

The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe the natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate), ion concentrations (Na⁺, K⁺, Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, F⁻, SO₄²⁻, PO₄³⁻, NO₃⁻), pH, redox potential, conductivity and total dissolved substances were determined. Classical diagrams provided the hydrochemical characterization, while the statistical approaches helped establish (i) the natural mechanism of occurrence of As and F⁻ species and the anthropogenic one for NO₃⁻, SO₄²⁻, PO₄³⁻ and K⁺, and (ii) a classification of groundwater based on its content of arsenic species. The HCO₃⁻ type of the local groundwater and its alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F⁻, but by different paths. PO₄³⁻-AsO₄³⁻ ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na⁺-F⁻-pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model, supported by mineralogical analysis of rocks, was established to illustrate the natural and anthropogenic paths and the enrichment of As and co-occurring species in the local groundwater. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Clustering Methodology of Web Log Data for Learning Management Systems

Science.gov (United States)

Valsamidis, Stavros; Kontogiannis, Sotirios; Kazanidis, Ioannis; Theodosiou, Theodosios; Karakos, Alexandros

2012-01-01

Learning Management Systems (LMS) collect large amounts of data. Data mining techniques can be applied to analyse their web data log files. The instructors may use this data for assessing and measuring their courses. In this respect, we have proposed a methodology for analysing LMS courses and students' activity. This methodology uses a Markov…
Legacy ExtraGalactic UV Survey with The Hubble Space Telescope: Stellar Cluster Catalogs and First Insights Into Cluster Formation and Evolution in NGC 628

NARCIS (Netherlands)

Adamo, A.; Ryon, J.E.; Messa, M.; Kim, H.; Grasha, K.; Cook, D.O.; Calzetti, D.; Lee, J.C.; Whitmore, B.C.; Elmegreen, B.G.; Ubeda, L.; Smith, L.J.; Bright, S.N.; Runnholm, A.; Andrews, J.E.; Fumagalli, M.; Gouliermis, D.A.; Kahre, L.; Nair, P.; Thilker, D.; Walterbos, R.; Wofford, A.; Aloisi, A.; Ashworth, G.; Brown, T.M.; Chandar, R.; Christian, C.; Cignoni, M.; Clayton, G.C.; Dale, D.A.; de Mink, S.E.; Dobbs, C.; Elmegreen, D.M.; Evans, A.S.; Gallagher III, J.S.; Grebel, E.K.; Herrero, A.; Hunter, D.A.; Johnson, K.E.; Kennicutt, R.C.; Krumholz, M.R.; Lennon, D.; Levay, K.; Martin, C.; Nota, A.; Östlin, G.; Pellerin, A.; Prieto, J.; Regan, M.W.; Sabbi, E.; Sacchi, E.; Schaerer, D.; Schiminovich, D.; Shabani, F.; Tosi, M.; Van Dyk, S.D.; Zackrisson, E.

2017-01-01

We report the large effort that is producing comprehensive high-level young star cluster (YSC) catalogs for a significant fraction of galaxies observed with the Legacy ExtraGalactic UV Survey (LEGUS) Hubble treasury program. We present the methodology developed to extract cluster positions, verify
Methodology comparative statistical analysis of Russian industry based on cluster analysis

Directory of Open Access Journals (Sweden)

Sergey S. Shishulin

2017-01-01

The article investigates the possibilities of applying multidimensional statistical analysis to the study of industrial production by comparing its growth rates and structure with those of other developed and developing countries of the world. The purpose of this article is to determine the optimal set of statistical methods, and the results of applying them to industrial production data, that would best support the analysis of the results. The data include indicators such as output, gross value added, the number of employed, and other indicators of the system of national accounts and operational business statistics. The objects of observation are the industries of the countries of the Customs Union, the United States, Japan and Europe in 2005-2015. The research tools ranged from the simplest data transformations and graphical and tabular visualization of data to methods of statistical analysis. In particular, using a specialized software package (SPSS), principal component analysis, discriminant analysis, hierarchical cluster analysis, Ward’s method and k-means were applied. Applying principal component analysis to the initial data makes it possible to reduce the initial space of industrial production data substantially and effectively. For example, in analyzing the structure of industrial production, the reduction was from fifteen industries to three basic, well-interpretable factors: relatively extractive industries (with a low degree of processing), high-tech industries, and consumer goods (medium-technology sectors). At the same time, comparing the results of cluster analysis applied to the initial data with those applied to the principal components established that clustering industrial production data on the basis of the new factors significantly improves the clustering results. As a result of analyzing the parameters of
A Fault Diagnosis Approach for Gas Turbine Exhaust Gas Temperature Based on Fuzzy C-Means Clustering and Support Vector Machine

Directory of Open Access Journals (Sweden)

Zhi-tao Wang

2015-01-01

As an important gas path performance parameter of a gas turbine, the exhaust gas temperature (EGT) can represent the thermal health condition of the gas turbine. In order to monitor and diagnose the EGT effectively, a fusion approach based on the fuzzy C-means (FCM) clustering algorithm and a support vector machine (SVM) classification model is proposed in this paper. Considering the distribution characteristics of gas turbine EGT, the FCM clustering algorithm is used to perform clustering analysis and obtain the state patterns, on the basis of which the preclassification of EGT is completed. Then, an SVM multiclassification model is designed to carry out state pattern recognition and fault diagnosis. As an example, historical EGT monitoring data from an industrial gas turbine are analyzed and used to verify the performance of the fusion fault diagnosis approach presented in this paper. The results show that this approach can make full use of the unsupervised feature extraction ability of the FCM clustering algorithm and the sample classification generalization properties of the SVM multiclassification model, which offers an effective way to realize online condition recognition and fault diagnosis of gas turbine EGT.
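For reference, a minimal sketch of the standard fuzzy C-means iteration on one-dimensional data, alternating the weighted-mean centre update with the membership update u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)). The fuzzifier `m = 2`, the random initialization and the function name `fcm_1d` are illustrative assumptions, not settings taken from the paper, and the SVM stage is not shown.

```python
import random

def fcm_1d(xs, c, m=2.0, iters=100, tol=1e-9, seed=0):
    """Fuzzy C-means on 1-D data. Returns (centers, u) where u[k][i] is the
    membership of sample k in cluster i (each row sums to 1)."""
    rng = random.Random(seed)
    # random initial memberships, each row normalised to sum to 1
    u = []
    for _ in xs:
        row = [rng.random() + 1e-6 for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = [0.0] * c
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # centre update: mean of the data weighted by u^m
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(xs))]
            centers[i] = sum(wk * x for wk, x in zip(w, xs)) / sum(w)
        # membership update from distances to the new centres
        shift = 0.0
        for k, x in enumerate(xs):
            d = [abs(x - ci) for ci in centers]
            if min(d) == 0.0:
                # sample sits exactly on a centre: give it full membership there
                new = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(new)
                new = [v / s for v in new]
            else:
                new = [1.0 / sum((d[i] / d[j]) ** p for j in range(c))
                       for i in range(c)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new
        if shift < tol:
            break
    return centers, u
```

Soft memberships like these are what a preclassification step could hand to a downstream classifier; a hard state label is simply the cluster with the largest membership.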
Unsupervised/supervised learning concept for 24-hour load forecasting

Energy Technology Data Exchange (ETDEWEB)

Djukanovic, M [Electrical Engineering Inst. 'Nikola Tesla', Belgrade (Yugoslavia); Babic, B [Electrical Power Industry of Serbia, Belgrade (Yugoslavia); Sobajic, D J; Pao, Y-H [Case Western Reserve Univ., Cleveland, OH (United States). Dept. of Electrical Engineering and Computer Science]

1993-07-01

An application of artificial neural networks (ANN) to short-term load forecasting is described. An algorithm is proposed that uses an unsupervised/supervised learning concept and the historical relationship between load and temperature for a given season, day type and hour of the day to forecast hourly electric load with a lead time of 24 hours. An additional approach using a functional link net, temperature variables, average load and the last one-hour load of the previous day is introduced and compared with a one-hidden-layer ANN load forecast. Despite the limited weather variables available (maximum, minimum and average temperature for the day), quite acceptable results have been achieved. The 24-hour-ahead forecast errors (absolute average) ranged from 2.78% for Saturdays and 3.12% for working days to 3.54% for Sundays. (Author)
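The abstract reports errors as "absolute average" percentages per day type. Assuming this means the mean absolute percentage error (MAPE) over hourly forecasts, here is a small sketch of how such figures are computed; the function names and the `(day_type, actual, forecast)` record layout are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed error metric)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

def mape_by_day_type(records):
    """records: iterable of (day_type, actual, forecast) hourly triples.
    Returns {day_type: MAPE}."""
    groups = {}
    for day_type, a, f in records:
        actuals, forecasts = groups.setdefault(day_type, ([], []))
        actuals.append(a)
        forecasts.append(f)
    return {k: mape(a, f) for k, (a, f) in groups.items()}
```

Grouping by day type mirrors how the abstract reports separate figures for Saturdays, working days and Sundays.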
Assessment of anaesthetic depth by clustering analysis and autoregressive modelling of electroencephalograms

DEFF Research Database (Denmark)

Thomsen, C E; Rosenfalck, A; Nørregaard Christensen, K

1991-01-01

Brain activity (the electroencephalogram, EEG) was recorded from 30 healthy women scheduled for hysterectomy. The patients were anaesthetized with isoflurane, halothane or etomidate/fentanyl. A multiparametric method was used to extract amplitude and frequency information from the EEG. … The method applied autoregressive modelling to the signal, segmented into fixed 2 s intervals. The features from the EEG segments were used for learning and for classification. The learning process was unsupervised, and hierarchical clustering analysis was used to construct a learning set of EEG amplitude-frequency patterns for each of the three anaesthetic drugs. These EEG patterns were assigned a colour code corresponding to similar clinical states. A common learning set could be used for all patients anaesthetized with the same drug. The classification process could be performed on-line and the results were …
The cylindrical K-function and Poisson line cluster point processes

DEFF Research Database (Denmark)

Møller, Jesper; Safavimanesh, Farzaneh; Rasmussen, Jakob G.

… Poisson line cluster point processes, is also introduced. Parameter estimation based on moment methods or Bayesian inference for this model is discussed when the underlying Poisson line process and the cluster memberships are treated as hidden processes. To illustrate the methodologies, we analyze two...
Network clustering coefficient approach to DNA sequence analysis

Energy Technology Data Exchange (ETDEWEB)

Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

2006-05-15

In this work we propose an alternative DNA sequence analysis tool based on graph-theoretical concepts. The methodology investigates the path topology of an organism's genome through a triplet network. In this network, triplets in the DNA sequence are vertices, and two vertices are connected if they occur juxtaposed on the genome. We characterize the network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pair) periodicity of DNA sequences. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of organisms is constructed, and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool, since it gives information that is not trivially contained in either the 3-bp periodicity or the variable GC content.
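Here is a minimal sketch of the triplet network and its average clustering coefficient. The abstract does not say how triplets are read, so the code assumes consecutive non-overlapping triplets (`step=3`), treats the network as undirected without self-loops, and counts nodes of degree below 2 as having coefficient 0. All names are hypothetical.

```python
from itertools import combinations

def triplet_network(seq, step=3):
    """Undirected graph {triplet: set(neighbours)}: nodes are triplets and
    edges join triplets that occur next to each other. step=3 reads
    consecutive non-overlapping triplets (an assumption); self-loops are ignored."""
    seq = seq.upper()
    trips = [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]
    adj = {t: set() for t in trips}
    for a, b in zip(trips, trips[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient C_v = 2 e_v / (k_v (k_v - 1));
    nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # number of edges among v's neighbours
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

For example, `AAACCCGGGAAA` closes a triangle AAA-CCC-GGG and gives an average of 1.0, whereas `AAACCCGGG` forms a path and gives 0.0.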
Unsupervised feature learning for autonomous rock image classification

Science.gov (United States)

Shu, Lei; McIsaac, Kenneth; Osinski, Gordon R.; Francis, Raymond

2017-09-01

Autonomous rock image classification can enhance the capability of robots for geological detection and increase scientific returns, both in investigations on Earth and in planetary surface exploration on Mars. Since rock textural images are usually inhomogeneous and hand-crafted features are not always reliable, we propose an unsupervised feature learning method to learn the feature representation for rock images autonomously. In our tests, rock image classification using the learned features shows that they can outperform manually selected features. Self-taught learning is also proposed to learn the feature representation from a large database of unlabelled rock images of mixed classes. The learned features can then be used repeatedly for classification of any subclass. This takes advantage of the large dataset of unlabelled rock images and learns a general feature representation for many kinds of rocks. We show experimental results supporting the feasibility of self-taught learning on rock images.
Swarm controlled emergence for ant clustering

DEFF Research Database (Denmark)

Scheidler, Alexander; Merkle, Daniel; Middendorf, Martin

2013-01-01

… e.g. moving robots, and clustering algorithms. Design/methodology/approach: Different types of control agents for that ant clustering model are designed by introducing slight changes to the behavioural rules of the normal agents. The clustering behaviour of the resulting swarms is investigated by extensive simulation studies. Findings: It is shown that complex behavior can emerge in systems with two types of agents (normal agents and control agents). For a particular behavior of the control agents, an interesting swarm size dependent effect was found. The behaviour prevents clustering when the number … for future research to investigate the application of the method in other swarm systems. Swarm controlled emergence might be applied to control emergent effects in computing systems that consist of many autonomous components which make decentralized decisions based on local information. Practical …
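The specific ant clustering model is not named in this excerpt. As an assumption, the sketch uses the classic Deneubourg-style rules, in which an unladen ant picks up an item with probability (k1/(k1+f))^2 and a laden ant drops it with probability (f/(k2+f))^2, where f is the perceived local item density. The parameter defaults and the `control_pick_probability` rule are hypothetical illustrations of a "slightly changed" control-agent behaviour, not the authors' design.

```python
def pick_probability(f, k1=0.1):
    """Probability that an unladen (normal) ant picks up an item at local
    density f; k1 is an illustrative default."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden ant drops its item at local density f;
    k2 is an illustrative default."""
    return (f / (k2 + f)) ** 2

def control_pick_probability(f, k2=0.15):
    """Hypothetical control-agent rule: prefer taking items from dense
    neighbourhoods, which works against cluster formation."""
    return drop_probability(f, k2)

def local_density(grid, x, y, radius=1):
    """Fraction of occupied cells (grid values 0/1) in the square
    neighbourhood around (x, y), excluding the centre, on a torus."""
    h, w = len(grid), len(grid[0])
    cells = occupied = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            cells += 1
            occupied += grid[(y + dy) % h][(x + dx) % w]
    return occupied / cells
```

Normal agents pick up isolated items and drop them near others, so clusters grow; a control agent with the inverted pick rule erodes dense regions, which is one simple way slight rule changes can steer the emergent behaviour.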
Unsupervised exercise in survivors of human papillomavirus related head and neck cancer: how many can go it alone?

Science.gov (United States)

Bauml, Joshua; Kim, Jiyoung; Zhang, Xiaochen; Aggarwal, Charu; Cohen, Roger B; Schmitz, Kathryn

2017-08-01

Patients with human papillomavirus (HPV)-related head and neck cancer (HNC) have a better prognosis relative to other types of HNC, making survivorship an emerging and critical issue. Exercise is a core component of survivorship care, but little is known about how many survivors of HPV-related HNC can safely be advised to start exercising on their own, as opposed to needing further evaluation or supervised exercise. We used guidelines to identify health issues that would indicate the value of further evaluation before unsupervised exercise could be safely prescribed. We performed a retrospective chart review of 150 patients with HPV-related HNC to assess health issues 6 months after completing definitive therapy. Patients with at least one health issue were deemed appropriate to receive further evaluation prior to prescription for unsupervised exercise. We used logistic regression to identify clinical and demographic factors associated with the need for further evaluation, likely performed by outpatient rehabilitation clinicians. In this cohort of patients, 39.3% could safely be prescribed unsupervised exercise 6 months after completing definitive therapy. On multivariable regression, older age, BMI >30, and receipt of radiation were associated with an increased likelihood of requiring further evaluation or supervised exercise. Over half of patients with HPV-related HNC would benefit from referral to physical therapy or an exercise professional for further evaluation to determine the most appropriate level of exercise supervision, based upon current guidelines. Development of such referral systems will be essential to enhance survivorship outcomes for patients who have completed treatment.
Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

Energy Technology Data Exchange (ETDEWEB)

Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

2017-09-01

A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach for deriving insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies, like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, specifically, for large-scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they have scaling limitations and are mostly limited to traditional distributed-memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
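Here is a minimal single-process sketch of the data-parallel k-means step that distributed implementations typically use. Each worker assigns its own chunk of points and returns per-cluster partial sums and counts. A reduction, the role an MPI all-reduce would play, combines them into new centroids. The chunking, function names and toy data are assumptions; this does not reproduce the MSTC code or its CUDA/OpenACC kernels.

```python
def assign(point, centroids):
    """Index of the nearest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def local_stats(chunk, centroids):
    """Worker step: per-cluster coordinate sums and point counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for pt in chunk:
        i = assign(pt, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += pt[d]
    return sums, counts

def reduce_stats(stats):
    """Element-wise sum of all workers' partial results (an all-reduce)."""
    sums = [[sum(col) for col in zip(*rows)] for rows in zip(*(s for s, _ in stats))]
    counts = [sum(col) for col in zip(*(c for _, c in stats))]
    return sums, counts

def parallel_kmeans(chunks, centroids, iters=20):
    """Lloyd iterations where the assignment step runs chunk by chunk."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        # in a real code each chunk would live on a separate rank or GPU
        stats = [local_stats(chunk, centroids) for chunk in chunks]
        sums, counts = reduce_stats(stats)
        # keep the old centroid if a cluster received no points
        new = [[s / n for s in row] if n else old
               for row, n, old in zip(sums, counts, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids
```

Only the k-by-dimension sums and k counts cross worker boundaries, so communication per iteration does not grow with the number of data points; that property is what makes this decomposition attractive on large machines.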

The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

Directory of Open Access Journals (Sweden)

Ichiro IWASAKI

2010-06-01

Michael Porter’s concept of competitive advantage emphasizes the importance of regional cooperation among various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. Together with local institutions of education, research, financial and other services, and development agencies, they can form the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters, and the spillover effects that may stem from their participation. After discussing the theoretical background, the paper introduces a relatively new analytical method, “cluster mapping”, which can spot regional hot spots of specific economic activities with cluster-building potential. Experience with the method has been gathered in the US and in the European Union. After discussing the existing empirical evidence, the authors present their own cluster mapping results, obtained using a refined version of the original methodology.
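Cluster-mapping exercises typically rely on employment-based indicators, and a widely used one is the location quotient (LQ): a sector's share of regional employment divided by its share of national employment. The abstract does not describe the authors' refined methodology, so the sketch below only illustrates this general idea. The data layout and the threshold of 2.0 for flagging a "hot spot" are assumptions.

```python
# Hedged sketch: location quotient as a specialisation measure for cluster
# mapping. Threshold and data layout are illustrative assumptions.

def location_quotient(region_sector, region_total, nation_sector, nation_total):
    # Sector's share of regional employment relative to its national share.
    return (region_sector / region_total) / (nation_sector / nation_total)


def hot_spots(regions, threshold=2.0):
    # regions maps a region name to (sector employment, total employment).
    nation_sector = sum(s for s, _ in regions.values())
    nation_total = sum(t for _, t in regions.values())
    return sorted(name for name, (s, t) in regions.items()
                  if location_quotient(s, t, nation_sector, nation_total) >= threshold)


# Hypothetical employment figures for one sector in three regions.
regions = {"A": (300, 1000), "B": (50, 1000), "C": (50, 2000)}
print(hot_spots(regions))  # ['A']
```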
Cluster-based global firms' use of local capabilities

DEFF Research Database (Denmark)

Andersen, Poul Houman; Bøllingtoft, Anne

2011-01-01

Purpose – Despite growing interest in the role of clusters in the global competitiveness of firms, there has been little research into how globalization affects cluster-based firms’ (CBFs) use of local knowledge resources and the combination of local and global knowledge used. Using the cluster......’s knowledge base as a mediating variable, the purpose of this paper is to examine how globalization affected the studied firms’ use of local cluster-based knowledge, integration of local and global knowledge, and networking capabilities. Design/methodology/approach – Qualitative case studies of nine firms...... in three clusters strongly affected by increasing global division of labour. Findings – The paper suggests that globalization has affected how firms use local resources and combine local and global knowledge. Unexpectedly, clustered firms with explicit procedures and established global fora for exchanging...
Application of unsupervised learning methods in high energy physics

Energy Technology Data Exchange (ETDEWEB)

Koevesarki, Peter; Nuncio Quiroz, Adriana Elizabeth; Brock, Ian C. [Physikalisches Institut, Universitaet Bonn, Bonn (Germany)

2011-07-01

High energy physics is home to a variety of multivariate techniques, mainly owing to the fundamentally probabilistic behaviour of nature. These methods generally require training based on some theory in order to discriminate a known signal from a background. Nevertheless, new physics may manifest itself in ways no one has previously anticipated, and in such cases conventional methods offer little or no help. One possible way to discriminate between known processes (such as vector boson or top-quark production) or to search for new physics is to use unsupervised machine learning to extract the features of the data. A technique was developed, based on a combination of neural networks and the method of principal curves, to find a parametrisation of the non-linear correlations in the data. The feasibility of the method is demonstrated on ATLAS data.
Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement.

Science.gov (United States)

Moen, Hans; Peltonen, Laura-Maria; Koivumäki, Mikko; Suhonen, Henry; Salakoski, Tapio; Ginter, Filip; Salanterä, Sanna

2018-01-01

We report on the development and evaluation of a prototype tool aimed at helping laymen/patients understand the content of clinical narratives. The tool relies largely on unsupervised machine learning applied to two large corpora of unlabeled text: a clinical corpus and a general-domain corpus. A joint semantic word-space model is created for extracting easier-to-understand alternatives for words that laymen are likely to find difficult. Two domain experts evaluate the tool, and inter-rater agreement is calculated. When the tool suggests ten alternatives for each difficult word, it suggests acceptable lay words for 55.51% of them. This and future manual evaluation will serve to further improve performance, for which supervised machine learning will also be used.
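Extracting alternatives from a word space usually comes down to a nearest-neighbour query. The sketch below assumes word vectors already exist in one shared space (building the joint model from the two corpora is not shown) and ranks a given lay vocabulary by cosine similarity to the difficult word. The toy 3-dimensional vectors and the function names are hypothetical; `k=10` mirrors the "ten alternatives" setting in the abstract.

```python
import math


def cosine(u, v):
    # Cosine similarity between two dense vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def suggest_alternatives(word, vectors, lay_vocab, k=10):
    # Rank lay-vocabulary words by similarity to `word` in the shared space.
    target = vectors[word]
    candidates = [w for w in lay_vocab if w != word and w in vectors]
    candidates.sort(key=lambda w: cosine(target, vectors[w]), reverse=True)
    return candidates[:k]


# Toy vectors standing in for a trained joint word space.
vectors = {
    "hypertension": [0.9, 0.1, 0.0],
    "pressure": [0.8, 0.2, 0.1],
    "fever": [0.0, 0.9, 0.3],
    "cough": [0.1, 0.2, 0.9],
}
print(suggest_alternatives("hypertension", vectors, ["pressure", "fever", "cough"], k=2))
# ['pressure', 'cough']
```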
Solving Large Clustering Problems with Meta-Heuristic Search

DEFF Research Database (Denmark)

Turkensteen, Marcel; Andersen, Kim Allan; Bang-Jensen, Jørgen

In Clustering Problems, groups of similar subjects are to be retrieved from data sets. In this paper, Clustering Problems with the frequently used Minimum Sum-of-Squares Criterion are solved using meta-heuristic search. Tabu search has proved to be a successful methodology for solving optimization...... problems, but applications to large clustering problems are rare. The simulated annealing heuristic has mainly been applied to relatively small instances. In this paper, we implement tabu search and simulated annealing approaches and compare them to the commonly used k-means approach. We find that the meta-heuristic...
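The minimum sum-of-squares criterion and a simulated-annealing search over cluster assignments can be sketched as follows. This is a generic illustration, not the authors' implementation: the move (relabel one random point), the Metropolis acceptance rule, the geometric cooling schedule and all parameter values are assumptions, and the full criterion is recomputed after every move for clarity rather than speed.

```python
import math
import random


def sse(points, labels, k):
    # Minimum sum-of-squares criterion: squared distances to cluster means.
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        mean = [sum(c) / len(members) for c in zip(*members)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, mean)) for p in members)
    return total


def anneal(points, k, t0=1.0, cooling=0.95, steps=2000, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = sse(points, labels, k)
    best, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(k)  # move: reassign one point
        new_cost = sse(points, labels, k)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = labels[:], cost
        else:
            labels[i] = old  # reject: undo the move
        t = max(t * cooling, 1e-9)
    return best, best_cost


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(anneal(pts, 2))
```

A tabu-search variant would keep the same `sse` objective but replace the random acceptance with a best-non-tabu move and a short-term memory of recently moved points.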
Unsupervised Feature Selection for Interval Ordered Information Systems

Institute of Scientific and Technical Information of China (English)

闫岳君; 代建华

2017-01-01

A number of unsupervised feature selection methods have been proposed for single-valued information systems, but little research has focused on unsupervised feature selection for interval-valued information systems. In this paper, a fuzzy dominance relation is proposed for interval ordered information systems. Based on this relation, fuzzy rank information entropy and fuzzy rank mutual information are extended to evaluate the importance of features. An unsupervised feature selection method is then designed based on an unsupervised maximum information and minimum redundancy (UmImR) criterion, which takes into account both the amount of information and the redundancy. Experimental results demonstrate the effectiveness of the proposed method.
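The idea of trading information against redundancy can be illustrated with a greedy selector. Important substitution: the sketch uses ordinary Shannon entropy and mutual information on crisp discrete values, not the paper's fuzzy rank entropy over interval-valued data, and the scoring form (entropy minus mean mutual information with already-selected features) is an assumption in the spirit of the UmImR criterion rather than its exact definition.

```python
import math
from collections import Counter


def entropy(values):
    # Shannon entropy (bits) of a discrete feature column.
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_info(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def score(name, features, selected):
    # Information of the candidate minus its mean redundancy with chosen features.
    info = entropy(features[name])
    if not selected:
        return info
    redundancy = sum(mutual_info(features[name], features[s]) for s in selected)
    return info - redundancy / len(selected)


def greedy_select(features, m):
    selected, remaining = [], list(features)
    while remaining and len(selected) < m:
        best = max(remaining, key=lambda f: score(f, features, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


features = {
    "a": [0, 1, 0, 1, 0, 1, 0, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],  # exact copy of "a": fully redundant
    "c": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of "a"
    "d": [0, 0, 0, 0, 0, 0, 0, 1],  # low entropy
}
print(greedy_select(features, 2))  # ['a', 'c']
```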
Simultaneous field-aligned currents at Swarm and Cluster satellites

DEFF Research Database (Denmark)

Dunlop, M. W.; Yang, J. Y.; Yang, Y. Y.

2015-01-01

altitude) orbits using a particular Swarm and Cluster conjunction. The Cluster signatures are interpreted and ordered through joint mapping of the ground/magnetospheric footprints and estimation of the auroral zone boundaries (taken as indication of the boundaries of Region 1 and Region 2 currents). We...... find clear evidence of both small-scale and large-scale FACs and clear matching of the behavior and structure of the large-scale currents at both Cluster and Swarm. The methodology is made possible through the joint operations of Cluster and Swarm, which contain, in the first several months of Swarm...
Using Cluster Analysis for Data Mining in Educational Technology Research

Science.gov (United States)

Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

2012-01-01

Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network

Directory of Open Access Journals (Sweden)

Jun He

2017-07-01

Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and a genetic algorithm is used to optimize the structural parameters of the network. Compared to conventional AI methods, the proposed method can adaptively exploit robust fault-related features through unsupervised feature learning, and thus requires less prior knowledge of signal processing techniques and diagnostic expertise. Moreover, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and a gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., a back propagation neural network (BPNN) and a support vector machine (SVM). With the proposed method, the fault classification accuracies are 99.26% for rolling bearings and 100% for the gearbox, much higher than those of the other two methods.
Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network.

Science.gov (United States)

He, Jun; Yang, Shixi; Gan, Chunbiao

2017-07-04

Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and a genetic algorithm is used to optimize the structural parameters of the network. Compared to conventional AI methods, the proposed method can adaptively exploit robust fault-related features through unsupervised feature learning, and thus requires less prior knowledge of signal processing techniques and diagnostic expertise. Moreover, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and a gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., a back propagation neural network (BPNN) and a support vector machine (SVM). With the proposed method, the fault classification accuracies are 99.26% for rolling bearings and 100% for the gearbox, much higher than those of the other two methods.
Unsupervised Retinal Vessel Segmentation Using Combined Filters.

Directory of Open Access Journals (Sweden)

Wendeson S Oliveira

Image segmentation of retinal blood vessels can help to predict and diagnose cardiovascular-related diseases, such as hypertension and diabetes, which are known to affect the appearance of retinal blood vessels. This work proposes an unsupervised method for segmenting retinal vessel images that enhances the images using a combination of a matched filter, Frangi's filter and a Gabor wavelet filter. Combining these three filters to improve segmentation is the main motivation of this work. We investigate two approaches to filter combination: weighted mean and median ranking. Segmentation methods are tested after vessel enhancement. Images enhanced with median ranking are segmented using a simple threshold criterion. Two segmentation procedures are applied to retinal images enhanced with the weighted mean approach: the first is based on deformable models and the second uses fuzzy C-means. The procedure is evaluated on two public image databases, DRIVE and STARE. The experimental results demonstrate that the proposed methods perform well for vessel segmentation in comparison with state-of-the-art methods.
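Fuzzy C-means, named above as one of the segmentation procedures, can be sketched in one dimension over enhanced pixel intensities. Assumptions: vessels are bright after enhancement, two clusters are used, the fuzzifier is m = 2, and a pixel is labelled "vessel" when its membership in the brighter cluster exceeds 0.5. The paper's exact settings and post-processing are not given in the abstract.

```python
def fuzzy_c_means(values, c=2, m=2.0, iters=100, tol=1e-6):
    """1-D fuzzy C-means on enhanced pixel intensities.

    Returns the cluster centres and the membership matrix from the last pass.
    """
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the intensity range.
    centres = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    u = []
    for _ in range(iters):
        u = []
        for x in values:
            d = [abs(x - v) for v in centres]
            if 0.0 in d:
                # Pixel sits exactly on a centre: give it crisp membership.
                hits = d.count(0.0)
                row = [1.0 / hits if dj == 0.0 else 0.0 for dj in d]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                       for j in range(c)]
            u.append(row)
        # Centre update: membership-weighted mean with weights u^m.
        new = [sum(u[i][j] ** m * values[i] for i in range(len(values))) /
               sum(u[i][j] ** m for i in range(len(values)))
               for j in range(c)]
        shift = max(abs(a - b) for a, b in zip(new, centres))
        centres = new
        if shift < tol:
            break
    return centres, u


# Hypothetical vesselness responses: vessels are assumed to respond strongly.
pixels = [0.10, 0.12, 0.08, 0.90, 0.95, 0.11, 0.88]
centres, u = fuzzy_c_means(pixels)
vessel = max(range(len(centres)), key=lambda j: centres[j])
mask = [row[vessel] > 0.5 for row in u]
print(mask)  # [False, False, False, True, True, False, True]
```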
Vertebra identification using template matching model and K-means clustering.

Science.gov (United States)

Larhmam, Mohamed Amine; Benjelloun, Mohammed; Mahmoudi, Saïd

2014-03-01

Accurate vertebra detection and segmentation are essential steps in automating the diagnosis of spinal disorders. This study is dedicated to vertebra alignment measurement, the first step in a computer-aided diagnosis tool for cervical spine trauma. Automated determination of vertebral segment alignment is a challenging task due to low-contrast imaging and noise. A software tool for segmenting vertebrae and detecting subluxations has clinical significance. A robust method was developed and tested for cervical vertebra identification and segmentation that extracts the parameters used for vertebra alignment measurement. Our contribution is a novel combination of a template matching method and an unsupervised clustering algorithm. In this method, we build a geometric mean model of the vertebra. To achieve vertebra detection, the region of interest is first selected manually on the input image. Preprocessing is then performed to enhance image contrast and detect edges. Candidate vertebrae are then localized using a modified generalized Hough transform (GHT). Next, an adapted cost function is used to compute local voted centers and filter boundary data. Thereafter, a K-means clustering algorithm is applied to obtain the cluster distribution corresponding to the targeted vertebrae. These clusters are combined with the vote parameters to detect vertebra centers. Rigid segmentation is then carried out using the GHT parameters. Finally, cervical spine curves are extracted to measure vertebra alignment. The proposed approach was successfully applied to a set of 66 high-resolution X-ray images. Robust detection was achieved for 97.5% of the 330 tested cervical vertebrae. The automated vertebral identification method was demonstrated to be robust to noise and occlusion. This work presents a first step toward an automated computer-aided diagnosis system for cervical spine trauma detection.
Unsupervised Machine Learning for Developing Personalised Behaviour Models Using Activity Data.

Science.gov (United States)

Fiorini, Laura; Cavallo, Filippo; Dario, Paolo; Eavis, Alexandra; Caleb-Solly, Praminda

2017-05-04

The goal of this study is to address two major issues that undermine the large-scale deployment of smart home sensing solutions in people's homes: the costs associated with installing and maintaining a large number of sensors, and the practicalities of annotating numerous sensor data streams for activity classification. Our aim was therefore to propose a method for describing individual users' behavioural patterns, starting from the analysis of unannotated data from a minimal number of sensors and a "blind" approach to activity recognition. The methodology included processing and analysing sensor data from 17 older adults living in community-based housing to extract activity information at different times of the day. The findings illustrate that 55 days of sensor data from a configuration of three sensors, together with appropriate features including a "busyness" measure, are adequate to build robust models that can cluster individuals by their behaviour patterns with a high degree of accuracy (>85%). The obtained clusters can be used to describe individual behaviour at different times of the day. This approach suggests a scalable way to optimise the personalisation of care using low-cost sensing and analysis, and could be used to track a person's needs over time and fine-tune their care plan on an ongoing basis in a cost-effective manner.
Unsupervised Machine Learning for Developing Personalised Behaviour Models Using Activity Data

Directory of Open Access Journals (Sweden)

Laura Fiorini

2017-05-01

The goal of this study is to address two major issues that undermine the large-scale deployment of smart home sensing solutions in people’s homes: the costs associated with installing and maintaining a large number of sensors, and the practicalities of annotating numerous sensor data streams for activity classification. Our aim was therefore to propose a method for describing individual users’ behavioural patterns, starting from the analysis of unannotated data from a minimal number of sensors and a “blind” approach to activity recognition. The methodology included processing and analysing sensor data from 17 older adults living in community-based housing to extract activity information at different times of the day. The findings illustrate that 55 days of sensor data from a configuration of three sensors, together with appropriate features including a “busyness” measure, are adequate to build robust models that can cluster individuals by their behaviour patterns with a high degree of accuracy (>85%). The obtained clusters can be used to describe individual behaviour at different times of the day. This approach suggests a scalable way to optimise the personalisation of care using low-cost sensing and analysis, and could be used to track a person’s needs over time and fine-tune their care plan on an ongoing basis in a cost-effective manner.
Clustering by Partitioning around Medoids using Distance-Based ...

African Journals Online (AJOL)

OLUWASOGO

outperforms both the Euclidean and Manhattan distance metrics in certain situations. KEYWORDS: PAM ... version of a dataset, compare the quality of clusters obtained from the Euclidean .... B. Theoretical Framework and Methodology.
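The snippet compares Partitioning Around Medoids (PAM) under different distance metrics, but the proposed distance itself is truncated, so the sketch below keeps the metric pluggable and shows only Euclidean and Manhattan. It is a simplified PAM: the classic BUILD initialisation is replaced by "first k points", and any medoid swap that lowers the total cost is accepted immediately instead of picking the best swap per pass. Function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(points, medoids, dist):
    # Sum of distances from every point to its nearest medoid.
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Simplified PAM: naive start, then accept any medoid swap that lowers cost."""
    medoids = list(range(k))  # BUILD phase replaced by "first k points"
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                trial_cost = total_cost(points, trial, dist)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [min(medoids, key=lambda m: dist(p, points[m])) for p in points]
    return medoids, labels, cost


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for metric in (euclidean, manhattan):
    print(metric.__name__, pam(pts, 2, metric))
```

Because medoids are actual data points, any dissimilarity function can be swapped in without needing a mean, which is what makes PAM a natural testbed for comparing metrics.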
Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

Directory of Open Access Journals (Sweden)

Suzuki Motoyuki

2009-01-01

We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that misrecognized words and correct words do not fall into the same query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate the adverse effect of misrecognized keywords by decreasing the number of Web documents downloaded from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of adaptation iterations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimal number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).
Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

Directory of Open Access Journals (Sweden)

Akinori Ito

2009-01-01

We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that misrecognized words and correct words do not fall into the same query. The second idea is that the number of Web documents downloaded for each query is determined according to the “query relevance.” Combining these two ideas, we can alleviate the adverse effect of misrecognized keywords by decreasing the number of Web documents downloaded from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of adaptation iterations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimal number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).
CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

Energy Technology Data Exchange (ETDEWEB)

Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

2017-07-14

We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances according to her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrently with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the groups of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human-machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study in which we applied CHISSL to organize a collection of handwritten digits.
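The interaction loop described above (the user arranges groups, a model is trained on those groups, unlabeled instances are placed by prediction, and a correction triggers retraining) can be sketched with any classifier. The abstract does not say which classifier CHISSL uses; a nearest-centroid model is assumed here purely for illustration, and all function names are hypothetical.

```python
def train_centroids(groups):
    # groups: {group name: [feature vector, ...]} as arranged by the user.
    return {g: [sum(col) / len(vs) for col in zip(*vs)]
            for g, vs in groups.items() if vs}


def predict(x, centroids):
    # Place an unlabeled instance in the group with the nearest centroid.
    return min(centroids,
               key=lambda g: sum((a - b) ** 2 for a, b in zip(x, centroids[g])))


def correct(groups, instance, src, dst):
    # A user "correction": move a misplaced instance, then retrain.
    groups[src].remove(instance)
    groups[dst].append(instance)
    return train_centroids(groups)


groups = {"ones": [[0.0, 0.0], [0.0, 1.0]], "sevens": [[5.0, 5.0], [6.0, 5.0]]}
model = train_centroids(groups)
print(predict([0.2, 0.3], model), predict([5.5, 4.8], model))  # ones sevens
```

A "confirmation" needs no code path of its own in this sketch: it simply leaves the instance in its predicted group for the next retraining.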
Unsupervised Feature Learning for Heart Sounds Classification Using Autoencoder

Science.gov (United States)

Hu, Wei; Lv, Jiancheng; Liu, Dongbo; Chen, Yao

2018-04-01

Cardiovascular disease seriously threatens the health of many people. It is usually diagnosed by cardiac auscultation, which is a fast and efficient method of cardiovascular disease diagnosis. In recent years, deep learning approaches using unsupervised learning have made significant breakthroughs in many fields. However, to our knowledge, deep learning has not yet been used for heart sound classification. In this paper, we first use the average Shannon energy to extract the envelope of the heart sounds, then locate the highest point of S1 to extract the cardiac cycle. We convert the time-domain signals of the cardiac cycle into spectrograms and apply principal component analysis whitening to reduce the dimensionality of the spectrograms. Finally, we apply a two-layer autoencoder to extract features from the spectrograms. The experimental results demonstrate that the features from the autoencoder are suitable for heart sound classification.
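The abstract names the average Shannon energy envelope without giving its formula. A common definition is sketched below: the signal is normalised to [-1, 1], each sample contributes -x² log x², the contributions are averaged over short overlapping frames, and the frame energies are standardised. This definition emphasises medium-intensity components over both very weak and very strong ones. The frame and hop lengths (in samples) are placeholders, not values from the paper, and the synthetic signal is hypothetical.

```python
import math


def average_shannon_energy(signal, frame=40, hop=20):
    """Normalised average Shannon energy envelope, one value per frame."""
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    x = [s / peak for s in signal]  # scale to [-1, 1]
    energies = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = x[start:start + frame]
        # -x^2 log x^2 tends to 0 as x -> 0, so zero samples are skipped.
        e = -sum(v * v * math.log(v * v) for v in seg if v != 0.0) / frame
        energies.append(e)
    if not energies:
        return []
    mu = sum(energies) / len(energies)
    sd = math.sqrt(sum((e - mu) ** 2 for e in energies) / len(energies)) or 1.0
    return [(e - mu) / sd for e in energies]


# Synthetic signal: low-level noise floor with one burst standing in for S1.
sig = [0.01] * 100 + [0.6, -0.3] * 20 + [0.01] * 100
env = average_shannon_energy(sig)
print(max(range(len(env)), key=env.__getitem__))  # 5 (frame starting at sample 100)
```

The envelope peak locates the burst; in the pipeline described above, a peak of this kind would be taken as S1 when segmenting cardiac cycles.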
Object-Based Change Detection in Urban Areas: The Effects of Segmentation Strategy, Scale, and Feature Space on Unsupervised Methods

Directory of Open Access Journals (Sweden)

Lei Ma

2016-09-01

Object-based change detection (OBCD) has recently been receiving increasing attention as a result of rapid improvements in the resolution of remote sensing data. However, some OBCD issues relating to the segmentation of high-resolution images remain to be explored. For example, segmentation units derived using different segmentation strategies, segmentation scales, feature spaces, and change detection methods have rarely been assessed. In this study, we tested four common unsupervised change detection methods using different segmentation strategies and a series of segmentation scale parameters on two WorldView-2 images of urban areas. We also evaluated the effect of adding textural and Normalized Difference Vegetation Index (NDVI) information instead of using only spectral information. Our results indicated that change detection methods performed better at a medium scale than at a fine scale close to the pixel size. Multivariate Alteration Detection (MAD) always outperformed the other methods tested, at the same confidence level. The overall accuracy appeared to benefit from a two-date segmentation strategy rather than single-date segmentation. Adding textural and NDVI information appeared to reduce detection accuracy, but the magnitude of this reduction was not consistent across the different unsupervised methods and segmentation strategies. We conclude that a two-date segmentation strategy is useful for change detection in high-resolution imagery, but that the optimization of thresholds is critical for unsupervised change detection methods. Advanced methods need to be explored that can take advantage of additional textural or other parameters.

Perceptual approach for unsupervised digital color restoration of cinematographic archives

Science.gov (United States)

Chambah, Majed; Rizzi, Alessandro; Gatta, Carlo; Besserer, Bernard; Marini, Daniele

2003-01-01

Cinematographic archives represent an important part of our collective memory. In this paper, we present some advances in automating the restoration of color fading, especially with regard to automatic color correction. The proposed color correction method is based on the ACE model, an unsupervised color equalization algorithm based on a perceptual approach and inspired by some adaptation mechanisms of the human visual system, in particular lightness constancy and color constancy. A perceptual approach has some advantages, mainly its robustness and its local filtering properties, which lead to more effective results. The resulting technique is not just an application of ACE to movie images, but an enhancement of ACE principles to meet the requirements of digital film restoration. The preliminary results presented are satisfying and promising.
Use of advanced cluster analysis to characterize fish consumption patterns and methylmercury dietary exposures from fish and other sea foods among pregnant women

DEFF Research Database (Denmark)

Pouzaud, Francois; Ibbou, Assia; Blanchemanche, Sandrine

2010-01-01

Hg) exposure in a sample of 161 French pregnant women consuming sea food, including fish, molluscs and crustaceans, and to explore the use of unsupervised statistical learning as an advanced type of cluster analysis to identify patterns of fish consumption that could predict exposure to MeHg and the coverage...... of the Recommended Daily Allowance for n-3 polyunsaturated fatty acid (PUFA). The proportion of about 5% of pregnant women exposed at levels higher than the tolerable weekly intake for MeHg is similar to that observed among women of childbearing age in earlier French studies. At the same time, only about 50...
Sparsity enabled cluster reduced-order models for control

Science.gov (United States)

Kaiser, Eurika; Morzyński, Marek; Daviller, Guillaume; Kutz, J. Nathan; Brunton, Bingni W.; Brunton, Steven L.

2018-01-01

Characterizing and controlling nonlinear, multi-scale phenomena are central goals in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories. A key advantage of CROM is that it embeds nonlinear dynamics in a linear framework, which enables the application of standard linear techniques to the nonlinear system. CROM is typically computed on high-dimensional data; however, access to and computations on this full-state data limit the online implementation of CROM for prediction and control. Here, we address this key challenge by identifying a small subset of critical measurements to learn an efficient CROM, referred to as sparsity-enabled CROM. In particular, we leverage compressive measurements to faithfully embed the cluster geometry and preserve the probabilistic dynamics. Further, we show how to identify fewer optimized sensor locations tailored to a specific problem that outperform random measurements. Both of these sparsity-enabled sensing strategies significantly reduce the burden of data acquisition and processing for low-latency in-time estimation and control. We illustrate this unsupervised learning approach on three different high-dimensional nonlinear dynamical systems from fluids with increasing complexity, with one application in flow control. Sparsity-enabled CROM is a critical facilitator for real-time implementation on high-dimensional systems where full-state information may be inaccessible.
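The central object in CROM, a Markov transition matrix between clusters that approximates the Perron-Frobenius operator, can be estimated by counting transitions between the cluster labels of consecutive snapshots. The sketch assumes the snapshots have already been clustered (CROM is typically built on k-means clusters; here the labels are simply given) and does not cover the sparsity-enabled measurement selection. It uses a row-stochastic convention; the CROM literature often writes the column-stochastic transpose, with p_next = P p.

```python
def transition_matrix(labels, k):
    """Row-stochastic estimate: P[i][j] ~ Prob(next cluster j | current cluster i)."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] if sum(row) else [0.0] * k for row in counts]


def propagate(p, P):
    # One step of the cluster-level Markov model: p_next[j] = sum_i p[i] * P[i][j].
    k = len(p)
    return [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]


# Cluster labels of consecutive snapshots of a periodic flow (hypothetical).
labels = [0, 1, 2, 0, 1, 2, 0]
P = transition_matrix(labels, 3)
print(propagate([1.0, 0.0, 0.0], P))  # [0.0, 1.0, 0.0]
```

Because `propagate` is linear in `p`, the nonlinear dynamics are represented by a linear operator acting on cluster probabilities, which is the property the abstract highlights.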
CLUSTER DEVELOPMENT OF ECONOMY OF REGION: THEORETICAL OPPORTUNITIES AND PRACTICAL EXPERIENCE

Directory of Open Access Journals (Sweden)

O.A. Romanova

2007-12-01

This article considers theoretical approaches to the formation of industrial clusters in the regions of the Russian Federation, on the basis of which a methodological scheme for a cluster-creation project is proposed. Using the example of the high-tech cluster “Titanium Valley”, created in Sverdlovsk Oblast, the basic elements of its formation are revealed: a justification for using cluster forms of business organization, an assessment of the preconditions for its creation, a description of the cluster's goals, tasks and structure, the management mechanism and stages of implementation of the cluster-creation project, and measures of state support.
The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

NARCIS (Netherlands)

Zhou, Q.; Leng, F.; Leydesdorff, L.

2015-01-01

Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare
Unsupervised learning of binary vectors: A Gaussian scenario

International Nuclear Information System (INIS)

Copelli, Mauro; Van den Broeck, Christian

2000-01-01

We study a model of unsupervised learning where the real-valued data vectors are isotropically distributed, except for a single symmetry-breaking binary direction B ∈ {-1,+1}^N, onto which the projections have a Gaussian distribution. We show that a candidate vector J undergoing Gibbs learning in this discrete space approaches the perfect match J = B exponentially. In addition to the second-order "retarded learning" phase transition for unbiased distributions, we show that first-order transitions can also occur. Extending the known result that the center of mass of the Gibbs ensemble has Bayes-optimal performance, we show that taking the sign of the components of this vector (clipping) leads to the vector with optimal performance in the binary space. These upper bounds are shown generally not to be saturated with the technique of transforming the components of a special continuous vector, except in asymptotic limits and in a special linear case. Simulations are presented which are in excellent agreement with the theoretical results. (c) 2000 The American Physical Society
An unsupervised method for summarizing egocentric sport videos

Science.gov (United States)

Habibi Aghdam, Hamed; Jahani Heravi, Elnaz; Puig, Domenec

2015-12-01

People are increasingly interested in recording their sport activities with head-worn or hand-held cameras. Such videos, called egocentric sport videos, have motion and appearance patterns that differ from those of life-logging videos. A life-logging video can be described in terms of well-defined human-object interactions, but it is not trivial to describe egocentric sport videos using well-defined activities. For this reason, summarizing egocentric sport videos based on human-object interaction might fail to produce meaningful results. In this paper, we propose an unsupervised method for summarizing egocentric videos by identifying the key-frames of the video. Our method uses both appearance and motion information, and it automatically finds the number of key-frames. Our blind user study on a new dataset collected from YouTube shows that in 93.5% of cases, users chose the proposed method as their first choice of video summary. In addition, our method was among the users' top 2 choices in 99% of studies.
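The abstract does not say how the number of key-frames is determined. The sketch below shows one simple way a method can choose that number itself, purely as a hypothetical illustration and not the authors' algorithm. A frame becomes a new key-frame whenever its feature vector is farther than a threshold from the last key-frame. The features could be, for example, concatenated appearance and motion descriptors, assumed to be precomputed.

```python
import numpy as np

def select_keyframes(features, threshold):
    """Greedy key-frame selection (illustrative, not the paper's method).

    features: array of shape (num_frames, dim), one descriptor per frame.
    threshold: minimum Euclidean distance from the previous key-frame.
    Returns the indices of the selected key-frames.
    """
    keyframes = [0]                       # always keep the first frame
    for i in range(1, len(features)):
        if np.linalg.norm(features[i] - features[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes

# Hypothetical descriptors: three "shots" of 20 frames with nearly constant features.
rng = np.random.default_rng(1)
shots = [np.full(4, v) for v in (0.0, 5.0, 10.0)]
feats = np.vstack([s + 0.01 * rng.standard_normal((20, 4)) for s in shots])
print(select_keyframes(feats, threshold=1.0))  # [0, 20, 40]
```

Here the data and the threshold determine the number of key-frames, rather than a count fixed in advance.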
Unsupervised Event Characterization and Detection in Multichannel Signals: An EEG application

Directory of Open Access Journals (Sweden)

Angel Mur

2016-04-01

Full Text Available In this paper, we propose a new unsupervised method to automatically characterize and detect events in multichannel signals. The method is used to identify artifacts in electroencephalogram (EEG) recordings of brain activity. The proposed algorithm has been evaluated and compared with a supervised method. To this end, an example of the algorithm's performance in detecting artifacts is shown. The results show that although both methods obtain similar classification results, the proposed method can detect events without training data and can also be applied to signals whose events are not known a priori. Furthermore, the proposed method provides an optimal window in which events are best detected and characterized. The event detection can be applied in real time.
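As a rough illustration of detecting events without training data, the sketch below takes a simpler route than the authors' algorithm, whose details the abstract does not give. It splits a multichannel signal into fixed windows and computes the mean energy across channels in each window. It then separates the windows into two groups with one-dimensional 2-means clustering. Windows in the higher-energy group are flagged as candidate events. The window length and the synthetic signal are assumptions.

```python
import numpy as np

def detect_events(signal, win):
    """Flag high-energy windows in a (channels, samples) array via 1-D 2-means."""
    n_win = signal.shape[1] // win
    segs = signal[:, : n_win * win].reshape(signal.shape[0], n_win, win)
    energy = (segs ** 2).mean(axis=(0, 2))        # one value per window
    if energy.max() == energy.min():
        return np.array([], dtype=int)            # nothing to separate

    # 1-D k-means with k = 2, initialised at the extreme energies.
    centers = np.array([energy.min(), energy.max()])
    for _ in range(100):
        labels = np.abs(energy[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([energy[labels == k].mean() for k in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return np.flatnonzero(labels == centers.argmax())

# Synthetic 4-channel recording with a large artifact in window 3.
rng = np.random.default_rng(2)
x = rng.standard_normal((4, 1000))
x[:, 300:400] += 10.0 * rng.standard_normal((4, 100))
print(detect_events(x, win=100))
```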
JUSTIFICATION OF THE PRIORITIES OF THE CLUSTERING OF AGRO-INDUSTRIES OF THE VORONEZH REGION

Directory of Open Access Journals (Sweden)

Y. A. Salikov

2014-01-01

Full Text Available Large-scale work on developing and implementing cluster policy, in line with federal and regional socio-economic development plans to 2020, is currently under way in many regions of the Russian Federation. An analysis of how the Voronezh region's cluster policy concept, adopted in 2012, has been implemented showed that the measures carried out to date are mainly informational and infrastructural in nature. However, only two-thirds of the clusters identified as promising had actually been formed by 2014. Those not yet created include a cluster for processing agricultural products with a high prospect rating. Given that forming an agro-industrial cluster meets the relevant requirements and conditions, this study develops a new methodological approach and uses it to justify prioritizing the formation of a meat cluster in the agro-industrial complex of the Voronezh region. The approach is based on an algorithm for identifying areas of clustering, developed by the authors using Forsythe statistics. This algorithm is an efficient tool for setting priorities to achieve qualitatively new results in economics, science and technology. It combines the following methodological stages in sequence: defining the object of research; identifying reliable information sources on the basis of expert assessments; identifying areas of industry clustering (including analysis of the legal framework, study of statistical data on the level of localization of industries, and analysis of cluster-policy practice in analogous regions); identifying areas for additional clustering of industries and mapping them; and determining the priority directions for additional clustering of industries by ranking. The results of the study, carried out in accordance with this
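The abstract mentions statistical data on the localization of industries followed by ranking, but it does not give the formula behind its "Forsythe statistics". As a stand-in, the sketch below ranks industries by the widely used location quotient LQ = (e_ir / e_r) / (e_i / e). Here e_ir is employment in industry i in the region, e_r is total regional employment, e_i is national employment in industry i, and e is total national employment. All industry names and figures are hypothetical.

```python
def location_quotients(regional, national):
    """Location quotient per industry; values above 1 indicate localization."""
    reg_total = sum(regional.values())
    nat_total = sum(national.values())
    return {
        ind: (regional[ind] / reg_total) / (national[ind] / nat_total)
        for ind in regional
    }

# Hypothetical employment figures (thousands of workers), for illustration only.
regional = {"meat processing": 12, "dairy": 6, "machinery": 2}
national = {"meat processing": 300, "dairy": 300, "machinery": 400}

lq = location_quotients(regional, national)
ranking = sorted(lq, key=lq.get, reverse=True)   # most localized first
print(ranking)
```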
Experience with a clustered parallel reduction machine

NARCIS (Netherlands)

Beemster, M.; Hartel, Pieter H.; Hertzberger, L.O.; Hofman, R.F.H.; Langendoen, K.G.; Li, L.L.; Milikowski, R.; Vree, W.G.; Barendregt, H.P.; Mulder, J.C.

A clustered architecture has been designed to exploit divide-and-conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms resulting in a
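Divide-and-conquer parallelism driven by explicit annotations can be loosely mimicked in Python. In the hypothetical sketch below, submitting one half of each split to a thread pool plays the role of a "run this in parallel" annotation. A depth limit stops tasks from being created for tiny subproblems. This is only an analogy, not the machine's annotation language or runtime. With CPython threads, it shows the program structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def dc_sum(xs, pool, depth):
    """Divide-and-conquer sum; the left half is spawned while depth > 0."""
    if depth == 0 or len(xs) < 2:
        return sum(xs)                                       # sequential base case
    mid = len(xs) // 2
    left = pool.submit(dc_sum, xs[:mid], pool, depth - 1)    # "parallel" annotation
    right = dc_sum(xs[mid:], pool, depth - 1)                # evaluated locally
    return left.result() + right

depth = 3
# At most 2**depth - 1 tasks are ever submitted, so with this many workers
# every task gets its own thread and none waits forever for a free one.
with ThreadPoolExecutor(max_workers=2 ** depth) as pool:
    total = dc_sum(list(range(1000)), pool, depth)
print(total)
```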
Cluster models, factors and characteristics for the competitive advantage of Lithuanian Maritime sector

OpenAIRE

Viederytė, Rasa; Didžiokas, Rimantas

2014-01-01

The paper analyses several competitiveness-based cluster models: the Nine-factor model, the Double Diamond model, the Funnel model of cluster determinants, and the Destination Competitiveness and Sustainability models. All of these are related to Porter's Diamond model. The paper focuses on the classical model, applying M. Porter's Diamond model methodology to evaluate the clustering of the Lithuanian maritime sector in terms of competitiveness. Despite the advances in cluster research, this model remains a complex ...
Fuzzy Clustering based Methodology for Multidimensional Data Analysis in Computational Forensic Domain

OpenAIRE

Kilian Stoffel; Paul Cotofrei; Dong Han

2012-01-01

As an interdisciplinary domain requiring advanced and innovative methodologies, computational forensics is characterized by data that are simultaneously large-scale, uncertain, multidimensional and approximate. Forensic domain experts trained to discover hidden patterns in crime data are limited in their analysis without the assistance of a computational intelligence approach. In this paper a methodology and an automatic procedure based on fuzzy set theory and designed to infer precis...
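The abstract is truncated before it describes the procedure. The following is therefore only a generic illustration of fuzzy clustering, assuming the standard fuzzy c-means algorithm with fuzzifier m = 2. In this algorithm, every record receives a degree of membership in each cluster instead of a single hard label. The data are synthetic.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # each row sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every center (floored to avoid /0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Two synthetic groups of 2-D records.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
centers, U = fuzzy_c_means(X, c=2)
hard = U.argmax(axis=1)          # optional hard assignment from the memberships
```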
Clustering Categories in Support Vector Machines

DEFF Research Database (Denmark)

Carrizosa, Emilio; Nogales-Gómez, Amaya; Morales, Dolores Romero

2017-01-01

The support vector machine (SVM) is a state-of-the-art method in supervised classification. In this paper the Cluster Support Vector Machine (CLSVM) methodology is proposed with the aim of increasing the sparsity of the SVM classifier in the presence of categorical features, leading to a gain in in...
Will an unsupervised self-testing strategy for HIV work in health care workers of South Africa? A cross sectional pilot feasibility study.

Science.gov (United States)

Pant Pai, Nitika; Behlim, Tarannum; Abrahams, Lameze; Vadnais, Caroline; Shivkumar, Sushmita; Pillay, Sabrina; Binder, Anke; Deli-Houssein, Roni; Engel, Nora; Joseph, Lawrence; Dheda, Keertan

2013-01-01

In South Africa, stigma, discrimination, social visibility and fear of loss of confidentiality impede health facility-based HIV testing. With 50% of adults ever having been tested for HIV in their lifetime, private, alternative testing options are urgently needed. Non-invasive oral self-tests offer the potential for a confidential, unsupervised HIV self-testing option, but global data are limited. A pilot cross-sectional study was conducted from January to June 2012 among health care workers based at the University of Cape Town, South Africa. An innovative, unsupervised self-testing strategy was evaluated for feasibility, defined as completion of the self-testing process (i.e., self-test conduct, interpretation and linkage). To create the unsupervised strategy, an oral point-of-care HIV test, Internet-based and paper-based HIV self-test applications, and mobile phones were combined. Self-tests were additionally confirmed with on-site rapid tests and laboratory tests. Of 270 health care workers approached (aged 18 years and above, of unknown HIV status), 251 consented to participate. Overall, about 91% of participants reported a positive experience with the strategy. Of the 251 participants, 126 evaluated the Internet-based application and 125 the paper-based application successfully (completion rate 99.2%). All sero-positives were linked to treatment (completion rate 100%; 95% CI, 66.0-100). About half of sero-negatives were offered counselling on mobile phones (completion rate 44.6%; 95% CI, 38.0-51.0). Most participants were female (78.1%) and aged 18-24 years (61.4%). Nine participants were found to be sero-positive after confirmatory tests (prevalence 3.6%; 95% CI, 1.8-6.9). Six of nine positive self-tests were accurately interpreted (sensitivity 66.7%; 95% CI, 30.9-91.0; specificity 100%; 95% CI, 98.1-100). Our unsupervised self-testing strategy was feasible to operationalize in health care workers in South Africa. Linkages were successfully operationalized with mobile phones in all sero
Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

Science.gov (United States)

Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

2015-01-01

This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of the data points was carried out using unsupervised pattern recognition (hierarchical cluster analysis, HCA) and a receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zones. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ > 1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375
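The risk quotient behind the RQ > 1 criterion is commonly computed as a measured environmental concentration (MEC) divided by a predicted no-effect concentration (PNEC). The sketch below applies that ratio. The compound names and concentrations are invented placeholders, not values from the study.

```python
def risk_quotients(mec, pnec):
    """RQ = MEC / PNEC for each compound (both in the same concentration units)."""
    return {name: mec[name] / pnec[name] for name in mec}

# Hypothetical concentrations in ng/L, for illustration only.
mec = {"compound A": 120.0, "compound B": 5.0}
pnec = {"compound A": 100.0, "compound B": 50.0}

rq = risk_quotients(mec, pnec)
high_risk = [name for name, value in rq.items() if value > 1]   # RQ > 1 flags risk
print(rq, high_risk)
```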
An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images

Science.gov (United States)

Bhardwaj, Kaushal; Patra, Swarnajyoti

2018-04-01

Including spatial information along with spectral features plays a significant role in the classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. To incorporate proper spatial information, multiple attributes are required. For each attribute, large profiles need to be constructed by varying the filter parameter values over a wide range. The constructed profiles that represent the spectral-spatial information of a hyperspectral image therefore have a very high dimension, which leads to the Hughes phenomenon and increases the computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique. From the constructed high-dimensional multi-attribute profile, it selects a subset of filtered images that are sufficiently informative to discriminate well among classes. To this end, the proposed technique exploits genetic algorithms (GAs), whose fitness function is defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using a one-against-all support vector machine classifier. Experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.
Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.

Science.gov (United States)

Yang, Bin; Peng, Yu; Leung, Henry Chi-Ming; Yiu, Siu-Ming; Chen, Jing-Chi; Chin, Francis Yuk-Lun

2010-04-16

With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are gradually being replaced by metagenomics, also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of the DNA fragments (reads) that result from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches that rely heavily on reference genomes of known microorganisms and on phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). Our experiments show that our method can accurately bin DNA fragments of various lengths and relative species abundance ratios without using any reference or training datasets. Another feature of our method is its robustness to errors. The binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information from any known reference genomes (species). The source code of our software tool, the reference genomes of the species used to generate the test datasets, and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.
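The first ingredient of such a method is an l-mer frequency profile for each fragment, which can be sketched as follows. This version computes frequencies for all 4^l l-mers. The paper's error-robust selection of a subset of l-mers is not reproduced, and l = 2 is an arbitrary choice for the example.

```python
from itertools import product

def lmer_profile(fragment, l):
    """Relative frequency of every l-mer over the alphabet ACGT, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=l)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(fragment) - l + 1):
        word = fragment[i : i + l]
        if word in counts:          # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]

profile = lmer_profile("ACGTACGT", 2)   # windows: AC, CG, GT, TA, AC, CG, GT
```

Any unsupervised clustering method can then group fragments whose profiles are close, for example in Euclidean distance.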
COMPARISON AND EVALUATION OF CLUSTER BASED IMAGE SEGMENTATION TECHNIQUES

OpenAIRE

Hetangi D. Mehta, Daxa Vekariya, Pratixa Badelia

2017-01-01

Image segmentation is the classification of an image into different groups. Numerous algorithms using different approaches have been proposed for image segmentation. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity. The paper reviews different types of clustering methods used for image segmentation. It also proposes a methodology to classify and quantify different clustering algorithms based on their consistency in different...
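As a minimal example of one clustering-based segmentation technique, the sketch below applies k-means to grey-level pixel intensities using plain NumPy. The truncated abstract does not state which algorithms the paper actually compares, and the image is synthetic.

```python
import numpy as np

def kmeans_segment(image, k, iters=50):
    """Segment a grey-level image by k-means on pixel intensities."""
    pixels = image.reshape(-1).astype(float)
    # Initialise the centers at evenly spaced intensity quantiles.
    centers = np.quantile(pixels, np.linspace(0, 1, k))
    for _ in range(iters):
        labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute each center; keep the old value if a cluster becomes empty.
        new = np.array([pixels[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels.reshape(image.shape), centers

# Synthetic 8x8 image: dark background with a bright 4x4 square.
img = np.full((8, 8), 20)
img[2:6, 2:6] = 200
seg, centers = kmeans_segment(img, k=2)
```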
Spatial Cluster Detection for Repeatedly Measured Outcomes while Accounting for Residential History

OpenAIRE

Cook, Andrea J.; Gold, Diane R.; Li, Yi

2009-01-01

Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. There are virtually no spatial cluster detection methods proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information ...
Advanced defect detection algorithm using clustering in ultrasonic NDE

Science.gov (United States)

Gongzhang, Rui; Gachagan, Anthony

2016-02-01

A range of materials used in industry exhibit scattering properties that limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed. The remaining noise can easily be confused with real feature signals and then becomes artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing with a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single-transducer or phased-array configurations. The proposed algorithm uses unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and an ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Based on this observation, each class is then labelled as 'legitimate reflector' or 'artefacts', and the expected probability of detection (PoD) and probability of false alarm (PFA) are determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP. Subsequently applying the proposed clustering algorithm provided an additional reduction in PFA while maintaining PoD for both samples, compared with SSP results alone.

CUSTOMER SEGMENTATION USING THE SELF ORGANIZING MAP METHOD (CASE STUDY: UD. FENNY)

Directory of Open Access Journals (Sweden)

A. A. Gde Bagus Ariana

2012-11-01

Full Text Available Today, business competition among retail companies relies not only on information system tools but also on decision support systems. One decision support method in use is data mining, which is used to find hidden patterns in a database. UD. Fenny, a retail company, wants to find customer segmentation patterns using the RFM (Recency, Frequency, Monetary) model. The data mining method used for segmentation is clustering. Clustering is the process of grouping data into groups of similar items without supervision (unsupervised). Before clustering, the data are prepared by building a data warehouse with a star schema. Clustering is then performed using the Self Organizing Map (SOM/Kohonen) method, an artificial neural network model that learns in an unsupervised way. In the experiments, the SOM method performed the clustering and displayed the results in a SOM plot. Through clustering, decision makers can understand customer segmentation and take steps to improve customer service.
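The two steps described above, RFM features followed by SOM clustering, can be sketched with a minimal one-dimensional Kohonen map in NumPy. The study does not give its SOM configuration. The map size, the learning-rate and neighbourhood schedules, and the standardized RFM values below are all assumptions.

```python
import numpy as np

def train_som(data, n_nodes=4, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].copy()
    positions = np.arange(n_nodes)                 # node coordinates on the 1-D map
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.1          # shrinking neighbourhood width
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best-matching unit
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign(data, weights):
    """Index of the best-matching node (segment) for each customer."""
    return np.argmin(np.linalg.norm(data[:, None, :] - weights[None], axis=2), axis=1)

# Hypothetical standardized RFM rows: (recency, frequency, monetary).
rng = np.random.default_rng(4)
loyal = rng.normal([-1.0, 1.0, 1.0], 0.1, (20, 3))
lapsed = rng.normal([1.0, -1.0, -1.0], 0.1, (20, 3))
rfm = np.vstack([loyal, lapsed])
weights = train_som(rfm)
segments = assign(rfm, weights)
```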
Clustering of near clusters versus cluster compactness

International Nuclear Information System (INIS)

Yu Gao; Yipeng Jing

1989-01-01

The clustering properties of near Zwicky clusters are studied using the two-point angular correlation function. The angular correlation functions are estimated for compact and medium compact clusters, for open clusters, and for all near Zwicky clusters. The results show much stronger clustering for compact and medium compact clusters than for open clusters, and show that open clusters have nearly the same clustering strength as galaxies. The dependence of the correlation function strength on compactness merits a detailed study. (author)
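Estimating a two-point angular correlation function comes down to counting pairs. The sketch below uses the natural (Peebles-Hauser) estimator w = (DD/RR) · N_R(N_R - 1) / [N_D(N_D - 1)] - 1 on a small flat patch of sky, with separations in degrees under a small-angle approximation. The estimator actually used in the paper is not stated in the abstract. The clumped "catalogue" and the random comparison points are synthetic assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def pair_counts(points, bins):
    """Histogram of pairwise separations, each unordered pair counted once."""
    return np.histogram(pdist(points), bins=bins)[0].astype(float)

def w_peebles_hauser(data, randoms, bins):
    """Natural estimator w = (DD/RR) * N_R(N_R-1) / (N_D(N_D-1)) - 1."""
    dd = pair_counts(data, bins)
    rr = pair_counts(randoms, bins)
    nd, nr = len(data), len(randoms)
    return dd / rr * (nr * (nr - 1)) / (nd * (nd - 1)) - 1.0

# Synthetic 10 x 10 degree patch: 40 tight clumps of 10 objects, plus randoms.
rng = np.random.default_rng(9)
clumps = rng.uniform(0, 10, (40, 2))
data = (clumps[:, None, :] + 0.05 * rng.standard_normal((40, 10, 2))).reshape(-1, 2)
randoms = rng.uniform(0, 10, (2000, 2))
bins = np.array([0.0, 0.2, 0.5, 1.0, 2.0])       # separation bins in degrees
w = w_peebles_hauser(data, randoms, bins)
print(np.round(w, 2))
```

The strong positive signal in the first bin reflects the clumping. An unclustered sample would scatter around w = 0 in every bin.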
Combining cluster number counts and galaxy clustering

Energy Technology Data Exchange (ETDEWEB)

Lacasa, Fabien; Rosenfeld, Rogerio, E-mail: fabien@ift.unesp.br, E-mail: rosenfel@ift.unesp.br [ICTP South American Institute for Fundamental Research, Instituto de Física Teórica, Universidade Estadual Paulista, São Paulo (Brazil)

2016-08-01

The abundance of clusters and the clustering of galaxies are two important cosmological probes for current and future large-scale galaxy surveys, such as the Dark Energy Survey. To combine them, one has to account for the fact that they are not independent quantities, since they probe the same density field. Extracting parameter constraints requires a good understanding of their correlation. We present a detailed modelling of the joint covariance matrix between cluster number counts and the galaxy angular power spectrum. We employ the framework of the halo model complemented by a Halo Occupation Distribution model (HOD). We demonstrate the importance of accounting for non-Gaussianity to produce accurate covariance predictions. Indeed, we show that the non-Gaussian covariance becomes dominant at small scales, low redshifts or high cluster masses. We discuss in particular the case of the super-sample covariance (SSC), including the effects of galaxy shot-noise, halo second-order bias and non-local bias. We demonstrate that the SSC obeys mathematical inequalities and positivity. Using the joint covariance matrix and a Fisher matrix methodology, we examine the prospects of combining these two probes to constrain cosmological and HOD parameters. We find that the combination indeed results in noticeably better constraints, with improvements of order 20% on cosmological parameters compared to the best single probe, and even greater improvement on HOD parameters, with error bars reduced by a factor of 1.4-4.8. This happens in particular because the cross-covariance introduces a synergy between the probes on small scales. We conclude that accounting for non-Gaussian effects is required for the joint analysis of these observables in galaxy surveys.
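The Fisher-matrix step can be illustrated with a generic Gaussian-likelihood forecast. The Fisher matrix is F = Jᵀ C⁻¹ J. Here J holds the derivatives of the stacked observables with respect to the parameters, and C is their joint covariance, assumed independent of the parameters. The marginalized 1σ errors are the square roots of the diagonal of F⁻¹. The two-observable, two-parameter numbers below are invented for illustration and are unrelated to the paper's halo-model covariance.

```python
import numpy as np

def fisher(J, C):
    """Fisher matrix for Gaussian data with a parameter-independent covariance."""
    return J.T @ np.linalg.solve(C, J)

def marginal_errors(F):
    """Marginalized 1-sigma errors: sqrt of the diagonal of the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(F)))

# Hypothetical derivatives d(observable)/d(parameter); rows are observables.
J_a = np.array([[1.0, 0.5]])          # probe A (e.g. a number count)
J_b = np.array([[0.3, 1.0]])          # probe B (e.g. a power-spectrum bin)
var_a, var_b, cov_ab = 0.04, 0.09, 0.02

J_joint = np.vstack([J_a, J_b])
C_joint = np.array([[var_a, cov_ab], [cov_ab, var_b]])
C_indep = np.diag([var_a, var_b])     # same probes, cross-covariance ignored

print("joint:", marginal_errors(fisher(J_joint, C_joint)))
print("cross-covariance ignored:", marginal_errors(fisher(J_joint, C_indep)))
```

When the cross-covariance is zero, the Fisher matrices of the two probes simply add. A non-zero cross-covariance changes the result, which is why the joint covariance has to be modelled.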
Cluster-cluster clustering

International Nuclear Information System (INIS)

Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S. (Yale Univ., New Haven, CT; California Univ., Santa Barbara; Cambridge Univ., England; Sussex Univ., Brighton, England)

1985-01-01

The cluster correlation function ξ_c(r) is compared with the particle correlation function ξ(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, ξ_c and ξ are essentially identical. In scale-free models with more power on large scales, the amplitude of ξ_c is found to increase with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), ξ_c is steeper than ξ, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of ξ_c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references
Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments

Directory of Open Access Journals (Sweden)

Renata De Paris

2015-01-01

Full Text Available Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so that the task becomes feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor, in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria. It was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that using features of the substrate-binding cavity as input to the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
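The following is a minimal sketch of the partition-selection step, assuming scikit-learn. It runs k-means for several values of k on per-snapshot feature vectors and keeps the partition with the best silhouette score. From each cluster, it then takes the snapshot closest to the centroid as a representative. The paper used three validity criteria, whereas only the silhouette is used here, and the synthetic features merely stand in for the binding-cavity descriptors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(5)
# Hypothetical per-snapshot cavity features: three conformational states.
states = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
features = np.vstack([s + 0.2 * rng.standard_normal((50, 2)) for s in states])

# Score each candidate number of clusters with the silhouette coefficient.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
best = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(features)
# One representative snapshot per cluster: the frame closest to each centroid.
reps = [int(np.argmin(np.linalg.norm(features - c, axis=1)))
        for c in best.cluster_centers_]
print(best_k, reps)
```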
CSR in Industrial Clusters

DEFF Research Database (Denmark)

Lund-Thomsen, Peter; Pillay, Renginee G.

2012-01-01

Purpose – The paper seeks to review the literature on CSR in industrial clusters in developing countries, identifying the main strengths, weaknesses, and gaps in this literature, pointing to future research directions and policy implications in the area of CSR and industrial cluster development. Design/methodology/approach – A literature review is conducted of both academic and policy-oriented writings that contain the keywords “industrial clusters” and “developing countries” in combination with one or more of the following terms: corporate social responsibility, environmental management, labor standards, child labor, climate change, social upgrading, and environmental upgrading. The authors examine the key themes in this literature, identify the main gaps, and point to areas where future work in this area could usefully be undertaken. Feedback has been sought from some of the leading authors...
Effects of a Supervised versus an Unsupervised Combined Balance and Strength Training Program on Balance and Muscle Power in Healthy Older Adults: A Randomized Controlled Trial.

Science.gov (United States)

Lacroix, André; Kressig, Reto W; Muehlbauer, Thomas; Gschwind, Yves J; Pfenninger, Barbara; Bruegger, Othmar; Granacher, Urs

2016-01-01

Losses in lower extremity muscle strength/power, muscle mass and deficits in static and particularly dynamic balance due to aging are associated with impaired functional performance and an increased fall risk. It has been shown that the combination of balance and strength training (BST) mitigates these age-related deficits. However, it is unresolved whether supervised versus unsupervised BST is equally effective in improving muscle power and balance in older adults. This study examined the impact of a 12-week BST program followed by 12 weeks of detraining on measures of balance and muscle power in healthy older adults enrolled in supervised (SUP) or unsupervised (UNSUP) training. Sixty-six older adults (men: 25, women: 41; age 73 ± 4 years) were randomly assigned to a SUP group (2/week supervised training, 1/week unsupervised training; n = 22), an UNSUP group (3/week unsupervised training; n = 22) or a passive control group (CON; n = 22). Static (i.e., Romberg Test) and dynamic (i.e., 10-meter walk test) steady-state, proactive (i.e., Timed Up and Go Test, Functional Reach Test), and reactive balance (e.g., Push and Release Test), as well as lower extremity muscle power (i.e., Chair Stand Test; Stair Ascent and Descent Test) were tested before and after the active training phase as well as after detraining. Adherence rates to training were 92% for SUP and 97% for UNSUP. BST resulted in significant group × time interactions. Post hoc analyses showed, among others, significant training-related improvements for the Romberg Test, stride velocity, Timed Up and Go Test, and Chair Stand Test in favor of the SUP group. Following detraining, significantly enhanced performances (compared to baseline) were still present in 13 variables for the SUP group and in 10 variables for the UNSUP group. Twelve weeks of BST proved to be safe (no training-related injuries) and feasible (high attendance rates of >90%). Deficits of balance and lower extremity muscle power can be
An Unsupervised Method of Change Detection in Multi-Temporal PolSAR Data Using a Test Statistic and an Improved K&I Algorithm

Directory of Open Access Journals (Sweden)

Jinqi Zhao

2017-12-01

Full Text Available In recent years, multi-temporal imagery from spaceborne sensors has provided a fast and practical means of surveying and assessing changes in terrain surfaces. Owing to its all-weather imaging capability, polarimetric synthetic aperture radar (PolSAR) has become a key tool for change detection. Change detection methods include both unsupervised and supervised methods. Supervised change detection, which needs some human intervention, is generally ineffective and impractical. Due to this limitation, unsupervised methods are widely used in change detection. Traditional unsupervised methods use only part of the polarization information, and the required thresholding algorithms are independent of the multi-temporal data, which makes the resulting change detection map ineffective and inaccurate. To solve these problems, this paper introduces a novel change detection method. It combines a test statistic based on the likelihood ratio test with an improved Kittler and Illingworth (K&I) minimum-error thresholding algorithm. The test statistic is used to generate the comparison image (CI) of the multi-temporal PolSAR images, and the improved K&I algorithm uses a generalized Gaussian model to describe the distribution of the CI. As a result, the change detection map can be obtained using an optimum threshold. The efficiency of the proposed method is demonstrated on multi-temporal PolSAR images acquired by RADARSAT-2 over Wuhan, China. The experimental results show that the proposed method is effective and highly accurate.
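For reference, the classic Kittler-Illingworth minimum-error threshold can be sketched as follows. The paper improves on it by replacing the Gaussian class model with a generalized Gaussian one. For each candidate threshold t, a Gaussian is fitted to each side of the histogram, and the method chooses the t that minimizes J(t) = 1 + 2[P₁ ln σ₁ + P₂ ln σ₂] - 2[P₁ ln P₁ + P₂ ln P₂]. Only the original Gaussian version is shown here, and the synthetic comparison-image values are assumptions.

```python
import numpy as np

def kittler_illingworth(values, bins=256):
    """Minimum-error threshold (Kittler & Illingworth) assuming Gaussian classes."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_j = None, np.inf
    for t in range(1, bins):                      # class 1: bins < t, class 2: bins >= t
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 <= 0 or p2 <= 0:
            continue
        m1 = (p[:t] * centers[:t]).sum() / p1
        m2 = (p[t:] * centers[t:]).sum() / p2
        v1 = (p[:t] * (centers[:t] - m1) ** 2).sum() / p1
        v2 = (p[t:] * (centers[t:] - m2) ** 2).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue
        # 2 * P * ln(sigma) equals P * ln(variance).
        j = 1 + (p1 * np.log(v1) + p2 * np.log(v2)) - 2 * (p1 * np.log(p1) + p2 * np.log(p2))
        if j < best_j:
            best_j, best_t = j, edges[t]
    return best_t

# Synthetic comparison-image values: mostly unchanged pixels plus a changed class.
rng = np.random.default_rng(6)
ci = np.concatenate([rng.normal(0.0, 1.0, 9000),    # unchanged pixels
                     rng.normal(8.0, 1.5, 1000)])   # changed pixels
t = kittler_illingworth(ci)
change_map = ci > t
```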
A robust methodology for modal parameters estimation applied to SHM

Science.gov (United States)

Cardoso, Rharã; Cury, Alexandre; Barbosa, Flávio

2017-10-01

Structural health monitoring has drawn increasing attention in recent years. Many vibration-based techniques aimed at detecting small structural changes, or even damage, have been developed or enhanced through successive studies. Lately, several studies have focused on using raw dynamic data to assess structural condition. Despite this trend and much skepticism, many methods still rely on modal parameters as fundamental data for damage detection. It is therefore of utmost importance that modal identification procedures are performed with a sufficient level of precision and automation. To fulfill these requirements, this paper presents a novel automated time-domain methodology to identify modal parameters based on a two-step clustering analysis. The first step consists of clustering mode estimates from parametric models of different orders, usually presented in stabilization diagrams. In an automated manner, the first clustering analysis indicates which estimates correspond to physical modes. To avoid detecting spurious modes or losing physical ones, a second clustering step is then performed, which consists of data mining the information gathered in the first step. To attest to the robustness and efficiency of the proposed methodology, numerically generated signals are used, together with experimental data from a simply supported beam tested in the laboratory and from a railway bridge. The results appeared more robust and accurate than those obtained from methods based on a one-step clustering analysis.
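The first clustering step can be illustrated with a hypothetical stabilization diagram. Natural-frequency estimates from models of increasing order are grouped by single-linkage hierarchical clustering using SciPy. Only clusters containing estimates from many model orders are kept as physical modes. The frequencies, the roughly 2% relative tolerance and the minimum cluster size are all assumptions, and the paper's second, data-mining step is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stabilization-diagram poles: (model order, frequency in Hz).
rng = np.random.default_rng(7)
poles = []
for order in range(10, 40, 2):
    for f in (3.2, 11.8):                                   # physical modes
        poles.append((order, f * (1 + 0.002 * rng.standard_normal())))
    poles.append((order, rng.uniform(1.0, 20.0)))           # spurious pole
poles = np.array(poles)

# Cluster on log-frequency so that the 0.02 cut is a relative (~2 %) tolerance.
Z = linkage(np.log(poles[:, 1:2]), method="single")
labels = fcluster(Z, t=0.02, criterion="distance")

# Keep clusters whose size is at least half the number of model orders.
min_size = 0.5 * len(np.unique(poles[:, 0]))
modes = [poles[labels == c, 1].mean() for c in np.unique(labels)
         if (labels == c).sum() >= min_size]
print(sorted(round(m, 1) for m in modes))
```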
Extracting aerobic system dynamics during unsupervised activities of daily living using wearable sensor machine learning models.

Science.gov (United States)

Beltrame, Thomas; Amelard, Robert; Wong, Alexander; Hughson, Richard L

2018-02-01

Physical activity levels are related through algorithms to the energetic demand, with no information regarding the integrity of the multiple physiological systems involved in the energetic supply. Longitudinal analysis of oxygen uptake (V̇o2) by wearable sensors in realistic settings might permit the development of a practical tool for studying longitudinal aerobic system dynamics (i.e., V̇o2 kinetics). This study evaluated aerobic system dynamics based on predicted V̇o2 data obtained from wearable sensors during unsupervised activities of daily living (μADL). Thirteen healthy men performed a laboratory-controlled moderate exercise protocol and were monitored for ≈6 h/day for 4 days (μADL data). Variables derived from a hip accelerometer (ACC_HIP), a heart rate monitor, and respiratory bands during μADL were extracted and processed by a validated random forest regression model to predict V̇o2. The aerobic system analysis was based on frequency-domain analysis of ACC_HIP and predicted V̇o2 data obtained during μADL. Optimal samples for frequency-domain analysis (constrained to ≤0.01 Hz) were selected when ACC_HIP was higher than 0.05 g at a given frequency (i.e., when participants were active). The temporal characteristics of predicted V̇o2 data during μADL correlated with the temporal characteristics of measured V̇o2 data during the laboratory-controlled protocol ([Formula: see text] = 0.82, P system dynamics can be investigated during unsupervised activities of daily living by wearable sensors. Although speculative, these algorithms have the potential to be incorporated into wearable systems for early detection of changes in health status in realistic environments by detecting changes in aerobic response dynamics. NEW & NOTEWORTHY The early detection of subclinical aerobic system impairments might be indicative of impaired physiological reserves that impact the capacity for physical activity. This study is the first to use wearable
Unsupervised Learning Through Randomized Algorithms for High-Volume High-Velocity Data (ULTRA-HV).

Energy Technology Data Exchange (ETDEWEB)

Pinar, Ali [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kolda, Tamara G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Carlberg, Kevin Thomas [Wake Forest Univ., Winston-Salem, NC (United States); Ballard, Grey [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mahoney, Michael [Univ. of California, Berkeley, CA (United States)

2018-01-01

Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as science-leading experimentation. In both cases, DOE is generating more data than it can analyze, and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts, however, are hindered by the massive volume of the data and its high velocity. Here, the challenge is developing unsupervised learning methods to discover hidden structure in high-volume, high-velocity data.
Emulating galaxy clustering and galaxy-galaxy lensing into the deeply nonlinear regime: methodology, information, and forecasts

OpenAIRE

Wibking, Benjamin D.; Salcedo, Andrés N.; Weinberg, David H.; Garrison, Lehman H.; Ferrer, Douglas; Tinker, Jeremy; Eisenstein, Daniel; Metchnik, Marc; Pinto, Philip

2017-01-01

The combination of galaxy-galaxy lensing (GGL) with galaxy clustering is one of the most promising routes to determining the amplitude of matter clustering at low redshifts. We show that extending clustering+GGL analyses from the linear regime down to ∼0.5 h⁻¹ Mpc scales increases their constraining power considerably, even after marginalizing over a flexible model of non-linear galaxy bias. Using a grid of cosmological N-body simulations, we construct a Taylor-expansion emulator ...
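The general idea of a Taylor-expansion emulator can be sketched as follows: evaluate an expensive model once at fiducial parameters, estimate its derivatives, and predict nearby parameter sets from the expansion. This sketch is first order with central finite differences; the expansion order, step size, and the toy model in the check are assumptions, since the abstract is truncated before describing the authors' construction.

```python
def build_taylor_emulator(model, theta0, step=1e-3):
    """First-order Taylor emulator around fiducial parameters `theta0`.

    `model` maps a parameter list to a list of predicted statistics
    (e.g. binned clustering or lensing observables; hypothetical here)."""
    f0 = model(theta0)
    grads = []  # grads[j][i] = d f_i / d theta_j, estimated by central differences
    for j in range(len(theta0)):
        up = list(theta0)
        up[j] += step
        down = list(theta0)
        down[j] -= step
        f_up, f_down = model(up), model(down)
        grads.append([(u - d) / (2 * step) for u, d in zip(f_up, f_down)])

    def emulate(theta):
        """Predict the statistics at `theta` without calling `model`."""
        out = list(f0)
        for j, (t, t0) in enumerate(zip(theta, theta0)):
            for i in range(len(out)):
                out[i] += grads[j][i] * (t - t0)
        return out

    return emulate
```

For a model that is linear in its parameters the first-order emulator is exact; for a real simulation suite the error grows with distance from the fiducial point, which is why higher-order terms or a denser grid may be needed.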
Unsupervised progressive elastic band exercises for frail geriatric inpatients objectively monitored by new exercise-integrated technology: a feasibility trial with an embedded qualitative study

DEFF Research Database (Denmark)

Rathleff, C R; Bandholm, T; Spaich, E G

2017-01-01

Background: Frailty is a serious condition frequently present in geriatric inpatients that potentially causes serious adverse events. Strength training is acknowledged as a means of preventing or delaying frailty and loss of function in these patients. However, limited hospital resources constrain the amount of supervised training, and unsupervised training could possibly supplement supervised training, thereby increasing the total exercise dose during admission. A new valid and reliable technology, the BandCizer, objectively measures the exact training dosage performed. The purpose was to investigate the feasibility and acceptability of an unsupervised progressive strength training intervention monitored by BandCizer for frail geriatric inpatients. Methods: This feasibility trial included 15 frail inpatients at a geriatric ward. At hospitalization, the patients were prescribed two elastic band exercises...
The delicate balance between parental protection, unsupervised wandering, and adolescents' autonomy and its relation with antisocial behavior : The TRAILS study

NARCIS (Netherlands)

Sentse, M.; Dijkstra, J.K.; Lindenberg, S.; Ormel, J.; Veenstra, R.

In a large sample of early adolescents (T2: N = 1023; M age = 13.51; 55.5% girls), the impact of parental protection and unsupervised wandering on adolescents' antisocial behavior 2.5 years later was tested in this TRAILS study; gender and parental knowledge were controlled for. In addition, the
Indoor localization using unsupervised manifold alignment with geometry perturbation

KAUST Repository

Majeed, Khaqan

2014-04-01

The main limitation of deploying or updating Received Signal Strength (RSS)-based indoor localization is the construction of a fingerprinted radio map, which is a laborious and time-consuming process, especially when the indoor area is enormous and/or dynamic. Different approaches have been undertaken to reduce such deployment and update efforts, but performance degrades when the fingerprinting load is reduced below a certain level. In this paper, we propose an indoor localization scheme that requires a fingerprinting load as low as 1%. This scheme employs unsupervised manifold alignment that takes crowdsourced RSS readings and localization requests as the source data set and the environment's plan coordinates as the destination data set. The 1% fingerprinting load is only used to perturb the local geometries in the destination data set. Our proposed algorithm was shown to achieve less than 5 m mean localization error with a 1% fingerprinting load and a limited number of crowdsourced readings, whereas other learning-based localization schemes exceed 10 m mean error with the same information.
Indoor localization using unsupervised manifold alignment with geometry perturbation

KAUST Repository

Majeed, Khaqan; Sorour, Sameh; Al-Naffouri, Tareq Y.; Valaee, Shahrokh

2014-01-01

The main limitation of deploying or updating Received Signal Strength (RSS)-based indoor localization is the construction of a fingerprinted radio map, which is a laborious and time-consuming process, especially when the indoor area is enormous and/or dynamic. Different approaches have been undertaken to reduce such deployment and update efforts, but performance degrades when the fingerprinting load is reduced below a certain level. In this paper, we propose an indoor localization scheme that requires a fingerprinting load as low as 1%. This scheme employs unsupervised manifold alignment that takes crowdsourced RSS readings and localization requests as the source data set and the environment's plan coordinates as the destination data set. The 1% fingerprinting load is only used to perturb the local geometries in the destination data set. Our proposed algorithm was shown to achieve less than 5 m mean localization error with a 1% fingerprinting load and a limited number of crowdsourced readings, whereas other learning-based localization schemes exceed 10 m mean error with the same information.
An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images.

Directory of Open Access Journals (Sweden)

Alex Xijie Lu

Despite the importance of characterizing genes that exhibit subcellular localization changes between conditions in proteome-wide imaging experiments, many recent studies still rely upon manual evaluation to assess the results of high-throughput imaging experiments. We describe and demonstrate an unsupervised k-nearest neighbours method for the detection of localization changes. Compared to previous classification-based supervised change detection methods, our method is much simpler and faster, and operates directly on the feature space, avoiding the need to manually curate training sets that may not generalize well between screens. In addition, the output of our method is flexible: it generates both a quantitatively ranked list of localization changes that permits user-defined cut-offs, and a vector for each gene describing the feature-wise direction and magnitude of localization changes. We demonstrate that our method is effective at detecting localization changes using the Δrpd3 perturbation in Saccharomyces cerevisiae, where we capture 71.4% of previously known changes within the top 10% of ranked genes, and find at least four new localization changes within the top 1% of ranked genes. The results of our analysis indicate that simple unsupervised methods may be able to identify localization changes in images without laborious manual image labelling steps.
An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images.

Science.gov (United States)

Lu, Alex Xijie; Moses, Alan M

2016-01-01

Despite the importance of characterizing genes that exhibit subcellular localization changes between conditions in proteome-wide imaging experiments, many recent studies still rely upon manual evaluation to assess the results of high-throughput imaging experiments. We describe and demonstrate an unsupervised k-nearest neighbours method for the detection of localization changes. Compared to previous classification-based supervised change detection methods, our method is much simpler and faster, and operates directly on the feature space, avoiding the need to manually curate training sets that may not generalize well between screens. In addition, the output of our method is flexible: it generates both a quantitatively ranked list of localization changes that permits user-defined cut-offs, and a vector for each gene describing the feature-wise direction and magnitude of localization changes. We demonstrate that our method is effective at detecting localization changes using the Δrpd3 perturbation in Saccharomyces cerevisiae, where we capture 71.4% of previously known changes within the top 10% of ranked genes, and find at least four new localization changes within the top 1% of ranked genes. The results of our analysis indicate that simple unsupervised methods may be able to identify localization changes in images without laborious manual image labelling steps.
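The abstract does not spell out its scoring rule, so the following is a hypothetical kNN-based change score, not the authors' published method. Each gene's shift between conditions is divided by the mean distance to its k nearest wild-type neighbours (a local scale), producing the two kinds of output the abstract mentions: a ranked list and a per-gene feature-wise direction vector. The feature vectors, `k`, and all function names are assumptions.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def rank_localization_changes(wt, mut, k=2):
    """wt, mut: dict gene -> image feature vector in the two conditions.

    Returns (ranked list of (gene, score), dict gene -> difference vector).
    Hypothetical rule: score = shift between conditions / local kNN scale."""
    genes = list(wt)
    scores, directions = {}, {}
    for g in genes:
        # Local scale: mean distance from g's wild-type profile to its
        # k nearest wild-type neighbours among the other genes.
        dists = sorted(euclidean(wt[g], wt[h]) for h in genes if h != g)
        scale = sum(dists[:k]) / k
        shift = euclidean(wt[g], mut[g])
        scores[g] = shift / scale if scale > 0 else math.inf
        # Feature-wise direction and magnitude of the change.
        directions[g] = [m - w for w, m in zip(wt[g], mut[g])]
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked, directions
```

A user-defined cut-off (for example, the top 10% of `ranked`) then selects candidate localization changes, and `directions[g]` shows which features moved.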
Probabilistic Rule Generator: A new methodology of variable-valued logic synthesis

International Nuclear Information System (INIS)

Lee, W.D.; Ray, S.R.

1986-01-01

A new methodology to synthesize variable-valued logic formulas from training data events is presented. Probabilistic Rule Generator (PRG) employs not only information-theoretic entropy as a heuristic to capture a path expression but also multiple-valued logic to expand a captured complex. PRG is efficient for capturing major clusters in the event space, and is more general than previous methodologies in providing probabilistic features.
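PRG's exact search procedure is not given in the abstract. As a hedged illustration of using information-theoretic entropy as a search heuristic, the sketch below picks the single attribute-value test whose covered training events have the lowest class entropy (ties broken by larger coverage); the event encoding and tie-breaking rule are assumptions.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_selector(events, labels):
    """Return (attribute_index, value) whose covered events are purest.

    events: list of tuples of discrete attribute values (hypothetical encoding)."""
    best = None
    for attr in range(len(events[0])):
        for value in {e[attr] for e in events}:
            covered = [lab for e, lab in zip(events, labels) if e[attr] == value]
            key = (class_entropy(covered), -len(covered))  # low entropy, then coverage
            if best is None or key < best[0]:
                best = (key, attr, value)
    return best[1], best[2]
```

Repeating this choice on the events covered so far would grow a conjunctive path expression one selector at a time.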
Resting-state fMRI activity predicts unsupervised learning and memory in an immersive virtual reality environment.

Directory of Open Access Journals (Sweden)

Chi Wah Wong

In the real world, learning often proceeds in an unsupervised manner without explicit instructions or feedback. In this study, we employed an experimental paradigm in which subjects explored an immersive virtual reality environment on each of two days. On day 1, subjects implicitly learned the location of 39 objects in an unsupervised fashion. On day 2, the locations of some of the objects were changed, and object location recall performance was assessed and found to vary across subjects. As prior work had shown that functional magnetic resonance imaging (fMRI) measures of resting-state brain activity can predict various measures of brain performance across individuals, we examined whether resting-state fMRI measures could be used to predict object location recall performance. We found a significant correlation between performance and the variability of the resting-state fMRI signal in the basal ganglia, hippocampus, amygdala, thalamus, insula, and regions in the frontal and temporal lobes, regions important for spatial exploration, learning, memory, and decision making. In addition, performance was significantly correlated with resting-state fMRI connectivity between the left caudate and the right fusiform gyrus, lateral occipital complex, and superior temporal gyrus. Given the basal ganglia's role in exploration, these findings suggest that tighter integration of the brain systems responsible for exploration and visuospatial processing may be critical for learning in a complex environment.

A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection.

Science.gov (United States)

Ceccarelli, Michele; d'Acierno, Antonio; Facchiano, Angelo

2009-10-15

Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high-dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at efficient predictive and screening protocols. With data of this dimensionality and sample size, the risk of over-fitting and selection bias is pervasive. Therefore, the development of bioinformatics methods based on unsupervised feature extraction can lead to general tools that can be applied to several fields of predictive proteomics. We propose a method for feature selection and extraction grounded in the theory of multi-scale spaces for high-resolution spectra derived from analysis of serum. We then use support vector machines for classification. In particular, we use a database containing 216 sample spectra divided into 115 cancer and 91 control samples. The overall accuracy averaged over a large cross-validation study is 98.18%. The area under the ROC curve of the best selected model is 0.9962. We improved on previously known results for the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm.
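The scale-space idea behind the feature selection can be sketched generically: smooth the spectrum with Gaussians of increasing width and keep only peaks that persist across scales, which suppresses noise-driven maxima without using class labels. The scales, the position tolerance, and the persistence rule are assumptions; this is not the authors' MATLAB code.

```python
import math


def gaussian_smooth(signal, sigma):
    """Convolve `signal` with a truncated, normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp at the edges
            acc += w * signal[idx]
        out.append(acc)
    return out


def local_maxima(signal):
    """Indices of interior local maxima."""
    return {i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]}


def multiscale_peaks(signal, sigmas=(1, 2, 4), tolerance=2):
    """Keep peaks found at the finest scale that persist (within `tolerance`
    points) at every coarser scale."""
    levels = [local_maxima(gaussian_smooth(signal, s)) for s in sigmas]
    stable = []
    for p in sorted(levels[0]):
        if all(any(abs(p - q) <= tolerance for q in level) for level in levels[1:]):
            stable.append(p)
    return stable
```

The surviving peak positions would then serve as features for a downstream classifier such as the support vector machines mentioned above.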
Clusters - Tourism Activity Increase Competitiveness Support

Directory of Open Access Journals (Sweden)

Carmen IORDACHE

2010-05-01

Tourism is one of the areas with the greatest potential for global expansion. The tourism development strategy, aimed at maximizing its positive effects on regional economic growth and, implicitly, on national growth, starts from the premise that in the global economy value is created in regions, defined as particular geographical entities delimited for geographical reasons rather than as political-administrative structures, while economic growth is cumulated centrally and valued according to economic policy and the national legal system. The regional economic system approach based on the “cluster” concept is explained by the fact that the portfolio of regional activities rests on inter- and intra-industry networks grouped into clusters, in which value is created and increases as the results of the activity move toward final consumers. This paper aims to highlight the role of tourism as a factor in regional development, the significance of the clustering process in obtaining competitive advantages, the beginnings of cluster development in tourism, and the identification methodology used to select a tourist area in which to create a cluster.
Scenario aggregation and analysis via Mean-Shift Methodology

International Nuclear Information System (INIS)

Mandelli, D.; Yilmaz, A.; Metzroth, K.; Aldemir, T.; Denning, R.

2010-01-01

A new generation of dynamic methodologies is being developed for nuclear reactor probabilistic risk assessment (PRA) that explicitly account for the time element in modeling the probabilistic system evolution and use numerical simulation tools to account for possible dependencies between failure events. The dynamic event tree (DET) approach is one of these methodologies. One challenge with dynamic PRA methodologies is the large amount of data they produce, which may be difficult to analyze without appropriate software tools. The concept of 'data mining' is well known in the computer science community, and several methodologies have been developed to extract useful information from datasets with a large number of records. Using the dataset generated by the DET analysis of the reactor vessel auxiliary cooling system (RVACS) of an ABR-1000 for an aircraft crash recovery scenario, together with the Mean-Shift Methodology for data mining, it is shown how clusters of transients with common characteristics can be identified and classified. (authors)
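Mean shift finds clusters without fixing their number in advance: each point climbs toward a local density mode, and points reaching the same mode form a cluster. The sketch below is a generic flat-kernel variant, assuming each simulated transient has been summarized as a numeric feature vector (hypothetical). The kernel, bandwidth, and mode-merging rule are assumptions, not the authors' implementation.

```python
import math


def mean_shift(points, bandwidth, iterations=100, tol=1e-6):
    """Flat-kernel mean shift.

    Each point is repeatedly moved to the mean of the original points within
    `bandwidth` until it stops moving; converged modes closer than half the
    bandwidth are merged. Returns (cluster centres, label per point)."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iterations):
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = [sum(c) / len(window) for c in zip(*window)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    # Merge nearby modes into cluster centres and label each point.
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels
```

The bandwidth plays the role that the number of clusters plays in k-means: a larger bandwidth merges more transients into fewer, broader groups.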
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

Science.gov (United States)

Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

2008-06-18

Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely used correlations as similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by comparing it with the two correlation coefficients that are currently most widely used (the Pearson correlation coefficient and the SD-weighted correlation coefficient), using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson
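For orientation, the baseline that SCC is compared against can be sketched as average-linkage hierarchical clustering with distance 1 − Pearson r. Here replicates are simply averaged before computing r; that averaging discards the replicate variance SCC is designed to exploit. The SCC formula itself is not given in the abstract and is not reproduced; the data layout and names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_means(profile):
    """profile: list of conditions, each a list of replicate measurements."""
    return [sum(reps) / len(reps) for reps in profile]


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of gene profiles with distance 1 - Pearson r."""
    means = [replicate_means(p) for p in profiles]

    def distance(i, j):
        return 1 - pearson(means[i], means[j])

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [distance(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair of clusters
    return clusters
```

Swapping `distance` for a replicate-aware similarity is the point at which a coefficient such as SCC would enter this pipeline.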
A low-cost single-board solution for real-time, unsupervised waveform classification of multineuron recordings.

Science.gov (United States)

Kreiter, A K; Aertsen, A M; Gerstein, G L

1989-10-01

We describe a low-cost single-board system for unsupervised, real-time spike sorting of recordings from a number of neurons on a single microelectrode. The maximum number of spike classes depends on the quality of the recording; it will typically be between 2 and 5. The spike sorter communicates with a conventional microcomputer through a standard serial port (RS232). For typical firing rates as measured in the mammalian central nervous system, this set-up will accommodate up to some 10 parallel spike sorters for as many separate microelectrodes.
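The board's sorting algorithm is not described in the abstract. As a generic illustration of unsupervised, real-time spike classification, the sketch below uses online template matching: each incoming waveform joins the nearest template if it is close enough, otherwise it opens a new class, up to a fixed maximum (the abstract cites typically 2 to 5 classes). The distance threshold, running-mean template update, and "unclassified" label are assumptions.

```python
import math


def sort_spikes(waveforms, threshold, max_classes=5):
    """Online template matching for spike waveforms.

    Returns (labels, templates); label -1 marks a waveform that matched no
    template when no further classes could be opened."""
    templates, counts, labels = [], [], []
    for w in waveforms:
        best, best_d = None, math.inf
        for idx, t in enumerate(templates):
            d = math.dist(w, t)
            if d < best_d:
                best, best_d = idx, d
        if best is not None and best_d <= threshold:
            # Assign to the nearest class and update its template as a running mean.
            counts[best] += 1
            templates[best] = [t + (x - t) / counts[best]
                               for t, x in zip(templates[best], w)]
            labels.append(best)
        elif len(templates) < max_classes:
            templates.append(list(w))  # open a new spike class
            counts.append(1)
            labels.append(len(templates) - 1)
        else:
            labels.append(-1)  # no room for another class: leave unclassified
    return labels, templates
```

Because each waveform is processed once against a handful of templates, the per-spike cost stays small, which is the property that makes single-board, real-time operation plausible.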
Unsupervised Neural Network Quantifies the Cost of Visual Information Processing.

Directory of Open Access Journals (Sweden)

Levente L Orbán

Untrained, "flower-naïve" bumblebees display behavioural preferences when presented with visual properties such as colour, symmetry, spatial frequency and others. Two unsupervised neural networks were implemented to understand the extent to which these models capture elements of bumblebees' unlearned visual preferences towards flower-like visual properties. The computational models, which are variants of Independent Component Analysis and Feature-Extracting Bidirectional Associative Memory, use images of test patterns identical to those used in behavioural studies. Each model works by decomposing images of floral patterns into meaningful underlying factors. We reconstruct the original floral image using the components and compare the quality of the reconstructed image to the original image. Independent Component Analysis matches behavioural results substantially better across several visual properties. These results are interpreted to support a hypothesis that the temporal and energetic costs of information processing by pollinators served as a selective pressure on floral displays: flowers adapted to pollinators' cognitive constraints.
Unsupervised Neural Network Quantifies the Cost of Visual Information Processing.

Science.gov (United States)

Orbán, Levente L; Chartier, Sylvain

2015-01-01

Untrained, "flower-naïve" bumblebees display behavioural preferences when presented with visual properties such as colour, symmetry, spatial frequency and others. Two unsupervised neural networks were implemented to understand the extent to which these models capture elements of bumblebees' unlearned visual preferences towards flower-like visual properties. The computational models, which are variants of Independent Component Analysis and Feature-Extracting Bidirectional Associative Memory, use images of test patterns identical to those used in behavioural studies. Each model works by decomposing images of floral patterns into meaningful underlying factors. We reconstruct the original floral image using the components and compare the quality of the reconstructed image to the original image. Independent Component Analysis matches behavioural results substantially better across several visual properties. These results are interpreted to support a hypothesis that the temporal and energetic costs of information processing by pollinators served as a selective pressure on floral displays: flowers adapted to pollinators' cognitive constraints.
Massive open star clusters using the VVV survey. II. Discovery of six clusters with Wolf-Rayet stars

Science.gov (United States)

Chené, A.-N.; Borissova, J.; Bonatto, C.; Majaess, D. J.; Baume, G.; Clarke, J. R. A.; Kurtev, R.; Schnurr, O.; Bouret, J.-C.; Catelan, M.; Emerson, J. P.; Feinstein, C.; Geisler, D.; de Grijs, R.; Hervé, A.; Ivanov, V. D.; Kumar, M. S. N.; Lucas, P.; Mahy, L.; Martins, F.; Mauro, F.; Minniti, D.; Moni Bidin, C.

2013-01-01

Context. The ESO Public Survey "VISTA Variables in the Vía Láctea" (VVV) provides deep multi-epoch infrared observations for an unprecedented 562 sq. degrees of the Galactic bulge, and adjacent regions of the disk. Nearly 150 new open clusters and cluster candidates have been discovered in this survey. Aims: This is the second in a series of papers about young, massive open clusters observed using the VVV survey. We present the first study of six recently discovered clusters. These clusters contain at least one newly discovered Wolf-Rayet (WR) star. Methods: Following the methodology presented in the first paper of the series, wide-field, deep JHKs VVV observations, combined with new infrared spectroscopy, are employed to constrain fundamental parameters for a subset of clusters. Results: We find that the six studied stellar groups are genuine young (2-7 Myr) and massive (between 0.8 and 2.2 × 10³ M⊙) clusters. They are highly obscured (AV ~ 5-24 mag) and compact (1-2 pc). In addition to WR stars, two of the six clusters also contain at least one red supergiant star, and one of these two clusters also contains a blue supergiant. We claim the discovery of 8 new WR stars, and 3 stars showing WR-like emission lines that could be classified as WR or OIf. Preliminary analysis provides initial masses of ~30-50 M⊙ for the WR stars. Finally, we discuss the spiral structure of the Galaxy using the six new clusters as tracers, together with the previously studied VVV clusters. Based on observations with ISAAC, VLT, ESO (programme 087.D-0341A), New Technology Telescope at ESO's La Silla Observatory (programme 087.D-0490A) and with the Clay telescope at the Las Campanas Observatory (programme CN2011A-086). Also based on data from the VVV survey (programme 172.B-2002).
Clusters of sirenomelia in South America.

Science.gov (United States)

Orioli, Iêda M; Mastroiacovo, Pierpaolo; López-Camelo, Jorge S; Saldarriaga, Wilmar; Isaza, Carolina; Aiello, Horacio; Zarante, Ignacio; Castilla, Eduardo E

2009-02-01

One hospital in the city of Cali, Colombia, of the ECLAMC (Latin-American Collaborative Study of Congenital Malformations) network, reported the unusual occurrence of four cases of sirenomelia within a 55-day period. An ECLAMC routine for cluster evaluation (RUMOR) was followed that included: calculations of observed/expected ratios, site visits, comparison with comprehensively collected local, South American, and worldwide data, cluster analysis, and search for risk factors. All four Cali sirenomelia cases were born to mothers living in a 2 km² area, in neighboring communes, within the municipality of Cali. Considering the total births of the city of Cali as the denominator, and based on ECLAMC baseline birth prevalence rates (per 100,000) for sirenomelia (2.25, 95% CI: 2.66, 3.80), the cluster for this congenital abnormality was unlikely to have occurred by chance (observed/expected ratio = 5.77; 95% CI: 1.57-14.78; p = .002). No consistent common factor was identified, but proximity to an open landfill could not be rejected as the cause. Another ECLAMC hospital in San Justo, Buenos Aires, Argentina, reported three further cases, but these did not seem to constitute a nonrandom cluster. The methodology used to evaluate the two possible clusters of sirenomelia determined that the Cali sirenomelia cluster was unlikely to have occurred by chance, whereas the sirenomelia cluster from San Justo seemed to be random. (c) 2008 Wiley-Liss, Inc.
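The reported confidence interval for the observed/expected ratio can be checked with an exact (Garwood) Poisson interval for the observed count, divided by the expected count. The abstract does not state the expected count; this sketch assumes the value implied by the reported ratio, 4 / 5.77 ≈ 0.69 cases. It is a hedged arithmetic check, not the ECLAMC RUMOR routine, and it does not attempt to reproduce the reported p-value.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term, total = math.exp(-lam), 0.0
    for i in range(k + 1):
        if i > 0:
            term *= lam / i
        total += term
    return total


def bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ratio_ci(observed, expected, alpha=0.05):
    """Observed/expected ratio with an exact (Garwood) Poisson interval."""
    upper_search = 10 * observed + 10
    # Lower limit: the mean for which P(X >= observed) = alpha / 2.
    lower = 0.0 if observed == 0 else bisect(
        lambda lam: (1 - poisson_cdf(observed - 1, lam)) - alpha / 2, 1e-9, upper_search)
    # Upper limit: the mean for which P(X <= observed) = alpha / 2.
    upper = bisect(lambda lam: poisson_cdf(observed, lam) - alpha / 2, 1e-9, upper_search)
    return observed / expected, lower / expected, upper / expected
```

Calling `exact_poisson_ratio_ci(4, 4 / 5.77)` gives a ratio of 5.77 with limits of about 1.57 and 14.77, matching the reported 1.57-14.78 up to the rounding of the implied expected count.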
Post-Graduate Student Performance in "Supervised In-Class" vs. "Unsupervised Online" Multiple Choice Tests: Implications for Cheating and Test Security

Science.gov (United States)

Ladyshewsky, Richard K.

2015-01-01

This research explores differences in multiple choice test (MCT) scores in a cohort of post-graduate students enrolled in a management and leadership course. A total of 250 students completed the MCT in either a supervised in-class paper and pencil test or an unsupervised online test. The only statistically significant difference between the nine…
Segmentation of Residential Gas Consumers Using Clustering Analysis

Directory of Open Access Journals (Sweden)

Marta P. Fernandes

2017-12-01

Growing environmental concerns and the liberalization of energy markets have resulted in increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology for segmenting residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis of smart metering data. Insights are then derived from the segmentation: the segments produced by the clustering process are characterized by their consumption profiles and by information on consumers’ socio-economic and key household features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two distinct consumption peaks, one in the morning and the other in the evening, and by off-peak consumption. Significant insights can be derived from this methodology regarding the typical consumption curves of the different consumer segments in the population. This knowledge can assist energy utilities and policy makers in developing consumer engagement strategies and demand forecasting tools, and in designing more sophisticated tariff systems.
A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

Science.gov (United States)

Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

2015-07-01

In this paper, an alignment-free method for DNA barcode classification is proposed that is based on both a spectral representation and a neural gas network for unsupervised clustering. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, also considering short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the online resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when the classification task is performed on a set of short DNA sequences randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
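The spectral representation step can be sketched as a normalized k-mer frequency vector over all 4^k DNA words, which lets sequences (including short fragments) be compared without alignment. Only this representation step is shown; the neural gas network and the signature-selection step are not reproduced, and the choice of k = 3 and cosine similarity are assumptions.

```python
import math
from collections import Counter
from itertools import product


def kmer_spectrum(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers in fixed lexicographic order.

    k-mers containing symbols other than A, C, G, T are skipped."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity between two spectra (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because every sequence maps to a vector of the same length regardless of its own length, full barcodes and 200-bp fragments can be fed to the same clustering or classification stage.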
Preliminary hard and soft bottom seafloor substrate map derived from an unsupervised classification of gridded backscatter and bathymetry derivatives of Ni'ihau Island, Hawaii, USA.

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce: Preliminary hard and soft seafloor substrate map derived from an unsupervised classification of multibeam backscatter and bathymetry derivatives of Ni'ihau Island,...
Identification of international clusters based on Hofstede's cultural dimensions

Directory of Open Access Journals (Sweden)

Valderí de Castro Alcântara

2012-08-01

Given that a country's culture influences the organizational culture of the companies operating there and is also a determining factor in the internationalization process, it is important to understand and measure the cultural characteristics of each country. Hofstede's studies (1984) present a useful methodology for comparing cultures. This methodology takes into account the characteristics of a culture that make it possible to differentiate one country from another. It can thus be observed that certain countries share certain cultural traits, so they can be grouped according to pre-established criteria. This work uses the multivariate statistical procedures of cluster analysis, k-means cluster analysis and discriminant analysis to determine and validate groupings of countries based on Hofstede's cultural dimensions (Power Distance Index, Individualism, Masculinity and Uncertainty Avoidance Index). The results identified four clusters: Cluster 1, countries with a masculine and individualistic culture; Cluster 2, a collectivist and uncertainty-averse culture; Cluster 3, a feminine culture with low power distance; and Cluster 4, a culture with high power distance and a propensity for uncertainty.
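The k-means step named above can be sketched as Lloyd's algorithm applied to one vector of dimension scores per country. This is a generic sketch; the initialization, iteration limit, and the toy scores in the check are assumptions, and the toy values are made up, not Hofstede's published indices.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; initial centroids are k distinct data points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each country goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: recompute centroids; keep the old one if a cluster empties.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[c])
        if new == centroids:
            break  # converged
        centroids = new
    return labels, centroids
```

With four-dimensional points (power distance, individualism, masculinity, uncertainty avoidance) and k = 4, the returned labels would correspond to country groupings of the kind reported; a discriminant analysis could then be used to check how well the dimensions separate them.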
GenClust: A genetic algorithm for clustering gene expression data

Directory of Open Access Journals (Sweden)

Raimondi Alessandra

2005-12-01

Background: Clustering is a key step in the analysis of gene expression data; many classical clustering algorithms are used for the task, and more innovative ones have been designed and validated for it. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, in data analysis, very few clustering algorithms are based on the genetic paradigm, yet that paradigm has great potential for finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; and (b) it can be used naturally in conjunction with data-driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; in many cases, the observed differences between the worst- and best-performing algorithms may be statistically insignificant, and the algorithms could be considered equivalent. However, there are cases in which one algorithm is better than the others and therefore worthwhile. In particular, experiments with GenClust show that, although its data representation is simple, it converges very rapidly to a local optimum, and its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data-driven internal validation measures and, in particular, the FOM methodology.
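The FOM methodology mentioned above can be sketched as follows, assuming the commonly used leave-one-condition-out definition: cluster the genes on all conditions except e, measure the root-mean-square deviation of the held-out condition's values from their cluster means, and sum over all e (lower is better). The clustering function is a pluggable parameter; `split_by_row_mean` is a hypothetical two-cluster toy used only for illustration and is not GenClust.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
ClusterFn = Callable[[Matrix, int], Sequence[int]]

def fom(data: Matrix, cluster: ClusterFn, k: int) -> float:
    """Aggregate figure of merit: sum over held-out conditions e of FOM(e, k).

    data is genes x conditions; lower values mean the clusters better
    predict expression in conditions that were not used to build them.
    """
    n, m = len(data), len(data[0])
    total = 0.0
    for e in range(m):
        # Cluster the genes using every condition except e.
        reduced = [[row[j] for j in range(m) if j != e] for row in data]
        labels = cluster(reduced, k)
        sq = 0.0
        for c in set(labels):
            held = [data[i][e] for i in range(n) if labels[i] == c]
            mu = sum(held) / len(held)
            sq += sum((v - mu) ** 2 for v in held)
        total += math.sqrt(sq / n)
    return total

def split_by_row_mean(rows: Matrix, k: int) -> List[int]:
    """Hypothetical toy clustering: k is ignored and two clusters are always returned."""
    means = [sum(r) / len(r) for r in rows]
    cut = sum(means) / len(means)
    return [int(mu > cut) for mu in means]

genes = [[5.0, 5.0, 5.0], [5.0, 5.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(fom(genes, split_by_row_mean, 2))                # 0.0
print(fom(genes, lambda rows, k: [0] * len(rows), 1))  # 7.5
```

Comparing FOM values across k (or across algorithms such as K-means and a genetic clusterer) is how such an internal measure is typically used; the two toy calls show that a grouping matching the data scores lower than lumping everything together.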
Unsupervised neural spike sorting for high-density microelectrode arrays with convolutive independent component analysis.

Science.gov (United States)

Leibig, Christian; Wachtler, Thomas; Zeck, Günther

2016-09-15

Unsupervised identification of action potentials in multi-channel extracellular recordings, in particular from high-density microelectrode arrays with thousands of sensors, is an unresolved problem. While independent component analysis (ICA) achieves rapid unsupervised sorting, it ignores the convolutive structure of extracellular data, thus limiting the unmixing to a subset of neurons. Here we present a spike sorting algorithm based on convolutive ICA (cICA) to retrieve a larger number of accurately sorted neurons than with instantaneous ICA while accounting for signal overlaps. Spike sorting was applied to datasets with varying signal-to-noise ratios (SNR: 3-12) and 27% spike overlaps, sampled at either 11.5 kHz or 23 kHz on 4365 electrodes. We demonstrate how the instantaneity assumption in ICA-based algorithms has to be relaxed in order to improve the spike sorting performance for high-density microelectrode array recordings. Reformulating the convolutive mixture as an instantaneous mixture by modeling several delayed samples jointly is necessary to increase the signal-to-noise ratio. Our results emphasize that different cICA algorithms are not equivalent. Spike sorting performance was assessed with ground-truth data generated from experimentally derived templates. The presented spike sorter was able to extract ≈90% of the true spike trains with an error rate below 2%. It was superior to two alternative (c)ICA methods (≈80% accurately sorted neurons) and comparable to a supervised sorting. Our new algorithm represents a fast solution to overcome the current bottleneck in spike sorting of large datasets generated by simultaneous recording with thousands of electrodes. Copyright © 2016 Elsevier B.V. All rights reserved.
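The reformulation mentioned above, in which several delayed samples are modeled jointly so that a convolutive mixture becomes an instantaneous one, amounts to a delay (temporal) embedding of the recording. A minimal sketch of that embedding step only, assuming a channels-by-samples list layout; the function name `delay_embed` and the lag count are illustrative, and the subsequent cICA unmixing is not shown:

```python
from typing import List

def delay_embed(x: List[List[float]], lags: int) -> List[List[float]]:
    """Stack every channel with `lags` delayed copies of itself.

    x is channels x samples. The result has channels * (lags + 1) rows and
    samples - lags columns; row c * (lags + 1) + d holds channel c delayed by
    d samples, and column t corresponds to input time t + lags.
    """
    n_ch, n_t = len(x), len(x[0])
    if not 0 <= lags < n_t:
        raise ValueError("lags must be in [0, samples)")
    out = []
    for c in range(n_ch):
        for d in range(lags + 1):
            out.append([x[c][t + lags - d] for t in range(n_t - lags)])
    return out

recording = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
for row in delay_embed(recording, lags=1):
    print(row)
```

Each column of the embedded matrix is an extended observation vector; applying an instantaneous ICA to these vectors can, in principle, account for short convolutive delays, at the cost of a (lags + 1)-fold larger dimension.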
The Delicate Balance between Parental Protection, Unsupervised Wandering, and Adolescents' Autonomy and Its Relation with Antisocial Behavior: The TRAILS Study

Science.gov (United States)

Sentse, Miranda; Dijkstra, Jan Kornelis; Lindenberg, Siegwart; Ormel, Johan; Veenstra, Rene

2010-01-01

In a large sample of early adolescents (T2: N = 1023; M age = 13.51; 55.5% girls), the impact of parental protection and unsupervised wandering on adolescents' antisocial behavior 2.5 years later was tested in this TRAILS study; gender and parental knowledge were controlled for. In addition, the level of biological maturation and having antisocial…
An Unsupervised Algorithm for Change Detection in Hyperspectral Remote Sensing Data Using Synthetically Fused Images and Derivative Spectral Profiles

Directory of Open Access Journals (Sweden)

Youkyung Han

2017-01-01

Multitemporal hyperspectral remote sensing data have the potential to detect altered areas on the earth's surface. However, dissimilar radiometric and geometric properties between the multitemporal data, caused by differences in acquisition time or sensor position, must be resolved before hyperspectral imagery can be used to detect changes in natural and human-impacted areas. In addition, noise in the hyperspectral spectra decreases change-detection accuracy when general change-detection algorithms are applied to hyperspectral images. To address these problems, we present an unsupervised change-detection algorithm based on statistical analyses of spectral profiles; the profiles are generated from a synthetic image fusion method for multitemporal hyperspectral images. The method aims to minimize the noise between spectra at identical positions, thereby increasing the change-detection rate and decreasing the false-alarm rate without reducing the dimensionality of the original hyperspectral data. Using a quantitative comparison on an actual dataset acquired by airborne hyperspectral sensors, we demonstrate that the proposed method provides better change-detection results than state-of-the-art unsupervised change-detection algorithms.
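A rough sketch of comparing derivative spectral profiles between two dates follows. It is an assumption-laden illustration, not the authors' algorithm: the derivative is a simple first difference across bands, the change score is the Euclidean distance between the two dates' derivative profiles, and the threshold rule (mean plus k standard deviations over all pixels) is a hypothetical choice. The synthetic image fusion step is not shown. One useful property is visible in the toy data: an additive radiometric offset between dates cancels in the derivative.

```python
import math
from typing import List, Sequence

Spectrum = Sequence[float]

def derivative_profile(spectrum: Spectrum) -> List[float]:
    """First-difference approximation of the spectral derivative across bands."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def change_map(t1: Sequence[Spectrum], t2: Sequence[Spectrum], k: float = 1.0) -> List[bool]:
    """Flag pixels whose derivative-profile distance exceeds mean + k * std (assumed rule)."""
    d = []
    for s1, s2 in zip(t1, t2):
        p1, p2 = derivative_profile(s1), derivative_profile(s2)
        d.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2))))
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return [v > mean + k * std for v in d]

# Hypothetical 4-band spectra for four pixels at two dates.
t1 = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.3, 0.3], [0.5, 0.4, 0.3, 0.2], [0.1, 0.2, 0.3, 0.4]]
# Pixels 0-2: same shape plus a constant radiometric offset; pixel 3: a real change.
t2 = [[v + 0.1 for v in px] for px in t1[:3]] + [[0.4, 0.1, 0.5, 0.1]]
print(change_map(t1, t2))  # [False, False, False, True]
```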
Dark Energy Survey Year 1 Results: Methodology and Projections for Joint Analysis of Galaxy Clustering, Galaxy Lensing, and CMB Lensing Two-point Functions

Energy Technology Data Exchange (ETDEWEB)

Giannantonio, T.; et al.

2018-02-14

Optical imaging surveys measure both the galaxy density and the gravitational lensing-induced shear fields across the sky. Recently, the Dark Energy Survey (DES) collaboration used a joint fit to two-point correlations between these observables to place tight constraints on cosmology (DES Collaboration et al. 2017). In this work, we develop the methodology to extend the DES Collaboration et al. (2017) analysis to include cross-correlations of the optical survey observables with gravitational lensing of the cosmic microwave background (CMB) as measured by the South Pole Telescope (SPT) and Planck. Using simulated analyses, we show how the resulting set of five two-point functions increases the robustness of the cosmological constraints to systematic errors in galaxy lensing shear calibration. Additionally, we show that contamination of the SPT+Planck CMB lensing map by the thermal Sunyaev-Zel'dovich effect is a potentially large source of systematic error for two-point function analyses, but show that it can be reduced to acceptable levels in our analysis by masking clusters of galaxies and imposing angular scale cuts on the two-point functions. The methodology developed here will be applied to the analysis of data from the DES, the SPT, and Planck in a companion work.
A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

OpenAIRE

Monika Raghuvanshi, Rahul Patel

2016-01-01

In a forensic analysis, large numbers of files are examined. Much of this information is in unstructured form, which makes such analysis a difficult task for computer forensic examiners. Performing forensic analysis of documents within a limited period of time therefore requires a special approach such as document clustering. This paper reviews different document clustering methodologies, for example K-means, K-medoids, single link, complete link and average link, in accordance...
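The single-, complete- and average-link methods listed above differ only in how the distance between two clusters is computed from pairwise document distances. A naive sketch, assuming bag-of-words cosine distance on hypothetical toy documents (real forensic pipelines would typically use TF-IDF weighting and a much faster implementation):

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence

LINKAGES: Dict[str, Callable[[List[float]], float]] = {
    "single": min,                            # nearest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}

def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of whitespace-tokenized term-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return 1.0 - dot / norm if norm else 1.0

def agglomerate(dist: Sequence[Sequence[float]], n_clusters: int, linkage: str = "average") -> List[List[int]]:
    """Naive agglomerative clustering over a precomputed distance matrix."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance and merge it.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: link([dist[i][j] for i in clusters[ab[0]] for j in clusters[ab[1]]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

docs = [  # hypothetical toy documents
    "invoice payment bank transfer",
    "bank transfer payment receipt",
    "email password login",
    "login password reset email",
]
dist = [[cosine_distance(p, q) for q in docs] for p in docs]
for method in LINKAGES:
    print(method, agglomerate(dist, 2, method))
```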

Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

Science.gov (United States)

Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...
Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers.

Science.gov (United States)

Moon, Myungjin; Nakai, Kenta

2018-04-01

Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
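A minimal sketch of the two preprocessing ideas named above, under stated assumptions: the Box-Cox exponent λ is fixed rather than estimated, and principal component analysis (PCA) stands in for the unspecified unsupervised feature extraction, projecting one gene's standardized expression and methylation profiles onto their first principal component. For two standardized variables with correlation r, the correlation matrix has eigenvalues 1 ± r, so the leading component is (z1 + sign(r)·z2)/√2. All values below are hypothetical.

```python
import math
from typing import List, Sequence

def box_cox(x: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform with a fixed exponent lam; data must be strictly positive."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]

def standardize(x: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def first_pc_2d(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Scores of two variables on their first principal component (correlation PCA)."""
    za, zb = standardize(a), standardize(b)
    r = sum(p * q for p, q in zip(za, zb)) / len(za)  # Pearson correlation
    sign = 1.0 if r >= 0 else -1.0
    # Eigenvector of [[1, r], [r, 1]] belonging to the larger eigenvalue 1 + |r|.
    return [(p + sign * q) / math.sqrt(2.0) for p, q in zip(za, zb)]

# Hypothetical per-sample values for one gene.
expression = box_cox([1.0, 4.0, 9.0, 16.0], lam=0.5)
methylation = box_cox([0.8, 0.6, 0.4, 0.1], lam=0.0)  # lam = 0 is the log transform
print(first_pc_2d(expression, methylation))
```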
Preliminary hard and soft bottom seafloor substrate map derived from an unsupervised classification of gridded backscatter and bathymetry derivatives at Tutuila Island, American Samoa, South Pacific.

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce: Preliminary hard and soft seafloor substrate map derived from an unsupervised classification of multibeam backscatter and bathymetry derivatives at Tutuila Island,...
Effect of early supervised progressive resistance training compared to unsupervised home-based exercise after fast-track total hip replacement applied to patients with preoperative functional limitations

DEFF Research Database (Denmark)

Mikkelsen, L R; Mechlenburg, I; Søballe, K

2014-01-01

OBJECTIVE: To examine if 2 weekly sessions of supervised progressive resistance training (PRT) in combination with 5 weekly sessions of unsupervised home-based exercise is more effective than 7 weekly sessions of unsupervised home-based exercise in improving leg-extension power of the operated leg...... 10 weeks after total hip replacement (THR) in patients with lower pre-operative function. METHOD: A total of 73 patients scheduled for THR were randomised (1:1) to intervention group (IG, home based exercise 5 days/week and PRT 2 days/week) or control group (CG, home based exercise 7 days...... of the operated leg, at the primary endpoint 10 weeks after surgery in THR patients with lower pre-operative function. TRIAL REGISTRATION: NCT01214954....
Net-zero Building Cluster Simulations and On-line Energy Forecasting for Adaptive and Real-Time Control and Decisions

Science.gov (United States)

Li, Xiwang

Buildings consume about 41.1% of primary energy and 74% of the electricity in the U.S. Moreover, the National Energy Technology Laboratory estimates that more than 1/4 of the 713 GW of U.S. electricity demand in 2010 could be dispatchable if buildings could respond to that dispatch through advanced building energy control and operation strategies and smart grid infrastructure. In this study, it is envisioned that neighboring buildings will tend to form a cluster, an open cyber-physical system, to exploit the economic opportunities provided by a smart grid, distributed power generation, and storage devices. Through optimized demand management, these building clusters will then reduce overall primary energy consumption and peak-time electricity consumption, and be more resilient to power disruptions. Therefore, this project seeks to develop a Net-zero building cluster simulation testbed and high-fidelity energy forecasting models for developing adaptive and real-time control and decision-making strategies that can be used in a Net-zero building cluster. The following research activities are summarized in this thesis: 1) development of a building cluster emulator for assessing building cluster control and operation strategies; 2) development of a novel building energy forecasting methodology using active system identification and data fusion techniques, which includes a systematic approach for building energy system characteristic evaluation, system excitation and model adaptation, and which is compared with other building energy forecasting methods reported in the literature; 3) development of high-fidelity on-line building cluster energy forecasting models, including energy forecasting models for buildings, PV panels, batteries and ice-tank thermal storage systems; and 4) a small-scale real-building validation study to verify the performance of the developed building energy forecasting methodology. The outcomes of
Unsupervised neural networks for solving Troesch's problem

International Nuclear Information System (INIS)

Raja Muhammad Asif Zahoor

2014-01-01

In this study, stochastic computational intelligence techniques are presented for the solution of Troesch's boundary value problem. The proposed stochastic solvers use the competency of a feed-forward artificial neural network for mathematical modeling of the problem in an unsupervised manner, while the unknown parameters are learned with local and global optimization methods as well as their combinations. Genetic algorithm (GA) and pattern search (PS) techniques are used as the global search methods, and the interior point method (IPM) is used for an efficient local search. Hybrid combinations, namely GA hybridized with IPM (GA-IPM) and PS hybridized with IPM (PS-IPM), are also applied to solve different forms of the equation. The results obtained from GA, PS, IPM, PS-IPM and GA-IPM are compared with standard solutions, including those of the well-known analytic techniques of the Adomian decomposition method, the variational iteration method and the homotopy perturbation method. The reliability and effectiveness of the proposed schemes, in terms of accuracy and convergence, are evaluated from the results of statistical analysis based on a sufficiently large number of independent runs. (interdisciplinary physics and related areas of science and technology)
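Troesch's problem is commonly stated as y'' = μ sinh(μy) on 0 ≤ x ≤ 1 with y(0) = 0 and y(1) = 1. The sketch below shows what "unsupervised" means here: no target values are used, only the residual of the differential equation at collocation points. Several choices are assumptions, not the paper's: the boundary conditions are built into a trial solution y(x) = x + x(1 − x)N(x), y'' is approximated by central finite differences, and a simple (1+1) random-search hill climber stands in for the GA, PS and IPM optimizers.

```python
import math
import random
from typing import List, Tuple

def net(x: float, w: List[float], hidden: int) -> float:
    """One-hidden-layer tanh network; w packs (a_i, b_i, c_i) for each hidden unit."""
    return sum(w[3 * i] * math.tanh(w[3 * i + 1] * x + w[3 * i + 2]) for i in range(hidden))

def trial(x: float, w: List[float], hidden: int) -> float:
    """Trial solution that satisfies y(0) = 0 and y(1) = 1 for any weights."""
    return x + x * (1.0 - x) * net(x, w, hidden)

def residual_loss(w: List[float], hidden: int, mu: float, n_pts: int = 11, h: float = 1e-3) -> float:
    """Mean squared residual of y'' - mu*sinh(mu*y) at interior collocation points."""
    xs = [i / (n_pts + 1) for i in range(1, n_pts + 1)]
    total = 0.0
    for x in xs:
        y = trial(x, w, hidden)
        # Central finite-difference approximation of y''(x).
        ypp = (trial(x + h, w, hidden) - 2.0 * y + trial(x - h, w, hidden)) / (h * h)
        total += (ypp - mu * math.sinh(mu * y)) ** 2
    return total / len(xs)

def train(mu: float = 0.5, hidden: int = 3, steps: int = 2000, sigma: float = 0.05,
          seed: int = 1) -> Tuple[List[float], float, float]:
    """(1+1) random-search hill climbing: keep a perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.5) for _ in range(3 * hidden)]
    best = initial = residual_loss(w, hidden, mu)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, sigma) for v in w]
        loss = residual_loss(cand, hidden, mu)
        if loss < best:
            w, best = cand, loss
    return w, initial, best

weights, initial, best = train()
print(f"residual loss {initial:.3e} -> {best:.3e}")
print(trial(0.0, weights, 3), trial(1.0, weights, 3))  # boundary values 0.0 and 1.0
```

Building the boundary conditions into the trial function means the optimizer only has to reduce the equation residual; a penalty term on y(0) and y(1) in the fitness function is a common alternative.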
Improved Performance of Unsupervised Method by Renovated K-Means

OpenAIRE

Ashok, P.; Nawaz, G. M Kadhar; Elayaraja, E.; Vadivel, V.

2013-01-01

Clustering is the separation of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering methods. The proposed K-Means algorithm is compared with the K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) algorithms by using Davis...
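The abstract does not name the three distance functions, so the sketch below uses Euclidean, Manhattan and Chebyshev distances purely as hypothetical examples. It shows only the K-means assignment step, and how the choice of distance can change which centroid a point joins:

```python
import math
from typing import Callable, Dict, List, Sequence

Vec = Sequence[float]

DISTANCES: Dict[str, Callable[[Vec, Vec], float]] = {
    "euclidean": lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))),
    "manhattan": lambda p, q: sum(abs(a - b) for a, b in zip(p, q)),
    "chebyshev": lambda p, q: max(abs(a - b) for a, b in zip(p, q)),
}

def assign(points: List[Vec], centroids: List[Vec], metric: str) -> List[int]:
    """K-means assignment step: index of the nearest centroid under the chosen distance."""
    d = DISTANCES[metric]
    return [min(range(len(centroids)), key=lambda c: d(p, centroids[c])) for p in points]

centroids = [(3.0, 0.0), (2.0, 2.0)]
for name in DISTANCES:
    print(name, assign([(0.0, 0.0)], centroids, name))
```

For non-Euclidean distances the arithmetic mean is no longer the centroid that minimizes the within-cluster cost (for Manhattan distance it is the coordinate-wise median), so a complete variant would adapt the update step as well.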
Superresolution Imaging of Aquaporin-4 Cluster Size in Antibody-Stained Paraffin Brain Sections.

Science.gov (United States)

Smith, Alex J; Verkman, Alan S

2015-12-15

The water channel aquaporin-4 (AQP4) forms supramolecular clusters whose size is determined by the ratio of M1- and M23-AQP4 isoforms. In cultured astrocytes, differences in the subcellular localization and macromolecular interactions of small and large AQP4 clusters result in distinct physiological roles for M1- and M23-AQP4. Here, we developed a quantitative superresolution optical imaging methodology to measure AQP4 cluster size in antibody-stained paraffin sections of mouse cerebral cortex and spinal cord, human postmortem brain, and glioma biopsy specimens. This methodology was used to demonstrate that large AQP4 clusters are formed in AQP4(-/-) astrocytes transfected with only M23-AQP4, but not in those expressing only M1-AQP4, both in vitro and in vivo. Native AQP4 in mouse cortex, where both isoforms are expressed, was enriched in astrocyte foot-processes adjacent to microcapillaries; clusters in perivascular regions of the cortex were larger than in parenchymal regions, demonstrating size-dependent subcellular segregation of AQP4 clusters. Two-color superresolution imaging demonstrated colocalization of Kir4.1 with AQP4 clusters in perivascular areas but not in parenchyma. Surprisingly, the subcellular distribution of AQP4 clusters was different between gray and white matter astrocytes in spinal cord, demonstrating regional specificity in cluster polarization. Changes in AQP4 subcellular distribution are associated with several neurological diseases, and we demonstrate that AQP4 clustering was preserved in a postmortem human cortical brain tissue specimen, but that AQP4 was not substantially clustered in a human glioblastoma specimen despite high-level expression. Our results demonstrate the utility of superresolution optical imaging for measuring the size of AQP4 supramolecular clusters in paraffin sections of brain tissue and support AQP4 cluster size as a primary determinant of its subcellular distribution. Copyright © 2015 Biophysical Society
Performance of clustering techniques for solving multi depot vehicle routing problem

Directory of Open Access Journals (Sweden)

Eliana M. Toro-Ocampo

2016-01-01

The multi-depot vehicle routing problem (MDVRP) is classified as NP-hard. The MDVRP simultaneously determines the routes of a set of vehicles and aims to serve a set of customers with known demand. The objective function of the problem is to minimize the total distance traveled by the routes, given that all customers must be served subject to capacity constraints on depots and vehicles. This paper presents a hybrid methodology that combines agglomerative clustering techniques, used to generate initial solutions, with an iterated local search (ILS) algorithm to solve the problem. Although previous studies have proposed clustering methods as strategies for generating initial solutions, in this work the search is intensified on the information generated after applying the clustering technique. In addition, an extensive analysis of the performance of these techniques and of their effect on the final solution is performed. The proposed methodology proves feasible and effective for solving the problem, in terms of both solution quality and computational time, on the literature instances evaluated.
Seeing deconvolution of globular clusters in M31

International Nuclear Information System (INIS)

Bendinelli, O.; Zavatti, F.; Parmeggiani, G.; Djorgovski, S.

1990-01-01

The morphology of six M31 globular clusters is examined using seeing-deconvolved CCD images. The deconvolution techniques developed by Bendinelli (1989) are reviewed and applied to the M31 globular clusters to demonstrate the methodology. It is found that the effective resolution limit of the method is about 0.1-0.3 arcsec for CCD images obtained in FWHM = 1 arcsec seeing, and sampling of 0.3 arcsec/pixel. Also, the robustness of the method is discussed. The implications of the technique for future studies using data from the Hubble Space Telescope are considered. 68 refs
Simultaneous gains tuning in boiler/turbine PID-based controller clusters using iterative feedback tuning methodology.

Science.gov (United States)

Zhang, Shu; Taft, Cyrus W; Bentsman, Joseph; Hussey, Aaron; Petrus, Bryan

2012-09-01

Tuning a complex multi-loop PID based control system requires considerable experience. In today's power industry the number of available qualified tuners is dwindling and there is a great need for better tuning tools to maintain and improve the performance of complex multivariable processes. Multi-loop PID tuning is the procedure for the online tuning of a cluster of PID controllers operating in a closed loop with a multivariable process. This paper presents the first application of the simultaneous tuning technique to the multi-input-multi-output (MIMO) PID based nonlinear controller in the power plant control context, with the closed-loop system consisting of a MIMO nonlinear boiler/turbine model and a nonlinear cluster of six PID-type controllers. Although simplified, the dynamics and cross-coupling of the process and the PID cluster are similar to those used in a real power plant. The particular technique selected, iterative feedback tuning (IFT), utilizes the linearized version of the PID cluster for signal conditioning, but the data collection and tuning is carried out on the full nonlinear closed-loop system. Based on the figure of merit for the control system performance, the IFT is shown to deliver performance favorably comparable to that attained through the empirical tuning carried out by an experienced control engineer. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

Directory of Open Access Journals (Sweden)

P. Balaji

2015-01-01

Given the large number of available clustering algorithms and the large number of gene expressions, it is difficult to select the most suitable clustering algorithm for a given set of gene expression data. Many researchers currently use hierarchical clustering in various forms, but this is no longer fully optimal. Cluster ensemble methods can address this problem by automatically merging multiple data partitions, drawn from a wide range of different clusterings of any dimension, to improve both the quality and the robustness of the clustering result. However, many existing ensemble approaches use an association matrix to condense sample-cluster and co-occurrence statistics, so relations within the ensemble are encapsulated only at the raw level, while those existing among clusters are entirely disregarded. Recovering these missing associations can greatly expand the capability of such ensemble methods for microarray data clustering. We propose a general K-means cluster ensemble approach for clustering general categorical data into the required number of partitions.
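The association (co-association) matrix mentioned above can be sketched as follows: entry (i, j) is the fraction of ensemble partitions that place samples i and j in the same cluster. This is the generic pairwise construction that the abstract describes as capturing only raw-level relations; the link-based refinement and the final K-means consensus step are not shown, and the toy partitions are invented.

```python
from typing import List, Sequence

def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j): fraction of partitions that put samples i and j in the same cluster.

    Only co-membership matters, so cluster labels need not agree across partitions.
    """
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

ensemble = [  # hypothetical base clusterings of four samples
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with swapped labels
    [0, 0, 0, 1],
]
for row in co_association(ensemble):
    print([round(v, 2) for v in row])
```

One minus this matrix can serve as a distance matrix for a final consensus clustering.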
Thermodynamic free-energy minimization for unsupervised fusion of dual-color infrared breast images

Science.gov (United States)

Szu, Harold; Miao, Lidan; Qi, Hairong

2006-04-01

This paper presents algorithmic details of an unsupervised neural network and an unbiased diagnostic methodology; that is, no lookup table labeling the input training data with desired outputs is needed. We deploy the smart algorithm on two satellite-grade infrared (IR) cameras. Although an early malignant tumor must be small in size and cannot be resolved by a single pixel, which images about hundreds of cells, these cells reveal themselves physiologically by spontaneously emitting thermal radiation due to rapid cell growth and the angiogenesis effect (from Greek: generation of vessels to increase the tumor's blood supply), shifting, according to physics, toward a shorter-wavelength IR emission band. If we use such exceedingly sensitive IR spectral-band cameras, we can in principle detect whether or not a breast tumor is perhaps malignant through a thin blouse in a close-up dark room. If this protocol turns out to be reliable in a large-scale follow-on Vatican experiment in 2006, which might generate business investment interest in the nano-engineering manufacture of nano-cameras made of 1-D carbon nanotubes without the traditional liquid-nitrogen coolant for mid-IR cameras, then one can accumulate the probability of any type of malignant tumor at every pixel over time in the comfort of privacy, without religious or other concerns. Such a non-intrusive protocol alone may not have enough information to make the decision, but the changes tracked over time will surely become significant. Such an ill-posed inverse heat-source transfer problem can be solved because of the universal constraint of equilibrium physics governing the blackbody Planck radiation distribution, to be spatio-temporally sampled. Thus, we must gather two snapshots with two IR cameras to form a data vector X(t) per pixel and invert the matrix-vector equation X = [A]S independently pixel by pixel, known as single-pixel blind source separation (BSS). Because the unknown heat transfer matrix or the impulse response
The clustering of diet, physical activity and sedentary behavior in children and adolescents: a review

OpenAIRE

Leech, Rebecca M; McNaughton, Sarah A; Timperio, Anna

2014-01-01

Diet, physical activity (PA) and sedentary behavior are important, yet modifiable, determinants of obesity. Recent research into the clustering of these behaviors suggests that children and adolescents have multiple obesogenic risk factors. This paper reviews studies using empirical, data-driven methodologies, such as cluster analysis (CA) and latent class analysis (LCA), to identify clustering patterns of diet, PA and sedentary behavior among children or adolescents and their associations wi...
ClustOfVar: An R Package for the Clustering of Variables

Directory of Open Access Journals (Sweden)

Marie Chavent

2012-09-01

Clustering of variables is a way to arrange variables into homogeneous clusters, i.e., groups of variables that are strongly related to each other and thus bring the same information. These approaches can then be useful for dimension reduction and variable selection. Several specific methods have been developed for the clustering of numerical variables. However, far fewer methods have been proposed for qualitative variables or mixtures of quantitative and qualitative variables. The R package ClustOfVar was specifically developed for this purpose. The homogeneity criterion of a cluster is defined as the sum of correlation ratios (for qualitative variables) and squared correlations (for quantitative variables) to a synthetic quantitative variable that summarizes the variables in the cluster "as well as possible". This synthetic variable is the first principal component obtained with the PCAMIX method. Two clustering algorithms are proposed to optimize the homogeneity criterion: an iterative relocation algorithm and ascendant hierarchical clustering. We also propose a bootstrap approach to determine suitable numbers of clusters. We illustrate the methodologies and the associated package on small datasets.
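For a cluster containing only quantitative variables, the homogeneity criterion described above reduces to a familiar quantity: the sum of squared correlations between the standardized variables and their first principal component equals the largest eigenvalue of their correlation matrix. A sketch under that restriction (correlation ratios for qualitative variables and the PCAMIX machinery are not shown, and this is not the ClustOfVar code):

```python
import math
from typing import List, Sequence

def standardize(x: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def homogeneity(variables: Sequence[Sequence[float]], iters: int = 200) -> float:
    """Largest eigenvalue of the correlation matrix, i.e. the sum of squared
    correlations between each variable and the first principal component."""
    z = [standardize(v) for v in variables]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)] for i in range(p)]
    # Power iteration; the uneven start vector avoids cases (e.g. two perfectly
    # anti-correlated variables) where an all-ones start is orthogonal to the
    # leading eigenvector.
    vec = [1.0 / (k + 1) for k in range(p)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break
        vec = [v / norm for v in w]
        # The Rayleigh quotient of the unit vector estimates the eigenvalue.
        lam = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p))
    return lam

print(homogeneity([[1, 2, 3, 4], [2, 4, 6, 8]]))      # perfectly correlated: about 2
print(homogeneity([[1, -1, 1, -1], [1, 1, -1, -1]]))  # uncorrelated: about 1
```

The maximum possible value is the number of variables in the cluster, reached when all of them are perfectly correlated.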
The clustering of diet, physical activity and sedentary behavior in children and adolescents: a review.

Science.gov (United States)

Leech, Rebecca M; McNaughton, Sarah A; Timperio, Anna

2014-01-22

Diet, physical activity (PA) and sedentary behavior are important, yet modifiable, determinants of obesity. Recent research into the clustering of these behaviors suggests that children and adolescents have multiple obesogenic risk factors. This paper reviews studies using empirical, data-driven methodologies, such as cluster analysis (CA) and latent class analysis (LCA), to identify clustering patterns of diet, PA and sedentary behavior among children or adolescents and their associations with socio-demographic indicators, and overweight and obesity. A literature search of electronic databases was undertaken to identify studies which have used data-driven methodologies to investigate the clustering of diet, PA and sedentary behavior among children and adolescents aged 5-18 years old. Eighteen studies (62% of potential studies) were identified that met the inclusion criteria, of which eight examined the clustering of PA and sedentary behavior and eight examined diet, PA and sedentary behavior. Studies were mostly cross-sectional and conducted in older children and adolescents (≥ 9 years). Findings from the review suggest that obesogenic cluster patterns are complex with a mixed PA/sedentary behavior cluster observed most frequently, but healthy and unhealthy patterning of all three behaviors was also reported. Cluster membership was found to differ according to age, gender and socio-economic status (SES). The tendency for older children/adolescents, particularly females, to comprise clusters defined by low PA was the most robust finding. Findings to support an association between obesogenic cluster patterns and overweight and obesity were inconclusive, with longitudinal research in this area limited. Diet, PA and sedentary behavior cluster together in complex ways that are not well understood. Further research, particularly in younger children, is needed to understand how cluster membership differs according to socio-demographic profile. Longitudinal research is
Spike timing analysis in neural networks with unsupervised synaptic plasticity

Science.gov (United States)

Mizusaki, B. E. P.; Agnes, E. J.; Brunnet, L. G.; Erichsen, R., Jr.

2013-01-01

The synaptic plasticity rules that sculpt a neural network architecture are key elements for understanding cortical processing, as they may explain the emergence of stable, functional activity while avoiding runaway excitation. For an associative memory framework, they should be built so as to enable the network to reproduce a robust spatio-temporal trajectory in response to an external stimulus. Still, how these rules may be implemented in recurrent networks, and how they relate to the network's capacity for pattern recognition, remains unclear. We studied the effects of three phenomenological unsupervised rules in sparsely connected recurrent networks for associative memory: spike-timing-dependent plasticity, short-term plasticity and homeostatic scaling. System stability is monitored during the learning process of the network, as the mean firing rate converges to a value determined by the homeostatic scaling. Afterwards, it is possible to measure the recovery efficiency of the activity following each initial stimulus. This is evaluated by a measure of the correlation between spike firing times, and we analysed the full memory separation capacity and limitations of this system.
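As an illustration of the first of the three rules, a common pair-based form of spike-timing-dependent plasticity uses an exponential learning window. The amplitudes, time constants and all-to-all pairing scheme below are illustrative assumptions, not the parameters of the study, and short-term plasticity and homeostatic scaling are not shown:

```python
import math
from typing import Sequence

def stdp_dw(t_pre: float, t_post: float, a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Weight change for one pre/post spike pair (times in ms, illustrative parameters)."""
    dt = t_post - t_pre
    if dt > 0:  # pre fires before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:  # post fires before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def apply_pairs(w: float, pre: Sequence[float], post: Sequence[float], w_max: float = 1.0) -> float:
    """Sum the STDP updates over all spike pairs, then clip the weight to [0, w_max]."""
    for tp in pre:
        for tq in post:
            w += stdp_dw(tp, tq)
    return min(max(w, 0.0), w_max)

print(apply_pairs(0.5, pre=[0.0, 50.0], post=[10.0]))
```

Making the depression amplitude slightly larger than the potentiation amplitude is one common way to keep purely Hebbian growth in check; in the study, homeostatic scaling plays that stabilizing role at the network level.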
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Science.gov (United States)

Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario

2016-03-01

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. The complementary strengths of GeneMark-ET and AUGUSTUS provided the motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in BAM format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses the predicted genes for training and then integrates RNA-Seq read information into the final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/. Contact: katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Towards a new classification of stable phase schizophrenia into major and simple neuro-cognitive psychosis: Results of unsupervised machine learning analysis.

Science.gov (United States)

Kanchanatawan, Buranee; Sriswasdi, Sira; Thika, Supaksorn; Stoyanov, Drozdstoy; Sirivichayakul, Sunee; Carvalho, André F; Geffard, Michel; Maes, Michael

2018-05-23

Deficit schizophrenia, as defined by the Schedule for Deficit Syndrome, may represent a distinct diagnostic class defined by neurocognitive impairments coupled with changes in IgA/IgM responses to tryptophan catabolites (TRYCATs). Adequate classifications should be based on supervised and unsupervised learning rather than on consensus criteria. This study used machine learning as a means to provide a more accurate classification of patients with stable-phase schizophrenia. We found that, using negative symptoms as discriminatory variables, schizophrenia patients may be divided into two distinct classes modelled by (A) impairments in IgA/IgM responses to noxious and generally more protective tryptophan catabolites, (B) impairments in episodic and semantic memory, paired associative learning and false memory creation, and (C) psychotic, excitation, hostility, mannerism, negative, and affective symptoms. The first cluster shows increased negative, psychotic, excitation, hostility, mannerism, depression and anxiety symptoms, and more neuroimmune and cognitive disorders, and is therefore called "major neurocognitive psychosis" (MNP). The second cluster, called "simple neurocognitive psychosis" (SNP), is discriminated from normal controls by the same features, although the impairments are less well developed than in MNP. The latter is additionally externally validated by lowered quality of life, body mass (reflecting a leptosome body type), and education (reflecting lower cognitive reserve). Previous distinctions, including "type 1" (positive)/"type 2" (negative) and DSM-IV-TR (e.g., paranoid) schizophrenia, could not be validated using machine learning techniques. Previous names of the illness, including schizophrenia, are inadequate because they do not describe the features of the illness, namely interrelated neuroimmune, cognitive, and clinical features. Stable-phase schizophrenia consists of two relevant, qualitatively distinct categories or nosological entities with SNP
Estimating extinction using unsupervised machine learning

Science.gov (United States)

Meingast, Stefan; Lombardi, Marco; Alves, João

2017-05-01

Dust extinction is the most robust tracer of the gas distribution in the interstellar medium, but measuring extinction is limited by the systematic uncertainties involved in estimating the intrinsic colors of background stars. In this paper we present a new technique, Pnicer, that estimates intrinsic colors and extinction for individual stars using unsupervised machine learning algorithms. This new method aims to be free from any priors with respect to the column density and intrinsic color distribution. It is applicable to any combination of parameters and works in an arbitrary number of dimensions. Furthermore, it is not restricted to color space. Extinction toward single sources is determined by fitting Gaussian mixture models along the extinction vector to (extinction-free) control field observations. In this way it becomes possible to describe the extinction for observed sources with probability densities rather than a single value. Pnicer effectively eliminates known biases found in similar methods and outperforms them in cases of deep observational data where the number of background galaxies is significant, or when a large number of parameters is used to break degeneracies in the intrinsic color distributions. This new method remains computationally competitive, making it possible to correctly de-redden millions of sources within a matter of seconds. With the ever-increasing number of large-scale high-sensitivity imaging surveys, Pnicer offers a fast and reliable way to calculate extinction for arbitrary parameter combinations without prior information on source characteristics. The Pnicer software package also offers access to the well-established Nicer technique in a simple unified interface and is capable of building extinction maps, including the Nicest correction for cloud substructure. Pnicer is offered to the community as an open-source software solution and is written entirely in Python.
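To make "probability densities rather than a single value" concrete, here is a one-dimensional sketch, not the Pnicer package's API. It assumes a Gaussian mixture for one intrinsic color has already been fitted to a control field (the mixture values in CONTROL_GMM, the `reddening` slope of color change per magnitude of extinction, and the flat prior on extinction are all hypothetical), shifts an observed color back along the extinction vector, and evaluates the resulting extinction density on a grid.

```python
import math

# Hypothetical two-component mixture for one intrinsic color, standing in for a
# Gaussian mixture already fitted to an extinction-free control field.
# Each entry: (weight, mean color in mag, standard deviation in mag).
CONTROL_GMM = [(0.7, 0.15, 0.05), (0.3, 0.35, 0.08)]


def gauss_pdf(x, mu, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def extinction_pdf(color_obs, color_err, reddening, gmm, a_max=30.0, step=0.01):
    """Density of extinction A on a grid 0..a_max, assuming a flat prior on A.

    The observed color is moved back along the 1-D extinction vector,
    intrinsic = color_obs - reddening * A, and scored under the control-field
    mixture; the photometric error is added in quadrature to each component width.
    """
    grid = [i * step for i in range(int(round(a_max / step)) + 1)]
    dens = []
    for a in grid:
        intrinsic = color_obs - reddening * a
        dens.append(sum(w * gauss_pdf(intrinsic, mu, math.hypot(s, color_err))
                        for w, mu, s in gmm))
    norm = sum(dens) * step  # Riemann-sum normalization
    return grid, [d / norm for d in dens]


def pdf_mean(grid, pdf):
    """Mean of a density sampled on a uniform grid (simple Riemann sum)."""
    step = grid[1] - grid[0]
    return sum(a * p for a, p in zip(grid, pdf)) * step


# Usage with placeholder inputs: a star whose color is compatible with either
# mixture component gets a bimodal extinction density, not a single number.
grid, pdf = extinction_pdf(0.9, 0.03, 0.06, CONTROL_GMM)
print(pdf_mean(grid, pdf))
```

The real method works in many dimensions at once; this reduction to one color only shows why a mixture-based intrinsic distribution naturally yields a full density per source.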

Modeling language and cognition with deep unsupervised learning: a tutorial overview.

Science.gov (United States)

Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P

2013-01-01

Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.
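The deep networks discussed here are typically built by stacking restricted Boltzmann machines (RBMs), each trained on the hidden activations of the one below. As a minimal, hedged sketch of a single such layer (pure Python, toy sizes, CD-1 with mean-field reconstruction; the class name, learning rate and patterns are our own illustrative choices, not the tutorial's code), the block below trains a binary RBM by one-step contrastive divergence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)      # positive phase
        h0 = self.sample(ph0)
        v1 = self.visible_probs(h0)      # mean-field reconstruction
        ph1 = self.hidden_probs(v1)      # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(len(v0)):
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b_h[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        """Mean squared error between inputs and their mean-field reconstructions."""
        err = 0.0
        for v in data:
            recon = self.visible_probs(self.hidden_probs(v))
            err += sum((a - b) ** 2 for a, b in zip(v, recon))
        return err / len(data)


# Usage: learn two complementary binary patterns.
patterns = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
rbm = TinyRBM(n_visible=6, n_hidden=3, seed=0)
before = rbm.reconstruction_error(patterns)
for _ in range(500):
    for p in patterns:
        rbm.cd1_update(p)
after = rbm.reconstruction_error(patterns)
print(before, after)
```

A second layer would be trained the same way on `rbm.hidden_probs(v)` for each input, which is the greedy layer-wise scheme used for deep belief networks.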
Modeling Language and Cognition with Deep Unsupervised Learning: A Tutorial Overview

Directory of Open Access Journals (Sweden)

Marco Zorzi

2013-08-01

Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.
Towards unsupervised polyaromatic hydrocarbons structural assignment from SA-TIMS-FTMS data.

Science.gov (United States)

Benigni, Paolo; Marin, Rebecca; Fernandez-Lima, Francisco

2015-10-01

With the advent of high-resolution ion mobility analyzers and their coupling to ultrahigh-resolution mass spectrometers, there is a need to further develop a theoretical workflow capable of correlating experimental accurate mass and mobility measurements with three-dimensional candidate structures. In the present work, a general workflow is described for unsupervised three-dimensional structural assignment based on accurate mass measurements, mobility measurements, in silico 2D-3D structure generation, and theoretical mobility calculations. In particular, the potential of this workflow is shown for the analysis of polyaromatic hydrocarbons from Coal Tar SRM 1597a using selected accumulation-trapped ion mobility spectrometry (SA-TIMS) coupled to Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS). The proposed workflow can be adapted to different IMS scenarios, can use different collisional cross-section calculators, and has the potential to include MS^n and IMS^n measurements for faster and more accurate three-dimensional structural assignment.
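The core matching step of such a workflow, keeping only candidate structures whose theoretical mass and collision cross section (CCS) agree with the measurement, can be sketched as follows. This is a hedged illustration: the tolerances (1 ppm in mass, 2% in CCS), the `Candidate` type and all numbers are placeholders, and the theoretical CCS values would come from whichever CCS calculator the workflow uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mass: float  # theoretical m/z
    ccs: float   # theoretical collision cross section (Å^2) from a CCS calculator


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_candidates(mz, ccs, candidates, ppm_tol=1.0, ccs_tol_pct=2.0):
    """Return candidate names within both tolerances, best CCS agreement first."""
    hits = []
    for c in candidates:
        if abs(ppm_error(mz, c.mass)) > ppm_tol:
            continue  # wrong elemental composition
        ccs_dev = abs(ccs - c.ccs) / c.ccs * 100.0
        if ccs_dev <= ccs_tol_pct:
            hits.append((ccs_dev, c.name))
    return [name for _, name in sorted(hits)]


# Usage with placeholder values: two isomers share a mass, so only mobility separates them.
cands = [Candidate("isomer_A", 200.0, 140.0),
         Candidate("isomer_B", 200.0, 150.0),
         Candidate("other", 250.0, 141.0)]
print(match_candidates(200.0001, 141.0, cands))
```

Accurate mass narrows the search to a composition; the mobility dimension is what discriminates among isomeric structures with that composition.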
Modeling language and cognition with deep unsupervised learning: a tutorial overview

Science.gov (United States)

Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P.

2013-01-01

Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition. PMID:23970869
Monitoring of Oil Exploitation Infrastructure by Combining Unsupervised Pixel-Based Classification of Polarimetric SAR and Object-Based Image Analysis

Directory of Open Access Journals (Sweden)

Simon Plank

2014-12-01

In developing countries, there is a high correlation between dependence on oil exports and violent conflict. Furthermore, even in countries that experienced a peaceful development of their oil industry, land use and environmental issues occur. Independent monitoring of oil field infrastructure may therefore support problem solving. Earth observation data enable fast monitoring of large areas, which allows the real amount of land used by oil exploitation to be compared with the companies' contractual obligations. The target features of this monitoring are oil well pads: rectangular patches of bare land covering an area of approximately 50–60 m × 100 m. This article presents an automated feature extraction procedure based on the combination of a pixel-based unsupervised classification of polarimetric synthetic aperture radar (PolSAR) data and an object-based post-classification. The method is developed and tested using dual-polarimetric TerraSAR-X imagery acquired over the Doba basin in southern Chad. The advantages of PolSAR are independence from cloud coverage (vs. optical imagery) and the possibility of detailed land use classification (vs. single-pol SAR). The PolSAR classification uses the polarimetric Wishart probability density function based on the anisotropy/entropy/alpha decomposition. The object-based post-classification refinement, based on properties of the target features such as shape and area, increases the user's accuracy of the methodology by an order of magnitude. The final user's and producer's accuracies are 59%–71% in each case (area-based accuracy assessment). Considering only the numbers of correctly/falsely detected oil well pads, the user's and producer's accuracies increase to 74%–89%. In an iterative training procedure the best suited polarimetric speckle filter and processing parameters of the developed feature extraction procedure are
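The object-based refinement step described in this record (keeping classified objects whose shape and area match a well pad) can be sketched on a binary mask of "bare land" pixels. This is an assumption-laden illustration, not the article's rule set: it uses 4-connected components, an area window around 50–60 m × 100 m, and the fill ratio of the axis-aligned bounding box as a rectangularity proxy (rotated pads would need a minimum rotated rectangle instead). Thresholds and names are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """4-connected components of a binary mask given as a list of lists of 0/1."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def filter_objects(mask, pixel_m, min_area_m2, max_area_m2, min_rectangularity=0.8):
    """Keep objects whose area fits the target range and that nearly fill their bounding box."""
    kept = []
    for pixels in connected_components(mask):
        area = len(pixels) * pixel_m ** 2
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        bbox_px = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
        rectangularity = len(pixels) / bbox_px
        if min_area_m2 <= area <= max_area_m2 and rectangularity >= min_rectangularity:
            kept.append(pixels)
    return kept
```

With, say, 10 m pixels, a 5 × 10 pixel block (5000 m²) passes an area window of 4000–7500 m², while isolated speckle pixels are discarded, which is the kind of false-positive removal that raises user's accuracy.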
Preliminary hard and soft bottom seafloor substrate map derived from an unsupervised classification of gridded backscatter and bathymetry derivatives at Swains Island, Territory of American Samoa, USA.

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — Preliminary hard and soft seafloor substrate map derived from an unsupervised classification of multibeam backscatter and bathymetry derivatives at Swains Island,...
Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.

Science.gov (United States)

Cook, Andrea J; Gold, Diane R; Li, Yi

2009-10-01

Spatial cluster detection has become an important methodology for quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous, and virtually no spatial cluster detection methods have been proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocation, which most cluster detection statistics cannot. The methods are illustrated with the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, occurrence of wheeze in the last 6 months, while taking into account participants' changing residential locations.
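A cumulative residual surface of the general kind named here can be sketched as follows, under stated assumptions: residuals (observed minus fitted outcome from some regression model) are already computed per visit, each residual is attached to the subject's residence at that visit (which is how relocation enters), and the surface is scaled by the inverse square root of the number of subjects. The tuple layout, the scaling choice and the function names are our assumptions; significance would need a resampling-based null distribution, which is omitted.

```python
import math


def cumulative_residual_surface(observations, grid_x, grid_y):
    """observations: (subject_id, x, y, residual) tuples, one per subject visit,
    where (x, y) is the residence at that visit.

    Returns W with W[i][j] = n**-0.5 * sum of residuals observed at locations with
    x <= grid_x[i] and y <= grid_y[j], where n is the number of distinct subjects.
    """
    obs = list(observations)
    n_subjects = len({sid for sid, _, _, _ in obs})
    scale = 1.0 / math.sqrt(n_subjects)
    return [[scale * sum(r for _, x, y, r in obs if x <= gx and y <= gy)
             for gy in grid_y]
            for gx in grid_x]


def max_abs_deviation(surface):
    """Supremum-type statistic: a large value hints at a region of excess residuals."""
    return max(abs(v) for row in surface for v in row)


# Usage: subject s1 moves between visits, so its two residuals land in different places.
obs = [("s1", 0.0, 0.0, 1.0), ("s1", 5.0, 5.0, -1.0), ("s2", 1.0, 1.0, 0.5)]
W = cumulative_residual_surface(obs, [0.0, 10.0], [0.0, 10.0])
print(max_abs_deviation(W))
```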
Support Vector Data Descriptions and k-Means Clustering: One Class?

Science.gov (United States)

Gornitz, Nico; Lima, Luiz Alberto; Muller, Klaus-Robert; Kloft, Marius; Nakajima, Shinichi

2017-09-27

We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another: SVDDs gain flexibility through the use of multiple spheres, and k-means gains anomaly resistance and kernel-based flexibility. In particular, our approach leads to a new interpretation of k-means as a regularized mode-seeking algorithm. The unifying formulation further allows new algorithms to be derived by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically on artificially generated data as well as on real-world problems to highlight some of the proposed benefits, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
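ClusterSVDD's own optimization problem is not reproduced here. To make the k-means/SVDD connection concrete, the sketch below runs Lloyd's k-means and then gives each cluster a sphere around its center whose radius covers roughly a (1 − nu) fraction of that cluster's points; a point outside every sphere is flagged as an anomaly. This quantile-radius rule is a simplified stand-in for SVDD's slack-based radius, chosen only for illustration, and `nu`, the supplied initial centers and all function names are assumptions.

```python
import math


def assign(points, centers):
    """Group points by their nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[k].append(p)
    return groups


def kmeans(points, init_centers, iters=100):
    """Lloyd's algorithm; an empty cluster keeps its previous center."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iters):
        groups = assign(points, centers)
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)


def sphere_radii(centers, groups, nu=0.1):
    """Per-cluster radius covering about a (1 - nu) fraction of the cluster's points."""
    radii = []
    for c, g in zip(centers, groups):
        d = sorted(math.dist(p, c) for p in g)
        idx = max(0, math.ceil((1.0 - nu) * len(d)) - 1)
        radii.append(d[idx] if d else 0.0)
    return radii


def is_anomaly(p, centers, radii):
    """A point is anomalous if it lies outside every cluster's sphere."""
    return all(math.dist(p, c) > r for c, r in zip(centers, radii))


# Usage: two tight groups; a point halfway between them falls outside both spheres.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
centers, groups = kmeans(pts, [(0, 0), (10, 10)])
radii = sphere_radii(centers, groups)
print(centers, is_anomaly((5.0, 5.0), centers, radii))
```

Plain k-means assigns every point to some cluster; attaching a radius to each center is what lets the clustering also reject outliers, which is the flavor of benefit the unified formulation targets.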
Unsupervised grammar induction of clinical report sublanguage

Directory of Open Access Journals (Sweden)

Kate Rohit J

2012-10-01

Background: Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage of a scientific or technical domain. Different genres of clinical reports use different sublanguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult, as it would require an expensive annotation effort to adapt to every type of clinical text. Methods: In this paper, we present an unsupervised method that automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces the grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Results: Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains a 60.5% F-measure for parse bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.
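The reported result is a bracketing F-measure. A minimal sketch of a micro-averaged, unlabeled version is below; spans are (start, end) word offsets, and which spans are counted (for example, whether the trivial whole-sentence bracket is included) is a convention the abstract does not specify, so the choice is left to the caller.

```python
def bracket_prf(pairs):
    """Micro-averaged unlabeled bracketing precision, recall and F1.

    pairs: iterable of (predicted_spans, gold_spans) for each sentence, where
    each span is a (start, end) tuple of word offsets.
    """
    correct = n_pred = n_gold = 0
    for pred, gold in pairs:
        p, g = set(pred), set(gold)
        correct += len(p & g)  # brackets found in both parses
        n_pred += len(p)
        n_gold += len(g)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: one sentence, one of two predicted brackets matches the gold parse.
print(bracket_prf([({(0, 2), (2, 4)}, {(0, 2), (1, 4)})]))
```

Micro-averaging pools bracket counts over the whole test set, so longer sentences with more brackets weigh more than in a per-sentence average.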
Unsupervised grammar induction of clinical report sublanguage.

Science.gov (United States)

Kate, Rohit J

2012-10-05

Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage of a scientific or technical domain. Different genres of clinical reports use different sublanguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult, as it would require an expensive annotation effort to adapt to every type of clinical text. In this paper, we present an unsupervised method that automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces the grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains a 60.5% F-measure for parse bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.
Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

Science.gov (United States)

Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

2015-01-01

Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets and the principal modes of variation in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles of the heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.
A CLUSTERING OF DJA STOCKS - THE APPLICATION IN FINANCE OF A METHOD FIRST USED IN GENE TRAJECTORY STUDY

Directory of Open Access Journals (Sweden)

Silaghi Gheorghe Cosmin

2009-05-01

Previously we employed the Gene Trajectory Clustering methodology to search for different associations of the stocks composing the DJA index, with the aim of finding different, logic clusters, supported by economic reasons, preferably different than the
Cluster concentrations in correlated and non-correlated continuum percolation problems

International Nuclear Information System (INIS)

Borstnik, B.; Jesudason, C.G.; Lukman, D.

1996-01-01

Methodologies are developed for evaluating the properties of clusters of correlated and non-correlated particles. As an example of correlated particles, two-dimensional hard-core disks with an attractive square-well potential are taken. A narrow and deep square-well potential is used to mimic the adhesive potential suitable for modeling colloidal systems. Permeable disks in two dimensions are taken as an example of non-correlated systems. In both cases, the dependence of cluster concentrations on particle density is studied. Percolation threshold densities and the critical exponents governing the zeroth, first and second moments of the cluster distributions are evaluated. It is found that the density dependence of cluster concentrations gives enough information to evaluate the percolation threshold density and some critical exponents, as well as to reproduce the Rushbrooke scaling law.
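For the non-correlated case, cluster concentrations of permeable disks can be estimated by a short Monte Carlo sketch: disks overlap when their centers are closer than twice the radius, overlapping disks are merged with union-find, and the number of clusters of each size is divided by the system area. The open (non-periodic) box, the uniform random placement and all names are simplifying assumptions of ours, not the paper's procedure.

```python
import math
import random


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_sizes(centers, radius):
    """Sizes of clusters of permeable disks (disks overlap if centers are < 2r apart)."""
    n = len(centers)
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) < 2.0 * radius:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[ri] = rj
    sizes = {}
    for i in range(n):
        root = find(parent, i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def cluster_concentrations(centers, radius, area):
    """Number of clusters of each size s per unit area."""
    counts = {}
    for s in cluster_sizes(centers, radius):
        counts[s] = counts.get(s, 0) + 1
    return {s: c / area for s, c in sorted(counts.items())}


def random_disks(n, box=1.0, seed=0):
    """Uniformly random disk centers in a square box (no periodic boundaries)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(n)]


# Usage: raising the number density grows the largest cluster toward percolation.
for n in (50, 200, 400):
    print(n, cluster_sizes(random_disks(n), radius=0.05)[0])
```

Repeating this over many realizations and densities gives the density dependence of the cluster concentrations from which the moments and threshold are estimated.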
CRED Preliminary hard and soft bottom seafloor substrate map derived from an unsupervised classification of gridded backscatter and bathymetry derivatives at the U.S. Territory of Guam.

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce: Preliminary hard and soft seafloor substrate map derived from an unsupervised classification of multibeam backscatter and bathymetry derivatives at the U.S. Territory...
Cluster analysis of signal-intensity time course in dynamic breast MRI: does unsupervised vector quantization help to evaluate small mammographic lesions?

Energy Technology Data Exchange (ETDEWEB)

Leinsinger, Gerda; Schlossbauer, Thomas; Scherr, Michael; Lange, Oliver; Reiser, Maximilian; Wismueller, Axel [Institute for Clinical Radiology, University of Munich, Munich (Germany)]

2006-05-15

We examined whether neural network clustering could support the characterization of diagnostically challenging breast lesions in dynamic magnetic resonance imaging (MRI). The study included 88 patients with 92 breast lesions (51 malignant, 41 benign). Lesions were detected by mammography and classified as Breast Imaging Reporting and Data System (BIRADS) III (median diameter 14 mm). MRI was performed with a dynamic T1-weighted gradient echo sequence (one precontrast and five postcontrast series). Lesions with an initial contrast enhancement ≥50% were selected with semiautomatic segmentation. For conventional analysis, we calculated the mean initial signal increase and postinitial course of all voxels included in a lesion. Second, all voxels within the lesions were divided into four clusters using minimal-free-energy vector quantization (VQ). With conventional analysis, the maximum accuracy in detecting breast cancer was 71%. With VQ, a maximum accuracy of 75% was observed. The slight improvement using VQ was mainly achieved by an increase in sensitivity, especially in invasive lobular carcinoma and ductal carcinoma in situ (DCIS). For lesion size, a high correlation between different observers was found (R² = 0.98). VQ slightly improved the discrimination between malignant and benign indeterminate lesions (BIRADS III) in comparison with a standard evaluation method.
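Minimal-free-energy VQ is commonly formulated as deterministic annealing: soft assignments of each signal-intensity time course to codebook vectors at a temperature T that is gradually lowered. The sketch below is a hedged, pure-Python illustration of that idea, not the study's implementation; the temperature schedule, iteration counts, the tiny symmetry-breaking offset and the percent-enhancement helper are our assumptions. In the study's setting each data vector would be one voxel's enhancement curve and k would be 4.

```python
import math


def relative_enhancement(s0, series):
    """Percent signal change of each post-contrast sample relative to the precontrast value."""
    return [(s - s0) / s0 * 100.0 for s in series]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def da_vq(data, k, t_start=100.0, t_end=0.01, cooling=0.9, iters=20, eps=1e-3):
    """Deterministic-annealing vector quantization.

    At temperature T each vector x is softly assigned to codebook vector c_j with
    probability proportional to exp(-||x - c_j||^2 / T); each c_j is then the
    probability-weighted mean of the data. T is lowered geometrically, so the
    codebook starts collapsed at the data mean and splits as T falls.
    """
    dim = len(data[0])
    mean = [sum(x[d] for x in data) / len(data) for d in range(dim)]
    centers = [list(mean) for _ in range(k)]
    t = t_start
    while t > t_end:
        # Small deterministic offsets let coinciding centers separate once T is low enough.
        for j, c in enumerate(centers):
            c[0] += eps * (j - (k - 1) / 2.0)
        for _ in range(iters):
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for x in data:
                d = [sq_dist(x, c) for c in centers]
                d_min = min(d)
                e = [math.exp(-(dj - d_min) / t) for dj in d]  # shifted for stability
                z = sum(e)
                for j in range(k):
                    p = e[j] / z
                    weights[j] += p
                    for dd in range(dim):
                        sums[j][dd] += p * x[dd]
            centers = [[s / weights[j] for s in sums[j]] if weights[j] > 0 else centers[j]
                       for j in range(k)]
        t *= cooling
    return centers


# Usage: enhancement curves of two voxels, then annealed clustering of toy 1-D data.
print(relative_enhancement(100.0, [150.0, 160.0, 155.0]))
print(sorted(da_vq([[0.0], [0.2], [10.0], [10.2]], k=2)))
```

Compared with plain k-means, the annealing schedule makes the result far less dependent on initialization, which is the usual motivation for the free-energy formulation.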
Unsupervised learning by spike timing dependent plasticity in phase change memory (PCM) synapses

Directory of Open Access Journals (Sweden)

Stefano Ambrogio

2016-03-01

We present a novel one-transistor/one-resistor (1T1R) synapse for neuromorphic networks, based on phase change memory (PCM) technology. The synapse is capable of spike-timing dependent plasticity (STDP), where gradual potentiation relies on the set transition, namely crystallization, in the PCM, while depression is achieved via reset, or amorphization, of a chalcogenide active volume. STDP characteristics are demonstrated by experiments under variable initial conditions and numbers of pulses. Finally, we support the applicability of the 1T1R synapse for learning and recognition of visual patterns by simulations of fully connected neuromorphic networks with 2 or 3 layers with high recognition efficiency. The proposed scheme provides a feasible low-power solution for on-line unsupervised machine learning in smart reconfigurable sensors.
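The asymmetry described here (gradual potentiation by partial crystallization, depression by reset) can be captured by a toy conductance model. This is a hedged sketch, not a device model from the paper: G_MIN, G_MAX, ALPHA and the pairing window are hypothetical values, and dropping straight to G_MIN on any anti-causal pairing is a simplifying assumption; a real cell's depression step and timing dependence may differ.

```python
G_MIN, G_MAX = 0.1, 10.0  # hypothetical conductance bounds (arbitrary units)
ALPHA = 0.2               # hypothetical fraction of the remaining range gained per set pulse


def pcm_update(g, dt_ms, window_ms=10.0):
    """Toy 1T1R PCM synapse update for one pre/post pairing, dt = t_post - t_pre.

    Causal pairing (0 < dt <= window): a partial set pulse crystallizes a bit more
    of the cell, so conductance rises gradually and saturates toward G_MAX.
    Anti-causal pairing (-window <= dt < 0): a reset pulse amorphizes the active
    volume, returning conductance to G_MIN.
    Pairings outside the window leave the synapse unchanged.
    """
    if 0 < dt_ms <= window_ms:
        return g + ALPHA * (G_MAX - g)
    if -window_ms <= dt_ms < 0:
        return G_MIN
    return g


# Usage: five causal pairings potentiate step by step; one anti-causal pairing resets.
g = G_MIN
for _ in range(5):
    g = pcm_update(g, 5.0)
    print(round(g, 3))
print(pcm_update(g, -5.0))
```

The saturating increments mimic how successive set pulses yield diminishing conductance gains as the crystalline fraction grows.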
The use of hierarchical clustering for the design of optimized monitoring networks

Science.gov (United States)

Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji

2018-05-01

Associativity analysis is a powerful tool for dealing with large-scale datasets by clustering the data on the basis of (dis)similarity, and it can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics, 1 − R (where R is the Pearson correlation coefficient) and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations, and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality when larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different
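The record names Kolmogorov-Zurbenko (KZ) filtering, the 1 − R dissimilarity and hierarchical clustering without giving parameters. A minimal sketch, assuming average linkage, a fixed dissimilarity cut-off and a shrinking window at the series ends (all our choices, not the study's), is shown below; the Euclidean metric could be swapped in via `math.dist` on the two series.

```python
import math


def kz_filter(x, m, k):
    """Kolmogorov-Zurbenko filter: k passes of a centered moving average of odd width m.
    Near the ends the average uses only the points available (an edge-handling choice)."""
    half = m // 2
    y = list(x)
    for _ in range(k):
        out = []
        for i in range(len(y)):
            window = y[max(0, i - half):i + half + 1]
            out.append(sum(window) / len(window))
        y = out
    return y


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_linkage(series, threshold):
    """Agglomerative clustering of stations on 1 - R, merging while the smallest
    average between-cluster dissimilarity is below `threshold`."""
    names = list(series)
    d = {(a, b): 1.0 - pearson(series[a], series[b])
         for a in names for b in names if a < b}

    def dist(c1, c2):
        vals = [d[(min(a, b), max(a, b))] for a in c1 for b in c2]
        return sum(vals) / len(vals)

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best = min((dist(c1, c2), i, j) for i, c1 in enumerate(clusters)
                   for j, c2 in enumerate(clusters) if i < j)
        if best[0] >= threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Usage with placeholder station series: A and B co-vary, C is anti-correlated.
stations = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 4, 3, 2, 1]}
smoothed = {name: kz_filter(ts, 3, 1) for name, ts in stations.items()}
print(average_linkage(stations, threshold=0.5))
```

Filtering with different window widths before clustering is one way to probe the timescale dependence the study reports (hourly-to-daily versus weekly-to-monthly signals).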
Cluster model of s- and p-shell ΛΛ hypernuclei

Indian Academy of Sciences (India)

... simplifications the use of cluster model to S = −2 systems has given ... constructed from Nijmegen soft-core NSC97e potential and are denoted as V^{e1}_{ΛΛ}. ... This convergence of results reinforces the confidence in the methodology of all the.
Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

Science.gov (United States)

de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

2015-01-01

Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and effort from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communication between individuals and organisations today heightens the need to investigate the drivers of charitable giving and to understand the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt the research, marketing and targeting strategies of the for-profit sector. This study provides the not-for-profit sector with an easily interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially better targeting their donor base. If charities and not
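The core idea behind MST-kNN clustering can be sketched as a single pass: build the minimum spanning tree (MST) of the data, keep an MST edge only if one endpoint is among the other's k nearest neighbours, and read clusters off as connected components. This is a simplified, hedged reading: the published method applies the partitioning recursively to each component, and the default k = ⌊ln n⌋ here is our assumption. Points are numeric feature vectors (for survey data, some encoding of responses would be needed first).

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def knn_sets(points, k):
    """Index sets of each point's k nearest neighbours (excluding itself)."""
    out = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append({j for _, j in order[:k]})
    return out


def mst_knn_single_pass(points, k=None):
    """Keep MST edges supported by the kNN graph; return connected components."""
    n = len(points)
    if k is None:
        k = max(1, int(math.log(n)))  # assumed default
    knn = knn_sets(points, k)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in mst_edges(points):
        if j in knn[i] or i in knn[j]:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: the long MST edge bridging the two groups has no kNN support and is cut.
pts = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0), (11.0, 0.0), (12.5, 0.0)]
print(mst_knn_single_pass(pts))
```

Unlike k-means, this needs no preset number of clusters; the count emerges from which MST edges survive the neighbourhood test.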
Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

Directory of Open Access Journals (Sweden)

Natalie Jane de Vries

Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and effort from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communication between individuals and organisations today heightens the need to investigate the drivers of charitable giving and to understand the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt the research, marketing and targeting strategies of the for-profit sector. This study provides the not-for-profit sector with an easily interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially better targeting their donor base. If charities and not
