WorldWideScience

Sample records for unsupervised clustering techniques

  1. Unsupervised color image segmentation using a lattice algebra clustering technique

    Science.gov (United States)

    Urcid, Gonzalo; Ritter, Gerhard X.

    2011-08-01

    In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebyshev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results, including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
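
    The second step of the record above relies on the Chebyshev distance between color pixels. The following is a minimal sketch of that distance and of a nearest-extreme assignment, not the authors' lattice memory method: the function names, the sample pixels and the choice of the three RGB primaries as "extreme" vectors are assumptions made only for illustration.

```python
def chebyshev(p, q):
    """Chebyshev (L-infinity) distance: the largest per-channel difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def assign_pixels(pixels, extremes):
    """Label each pixel with the index of its Chebyshev-nearest extreme vector."""
    labels = []
    for p in pixels:
        dists = [chebyshev(p, e) for e in extremes]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical data: four RGB pixels and three assumed extreme vectors.
pixels = [(250, 10, 5), (5, 240, 20), (200, 30, 40), (10, 10, 245)]
extremes = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
labels = assign_pixels(pixels, extremes)
print(labels)  # [0, 1, 0, 2]
```

    A Chebyshev ball is an axis-aligned cube, which is why this distance pairs naturally with rectangular boxes in the color cube; the maximal-box construction itself is not reproduced here.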

  2. Clustervision: Visual Supervision of Unsupervised Clustering.

    Science.gov (United States)

    Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

    2018-01-01

    Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.

  3. Factored Translation with Unsupervised Word Clusters

    DEFF Research Database (Denmark)

    Rishøj, Christian; Søgaard, Anders

    2011-01-01

    Unsupervised word clustering algorithms — which form word clusters based on a measure of distributional similarity — have proven to be useful in providing beneficial features for various natural language processing tasks involving supervised learning. This work explores the utility of such word...... clusters as factors in statistical machine translation. Although some of the language pairs in this work clearly benefit from the factor augmentation, there is no consistent improvement in translation accuracy across the board. For all language pairs, the word clusters clearly improve translation for some...... proportion of the sentences in the test set, but has a weak or even detrimental effect on the rest. It is shown that if one could determine whether or not to use a factor when translating a given sentence, rather substantial improvements in precision could be achieved for all of the language pairs evaluated...

  4. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data? b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data.

  5. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Medical image processing is the most challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumors from brain MR images. MATLAB is used to design a software tool for locating brain tumors, based on unsupervised clustering methods. The K-Means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of unsupervised clustering methods is presented.

  6. Unsupervised Learning (Clustering) of Odontocete Echolocation Clicks

    Science.gov (United States)

    2015-09-30

    develop methods for clustering of marine mammal echolocation clicks to learn about species assemblages where little or no prior knowledge exists about... Mexico or the Atlantic. 2 APPROACH Acoustic encounters with odontocetes are detected automatically and noise-corrected cepstral features...Estimation of Marine Mammals Using Passive Acoustic Monitoring (DCLDE). KL divergence maps were created for all known species, but the sperm whale

  7. Misty Mountain clustering: application to fast unsupervised flow cytometry gating

    Directory of Open Access Journals (Sweden)

    Sealfon Stuart C

    2010-10-01

    Background: There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points that are often generated by high-throughput experiments. Results: To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes place within about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and in the accuracy of cluster assignment. Conclusions: ...

  8. GibbsCluster: unsupervised clustering and alignment of peptide sequences

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-01-01

    motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large......-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0....

  9. Artificial immune kernel clustering network for unsupervised image segmentation

    Institute of Scientific and Technical Information of China (English)

    Wenlong Huang; Licheng Jiao

    2008-01-01

    An immune kernel clustering network (IKCN) is proposed, based on the combination of the artificial immune network and the support vector domain description (SVDD), for unsupervised image segmentation. In the network, a new antibody neighborhood and an adaptive learning coefficient, which is inspired by the long-term memory in cerebral cortices, are presented. Starting from the IKCN algorithm, we divide the image feature sets into subsets by the antibodies, and then map each subset into a high-dimensional feature space by a Mercer kernel, where each antibody neighborhood is represented as a support vector hypersphere. The clustering results of the local support vector hyperspheres are combined to yield a global clustering solution by the minimal spanning tree (MST), so a predefined number of clusters is not needed. We compare the proposed methods with two common clustering algorithms on an artificial synthetic data set and several image data sets, including synthetic texture images and SAR images, and encouraging experimental results are obtained.

  10. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  11. Unsupervised Two-Way Clustering of Metagenomic Sequences

    Directory of Open Access Journals (Sweden)

    Shruthi Prabhakara

    2012-01-01

    A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length and relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer words that are rare. We employ either a mixture of Gaussians or a mixture of Poissons to model reads within each bin. Further, we handle the high dimensionality and sparsity associated with the data by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bp and is robust to varying abundances, divergences and read lengths.
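
    The Poisson branch of the mixture model described above can be illustrated with a short sketch. It assumes a naive Bayes factorization over word counts and shows only the hard assignment of one read to its most likely component; the rates, the mixing weights and the function names are hypothetical, and neither the parameter estimation nor the two-way word grouping of the paper is reproduced.

```python
import math


def poisson_loglik(counts, rates):
    """Log-likelihood of independent Poisson word counts (all rates must be > 0)."""
    return sum(c * math.log(r) - r - math.lgamma(c + 1)
               for c, r in zip(counts, rates))


def assign_read(counts, components):
    """Index of the component with the highest log(weight) + log-likelihood."""
    scores = [math.log(w) + poisson_loglik(counts, rates)
              for w, rates in components]
    return scores.index(max(scores))


# Hypothetical two-species model over two rare words: (weight, per-word rates).
components = [(0.5, [2.0, 0.1]), (0.5, [0.1, 2.0])]
print(assign_read([3, 0], components))  # 0
print(assign_read([0, 2], components))  # 1
```

    In a full mixture fit this assignment would alternate with re-estimating the rates and weights from the reads currently assigned to each component.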

  12. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects in a given dataset. The techniques for detecting outliers have many applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB) outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for CB outlier detection (DACB). On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. Finally, the effectiveness and efficiency of the proposed approaches are verified through extensive simulation experiments.

  13. Unsupervised clustering with spiking neurons by sparse temporal coding and multi-layer RBF networks

    NARCIS (Netherlands)

    S.M. Bohte (Sander); J.A. La Poutré (Han); J.N. Kok (Joost)

    2000-01-01

    We demonstrate that spiking neural networks encoding information in spike times are capable of computing and learning clusters from realistic data. We show how a spiking neural network based on spike-time coding and Hebbian learning can successfully perform unsupervised clustering on

  14. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    Science.gov (United States)

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  15. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    Directory of Open Access Journals (Sweden)

    Jiayi Wu

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  16. Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

    Science.gov (United States)

    Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

    2016-01-01

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis.

  17. The composite sequential clustering technique for analysis of multispectral scanner data

    Science.gov (United States)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
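
    The hand-off between the two parts described above, where initial clusters from part (1) seed the iterative scheme of part (2), can be sketched as a plain K-means refinement. This is a generic Lloyd-style iteration under assumed inputs, not the program documented in the report: the initial centers are simply passed in, and all names and data are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans_refine(points, centers, max_iter=100):
    """Refine the given initial centers until the centers stop changing."""
    centers = [tuple(float(v) for v in c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical observations and rough initial clusters from an earlier pass.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans_refine(points, [(1, 1), (8, 8)])
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)   # [0, 0, 1, 1]
```

    Seeding the iteration with data-driven initial clusters, rather than random ones, is the design point the abstract emphasizes; an empty cluster here simply keeps its previous center.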

  18. Rough-fuzzy clustering and unsupervised feature selection for wavelet based MR image segmentation.

    Directory of Open Access Journals (Sweden)

    Pradipta Maji

    Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time-consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, judiciously integrating the merits of rough-fuzzy computing and multiresolution image analysis. The proposed method assumes that the major brain tissues in MR images, namely gray matter, white matter, and cerebrospinal fluid, have different textural properties. Dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method, based on a maximum relevance-maximum significance criterion, is introduced to select relevant and significant textural features for the segmentation problem, while a skull-stripping preprocessing step based on mathematical morphology is proposed to remove non-cerebral tissues such as the skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices.

  19. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    Science.gov (United States)

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosomal conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary to and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available at https://github.com/BDM-Lab/ClusterTAD.

  20. Data mining with unsupervised clustering using photonic micro-ring resonators

    Science.gov (United States)

    McAulay, Alastair D.

    2013-09-01

    Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining for future data centers to enhance performance. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database are converted from analog values to frequencies, as in the brain's neurons, where similar data will have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics we implement the frequencies with micro ring resonators. Due to the influence of weak coupling, a group of resonators will form clusters of similar frequencies that will indicate the desired parameters having close relations. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used for instance to target advertising and for planning.

  1. Unsupervised Performance Evaluation Strategy for Bridge Superstructure Based on Fuzzy Clustering and Field Data

    Directory of Open Access Journals (Sweden)

    Yubo Jiao

    2013-01-01

    Performance evaluation of a bridge is critical for determining the optimal maintenance strategy. An unsupervised bridge superstructure state assessment method is proposed in this paper, based on fuzzy clustering and measured bridge field data. Firstly, the evaluation index system of the bridge is constructed. Secondly, a certain number of bridge health monitoring records are selected as clustering samples to obtain the fuzzy similarity matrix and fuzzy equivalent matrix. Finally, different thresholds are selected to form dynamic clustering maps and determine the best classification based on statistical analysis. The clustering result is regarded as a sample base, and the bridge state can be evaluated by calculating the fuzzy nearness between the unknown bridge state data and the sample base. Nanping Bridge in Jilin Province is selected as the engineering project to verify the effectiveness of the proposed method.
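
    The path from a fuzzy similarity matrix to a fuzzy equivalent matrix and then to threshold-dependent clusters can be sketched with the standard max-min transitive closure followed by a lambda-cut. The abstract does not say how the similarity matrix is built or which closure procedure is used, so the matrix values, the squaring method and the function names below are assumptions for illustration.

```python
def maxmin_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square a reflexive, symmetric fuzzy relation until it stops changing."""
    while True:
        r2 = maxmin_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_clusters(t, lam):
    """Cluster labels from a fuzzy equivalence matrix cut at threshold lam."""
    n = len(t)
    labels = [None] * n
    next_label = 0
    for i in range(n):
        if labels[i] is None:
            for j in range(n):
                if t[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


# Hypothetical similarity matrix for three monitoring samples.
similarity = [[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.4],
              [0.2, 0.4, 1.0]]
equivalent = transitive_closure(similarity)
print(equivalent)                            # [[1.0, 0.8, 0.4], [0.8, 1.0, 0.4], [0.4, 0.4, 1.0]]
print(lambda_cut_clusters(equivalent, 0.5))  # [0, 0, 1]
print(lambda_cut_clusters(equivalent, 0.4))  # [0, 0, 0]
```

    Sweeping the threshold from 1 down to 0 merges clusters step by step, which is one way to read the "dynamic clustering maps" mentioned in the abstract.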

  2. Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

    Science.gov (United States)

    Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

    2014-09-30

    This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. It is described how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training, and how it autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Also, its robustness when applied to different feature extraction methods has been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method hugely attractive for future hardware implementation.
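
    The idea of tracking incoming samples without stored history or repeated passes can be illustrated with a simple online, threshold-based centroid clusterer. This is explicitly not the HAM algorithm, whose hierarchical cluster connectivity is not described in the abstract; the distance threshold, the running-mean update and all names are assumptions chosen for a minimal sketch.

```python
def online_cluster(samples, threshold):
    """Single-pass clustering: join the nearest centroid or open a new one."""
    centroids, counts, labels = [], [], []
    for x in samples:
        best, best_d = None, None
        for k, c in enumerate(centroids):
            d = sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = k, d
        if best is None or best_d > threshold:
            # No centroid is close enough: start a new cluster at this sample.
            centroids.append(tuple(float(v) for v in x))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Running-mean update; only the centroid and its count are stored.
            counts[best] += 1
            n = counts[best]
            centroids[best] = tuple(c + (a - c) / n
                                    for a, c in zip(x, centroids[best]))
            labels.append(best)
    return centroids, labels


# Hypothetical 2D spike features arriving one at a time.
samples = [(0, 0), (0.2, 0), (5, 5), (5.2, 5), (0.1, 0.1)]
centroids, labels = online_cluster(samples, threshold=1.0)
print(labels)          # [0, 0, 1, 1, 0]
print(len(centroids))  # 2
```

    The number of clusters is not supplied in advance; it emerges from the threshold, which mirrors the "autonomously determines the number of spike classes" property only in a loose sense.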

  3. Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering: Application to Medical Image MRI

    Directory of Open Access Journals (Sweden)

    Nour-Eddine El Harchaoui

    2013-01-01

    The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome the problems of uncertain data in complex systems. We used the membership function of fuzzy c-means (FCM) to initialize the parameters of possibilistic c-means (PCM), in order to solve the problem of coinciding clusters that are generated by PCM and also to overcome the sensitivity of FCM to noise. To validate our approach, we used several validity indexes and compared it with other conventional classification algorithms: fuzzy c-means, possibilistic c-means, and possibilistic fuzzy c-means. The experiments were carried out on different synthetic data sets and real brain MR images.
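
    The FCM membership function mentioned above has a standard closed form: for fuzzifier m, the membership of a point in cluster i is 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_k is the distance to center k. The sketch below computes it for one point; how the authors feed these memberships into PCM is not stated in the abstract, so that step is left out, and the function name and sample values are illustrative.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means memberships of one point (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical point at distance 1 and 3 from two cluster centers.
u = fcm_memberships((1, 0), [(0, 0), (4, 0)])
print(u)  # roughly [0.9, 0.1]; the memberships sum to 1
```

    Because FCM memberships must sum to 1 for every point, a distant outlier still receives sizeable memberships, which is the noise sensitivity that motivates the possibilistic variant.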

  4. Enhancement of Tropical Land Cover Mapping with Wavelet-Based Fusion and Unsupervised Clustering of SAR and Landsat Image Data

    Science.gov (United States)

    LeMoigne, Jacqueline; Laporte, Nadine; Netanyahuy, Nathan S.; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    The characterization and the mapping of land cover/land use of forest areas, such as the Central African rainforest, is a very complex task. This complexity is mainly due to the extent of such areas and, as a consequence, to the lack of full and continuous cloud-free coverage of those large regions by one single remote sensing instrument. In order to provide improved vegetation maps of Central Africa and to develop forest monitoring techniques for applications at the local and regional scales, we propose to utilize multi-sensor remote sensing observations coupled with in-situ data. Fusion and clustering of multi-sensor data are the first steps towards the development of such a forest monitoring system. In this paper, we describe some preliminary experiments involving the fusion of SAR and Landsat image data of the Lope Reserve in Gabon. As in previous fusion studies, our fusion method is wavelet-based. The fusion provides a new image data set which contains more detailed texture features and preserves the large homogeneous regions that are observed by the Thematic Mapper sensor. The fusion step is followed by unsupervised clustering and provides a vegetation map of the area.

  5. Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering.

    Science.gov (United States)

    Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina

    2015-03-01

    Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster and the difficulties they face concerning parameter tuning. This fact was experimentally validated and moreover, new

  6. Performance of some supervised and unsupervised multivariate techniques for grouping authentic and unauthentic Viagra and Cialis

    Directory of Open Access Journals (Sweden)

    Michel J. Anzanello

    2014-09-01

    A typical application of multivariate techniques in forensic analysis consists of discriminating between authentic and unauthentic samples of seized drugs, in addition to finding similar properties in the unauthentic samples. In this paper, the performance of several methods belonging to two different classes of multivariate techniques, supervised and unsupervised, was compared. The supervised techniques (ST) are the k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Probabilistic Neural Networks (PNN) and Linear Discriminant Analysis (LDA); the unsupervised techniques (UT) are the k-Means CA and the Fuzzy C-Means (FCM). The methods are applied to Infrared Spectroscopy by Fourier Transform (FTIR) data from authentic and unauthentic Cialis and Viagra. The FTIR data are also transformed by Principal Components Analysis (PCA) and kernel functions aimed at improving the grouping performance. ST proved to be a more reasonable choice when the analysis is conducted on the original data, while the UT led to better results when applied to transformed data.

  7. Classification of behavior using unsupervised temporal neural networks

    International Nuclear Information System (INIS)

    Adair, K.L.

    1998-03-01

    Adding recurrent connections to unsupervised neural networks used for clustering creates a temporal neural network which clusters a sequence of inputs as they appear over time. The model presented combines the Jordan architecture with the unsupervised learning technique Adaptive Resonance Theory, Fuzzy ART. The combination yields a neural network capable of quickly clustering sequential patterns as the sequences are generated. The applicability of the architecture is illustrated through a facility monitoring problem.

  8. Classification and unsupervised clustering of LIGO data with Deep Transfer Learning

    Science.gov (United States)

    George, Daniel; Shen, Hongyu; Huerta, E. A.

    2018-05-01

    Gravitational wave detection requires a detailed understanding of the response of the LIGO and Virgo detectors to true signals in the presence of environmental and instrumental noise. Of particular interest is the study of anomalous non-Gaussian transients, such as glitches, since their occurrence rate in LIGO and Virgo data can obscure or even mimic true gravitational wave signals. Therefore, successfully identifying and excising these anomalies from gravitational wave data is of utmost importance for the detection and characterization of true signals and for the accurate computation of their significance. To facilitate this work, we present the first application of deep learning combined with transfer learning to show that knowledge from pretrained models for real-world object recognition can be transferred for classifying spectrograms of glitches. To showcase this new method, we use a data set of twenty-two classes of glitches, curated and labeled by the Gravity Spy project using data collected during LIGO's first discovery campaign. We demonstrate that our Deep Transfer Learning method enables an optimal use of very deep convolutional neural networks for glitch classification given small and unbalanced training data sets, significantly reduces the training time, and achieves state-of-the-art accuracy above 98.8%, lowering the previous error rate by over 60%. More importantly, once trained via transfer learning on the known classes, we show that our neural networks can be truncated and used as feature extractors for unsupervised clustering to automatically group together new unknown classes of glitches and anomalous signals. This novel capability is of paramount importance to identify and remove new types of glitches which will occur as the LIGO/Virgo detectors gradually attain design sensitivity.

  9. Technique Based on Image Pyramid and Bayes Rule for Noise Reduction in Unsupervised Change Detection

    Institute of Scientific and Technical Information of China (English)

    LI Zhi-qiang; HUO hong; FANG Tao; ZHU Ju-lian; GE Wei-li

    2009-01-01

    In this paper, a technique based on image pyramid and Bayes rule for reducing noise effects in unsupervised change detection is proposed. Two image pyramids are constructed by applying a Gaussian pyramid to each of two multitemporal images. The difference pyramid images are obtained by point-by-point subtraction between the same-level images of the two image pyramids. All difference pyramid images are resized to the size of the original multitemporal image and multiplied together, giving a map similar to the difference image, which is itself generated by direct point-by-point subtraction between the two multitemporal images. Finally, the Bayes rule is used to distinguish the changed pixels. Both synthetic and real data sets are used to evaluate the performance of the proposed technique. Experimental results show that the map from the proposed technique is more robust to noise than the difference image.

  10. AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING

    Energy Technology Data Exchange (ETDEWEB)

    Sanchez Almeida, J.; Allende Prieto, C., E-mail: jos@iac.es, E-mail: callende@iac.es [Instituto de Astrofisica de Canarias, E-38205 La Laguna, Tenerife (Spain)

    2013-01-20

    Large spectroscopic surveys require automated methods of analysis. This paper explores the use of k-means clustering as a tool for automated unsupervised classification of massive stellar spectral catalogs. The classification criteria are defined by the data and the algorithm, with no prior physical framework. We work with a representative set of stellar spectra associated with the Sloan Digital Sky Survey (SDSS) SEGUE and SEGUE-2 programs, which consists of 173,390 spectra from 3800 to 9200 A sampled on 3849 wavelengths. We classify the original spectra as well as the spectra with the continuum removed. The second set only contains spectral lines, and it is less dependent on uncertainties of the flux calibration. The classification of the spectra with continuum renders 16 major classes. Roughly speaking, stars are split according to their colors, with enough finesse to distinguish dwarfs from giants of the same effective temperature, but with difficulties to separate stars with different metallicities. There are classes corresponding to particular MK types, intrinsically blue stars, dust-reddened, stellar systems, and also classes collecting faulty spectra. Overall, there is no one-to-one correspondence between the classes we derive and the MK types. The classification of spectra without continuum renders 13 classes, the color separation is not so sharp, but it distinguishes stars of the same effective temperature and different metallicities. Some classes thus obtained present a fairly small range of physical parameters (200 K in effective temperature, 0.25 dex in surface gravity, and 0.35 dex in metallicity), so that the classification can be used to estimate the main physical parameters of some stars at a minimum computational cost. We also analyze the outliers of the classification. Most of them turn out to be failures of the reduction pipeline, but there are also high redshift QSOs, multiple stellar systems, dust-reddened stars, galaxies, and, finally, odd

  11. flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding.

    Science.gov (United States)

    Ge, Yongchao; Sealfon, Stuart C

    2012-08-01

    For flow cytometry data, there are two common approaches to the unsupervised clustering problem: one is based on the finite mixture model and the other on spatial exploration of the histograms. The former is computationally slow and has difficulty identifying clusters of irregular shapes. The latter approach cannot be applied directly to high-dimensional data as the computational time and memory become unmanageable and the estimated histogram is unreliable. An algorithm without these two problems would be very useful. In this article, we combine ideas from the finite mixture model and histogram spatial exploration. This new algorithm, which we call flowPeaks, can be applied directly to high-dimensional data and can identify irregularly shaped clusters. The algorithm first uses the K-means algorithm with a large K to partition the cell population into many small clusters. These partitioned data allow the generation of a smoothed density function using the finite mixture model. All local peaks are exhaustively searched by exploring the density function and the cells are clustered by the associated local peak. The algorithm flowPeaks is automatic, fast and reliable and robust to cluster shape and outliers. This algorithm has been applied to flow cytometry data and it has been compared with state-of-the-art algorithms, including Misty Mountain, FLOCK, flowMeans, flowMerge and FLAME. The R package flowPeaks is available at https://github.com/yongchao/flowPeaks. Contact: yongchao.ge@mssm.edu. Supplementary data are available at Bioinformatics online.

  12. Tune Your Brown Clustering, Please

    DEFF Research Database (Denmark)

    Derczynski, Leon; Chester, Sean; Bøgh, Kenneth Sejdenfaden

    2015-01-01

    Brown clustering, an unsupervised hierarchical clustering technique based on ngram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly...

  13. Exploitation of Clustering Techniques in Transactional Healthcare Data

    Directory of Open Access Journals (Sweden)

    Naeem Ahmad Mahoto

    2014-03-01

    Healthcare service centres equipped with electronic health systems have improved their resources as well as treatment processes. The dynamic nature of each individual's healthcare data makes it complex and difficult for physicians to manually mediate them; therefore, automatic techniques are essential to manage the quality and standardization of treatment procedures. Exploratory data analysis, pattern analysis and grouping of data are managed using clustering techniques, which work as an unsupervised classification. A number of healthcare applications have been developed that use several data mining techniques for classification, clustering and extracting useful information from healthcare data. The challenging issue in this domain is to select an adequate data mining algorithm for optimal results. This paper exploits three different clustering algorithms, DBSCAN (Density-Based Clustering), agglomerative hierarchical and k-means, in real transactional healthcare data of diabetic patients (taken as a case study) to analyse their performance in large and dispersed healthcare data. The best solution of cluster sets among the exploited algorithms is evaluated using clustering quality indexes and is selected to identify the possible subgroups of patients having similar treatment patterns.

  14. An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

    Science.gov (United States)

    Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2016-01-01

    Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients, lacking abrupt changes between adjacent classes; and as having a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Data set (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.
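
    To make one of the unsupervised evaluation metrics named above concrete, the sketch below computes the Davies–Bouldin index for a clustering given as lists of points. It uses the standard textbook definition with Euclidean distances. The function names and the toy points are assumptions for illustration, not the study's seven input variables or code.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering; lower means better separation.

    `clusters` is a list of at least two clusters, each a non-empty list of
    equal-length coordinate tuples.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = [sum(math.dist(p, cen) for p in pts) / len(pts)
               for pts, cen in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Hypothetical toy data: the same two clusters, far apart and close together.
well_separated = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
poorly_separated = [[(0.0, 0.0), (0.0, 1.0)], [(2.0, 0.0), (2.0, 1.0)]]
db_far = davies_bouldin(well_separated)
db_near = davies_bouldin(poorly_separated)
```

    Because the index is a ratio of within-cluster scatter to between-centroid distance, moving the two clusters closer together raises it, so `db_near` is larger than `db_far`.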

  15. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily.

  16. Unsupervised versus Supervised Identification of Prognostic Factors in Patients with Localized Retroperitoneal Sarcoma: A Data Clustering and Mahalanobis Distance Approach

    Directory of Open Access Journals (Sweden)

    Rita De Sanctis

    2018-01-01

    The aim of this report is to unveil specific prognostic factors for retroperitoneal sarcoma (RPS) patients by univariate and multivariate statistical techniques. A phase I-II study on localized RPS treated with high-dose ifosfamide and radiotherapy followed by surgery (ISG-STS 0303 protocol) demonstrated that chemo/radiotherapy was safe and increased the 3-year relapse-free survival (RFS) with respect to historical controls. Of 70 patients, twenty-six developed local, 10 distant, and 5 combined relapse. Median disease-free interval (DFI) was 29.47 months. According to a discriminant function analysis, DFI, histology, relapse pattern, and the first treatment approach at relapse had a statistically significant prognostic impact. Based on scientific literature and clinical expertise, clinicopathological data were analyzed using both a supervised and an unsupervised classification method to predict the prognosis, with similar sample sizes (66 and 65, resp., in casewise approach and 70 in mean-substitution one). This is the first attempt to predict patients’ prognosis by means of multivariate statistics, and in this light, it looks noticeable that (i) some clinical data have a well-defined prognostic value, (ii) the unsupervised model produced comparable results with respect to the supervised one, and (iii) the appropriate combination of both models appears fruitful and easily extensible to different clinical contexts.

  17. Unsupervised learning algorithms

    CERN Document Server

    Aydin, Kemal

    2016-01-01

    This book summarizes the state-of-the-art in unsupervised learning. The contributors discuss how with the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms, which can automatically discover interesting and useful patterns in such data, have gained popularity among researchers and practitioners. The authors outline how these algorithms have found numerous applications including pattern recognition, market basket analysis, web mining, social network analysis, information retrieval, recommender systems, market research, intrusion detection, and fraud detection. They present how the difficulty of developing theoretically sound approaches that are amenable to objective evaluation have resulted in the proposal of numerous unsupervised learning algorithms over the past half-century. The intended audience includes researchers and practitioners who are increasingly using unsupervised learning algorithms to analyze their data. Topics of interest include anomaly detection, clustering,...

  18. A COMPARISON OF TWO FUZZY CLUSTERING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    Samarjit Das

    2013-10-01

    In fuzzy clustering, unlike hard clustering, depending on the membership value, a single object may belong exactly to one cluster or partially to more than one cluster. Out of a number of fuzzy clustering techniques, Bezdek’s Fuzzy C-Means and Gustafson-Kessel clustering techniques are well known, where Euclidean distance and Mahalanobis distance are used respectively as a measure of similarity. We have applied these two fuzzy clustering techniques on a dataset of individual differences consisting of fifty feature vectors of dimension (feature) three. Based on some validity measures we have tried to see the performances of these two clustering techniques from three different aspects: first, by initializing the membership values of the feature vectors considering the values of the three features separately one at a time, secondly, by changing the number of the predefined clusters and thirdly, by changing the size of the dataset.
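
    The partial membership described above can be sketched with the standard Fuzzy C-Means update rules under Euclidean distance. This is a minimal illustration, not the authors' experiment. The Gustafson-Kessel variant (Mahalanobis distance) is not shown, and the fuzzifier `m = 2`, the fixed initial centres and the toy points are assumptions.

```python
import math


def memberships(points, centers, m=2.0, eps=1e-12):
    """Return u[k][i], the degree to which point k belongs to cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # Distances to every centre, floored to avoid division by zero.
        d = [max(math.dist(x, c), eps) for c in centers]
        u.append([1.0 / sum((di / dj) ** exponent for dj in d) for di in d])
    return u


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of steps."""
    centers = [tuple(c) for c in centers]
    dims = range(len(points[0]))
    for _ in range(iterations):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # Each point contributes to centre i with weight u_ki ** m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * x[axis] for wk, x in zip(w, points)) / total
                for axis in dims
            ))
        centers = new_centers
    return memberships(points, centers, m), centers


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
u, fitted_centers = fuzzy_c_means(toy_points, [(0.0, 0.0), (10.0, 10.0)])
```

    Each row of `u` sums to one. A point lying between the two groups would receive intermediate memberships instead of being forced into a single cluster.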

  19. AHIMSA - Ad hoc histogram information measure sensing algorithm for feature selection in the context of histogram inspired clustering techniques

    Science.gov (United States)

    Dasarathy, B. V.

    1976-01-01

    An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.

  20. High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    Directory of Open Access Journals (Sweden)

    Dieter Hendricks

    2016-02-01

    We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.

  1. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    Science.gov (United States)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines and decision trees are commonly used. These methods can detect rolling bearing failures effectively, but they often require many training samples to achieve good detection results. To address this, a novel clustering method is proposed in this paper. The method is able to find the correct number of clusters automatically. Its effectiveness is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. The diagnosis results also show relatively high accuracy even for massive samples.

  2. An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images

    Science.gov (United States)

    Bhardwaj, Kaushal; Patra, Swarnajyoti

    2018-04-01

    Inclusion of spatial information along with spectral features plays a significant role in classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. In order to incorporate proper spatial information, multiple attributes are required and for each attribute large profiles need to be constructed by varying the filter parameter values within a wide range. Thus, the constructed profiles that represent spectral-spatial information of a hyperspectral image have huge dimension, which leads to the Hughes phenomenon and increases computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique that selects a subset of filtered images from the constructed high dimensional multi-attribute profile which are sufficiently informative to discriminate well among classes. In this regard the proposed technique exploits genetic algorithms (GAs). The fitness function of the GAs is defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using a one-against-all support vector machine classifier. The experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.

  3. Automated assessment and tracking of human body thermal variations using unsupervised clustering.

    Science.gov (United States)

    Yousefi, Bardia; Fleuret, Julien; Zhang, Hai; Maldague, Xavier P V; Watt, Raymond; Klein, Matthieu

    2016-12-01

    The presented approach addresses a review of the overheating that occurs during radiological examinations, such as magnetic resonance imaging, and a series of thermal experiments to determine a thermally suitable fabric material that should be used for radiological gowns. Moreover, an automatic system for detecting and tracking of the thermal fluctuation is presented. It applies hue-saturated-value-based kernelled k-means clustering, which initializes and controls the points that lie on the region-of-interest (ROI) boundary. Afterward, a particle filter tracks the targeted ROI during the video sequence independently of previous locations of overheating spots. The proposed approach was tested during experiments and under conditions very similar to those used during real radiology exams. Six subjects have voluntarily participated in these experiments. To simulate the hot spots occurring during radiology, a controllable heat source was utilized near the subject's body. The results indicate promising accuracy for the proposed approach to track hot spots. Some approximations were used regarding the transmittance of the atmosphere, and emissivity of the fabric could be neglected because of the independence of the proposed approach for these parameters. The approach can track the heating spots continuously and correctly, even for moving subjects, and provides considerable robustness against motion artifact, which occurs during most medical radiology procedures.

  4. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    Science.gov (United States)

    2015-01-01

    A. Porter and my advisor. The text is primarily written by me. Chapter 5 is a version of [46] where my contribution is all of the analytical ...inn Euclidean space, a variational method refers to using calculus of variation techniques to find the minimizer (or maximizer) of a functional (energy... geometric interpretation of modularity optimization contrasts with existing interpretations (e.g., probabilistic ones or in terms of the Potts model

  5. Percolation technique for galaxy clustering

    Science.gov (United States)

    Klypin, Anatoly; Shandarin, Sergei F.

    1993-01-01

    We study percolation in mass and galaxy distributions obtained in 3D simulations of the CDM, C + HDM, and the power law (n = -1) models in the Omega = 1 universe. Percolation statistics is used here as a quantitative measure of the degree to which a mass or galaxy distribution is of a filamentary or cellular type. The very fast code used calculates the statistics of clusters along with the direct detection of percolation. We found that the two parameters mu(infinity), characterizing the size of the largest cluster, and mu-squared, characterizing the weighted mean size of all clusters excluding the largest one, are extremely useful for evaluating the percolation threshold. An advantage of using these parameters is their low sensitivity to boundary effects. We show that both the CDM and the C + HDM models are extremely filamentary both in mass and galaxy distribution. The percolation thresholds for the mass distributions are determined.
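
    The two statistics described above can be sketched on a toy occupancy grid. This is only an illustration under stated assumptions: the paper works with 3D simulations, whereas the sketch uses a 2D grid with 4-neighbour connectivity, and the exact normalisations of mu(infinity) and mu-squared are not given in the abstract. Here they are taken to be the largest cluster's share of occupied cells and the size-weighted mean size of the remaining clusters.

```python
from collections import deque


def cluster_sizes(grid):
    """Sizes of 4-connected clusters of occupied cells, largest first."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited occupied cell.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


def percolation_stats(grid):
    """Return (largest-cluster fraction, weighted mean size of the rest)."""
    sizes = cluster_sizes(grid)
    if not sizes:
        return 0.0, 0.0
    largest_fraction = sizes[0] / sum(sizes)
    rest = sizes[1:]
    weighted_mean_rest = sum(s * s for s in rest) / sum(rest) if rest else 0.0
    return largest_fraction, weighted_mean_rest


# Hypothetical toy grid with clusters of sizes 3, 2 and 1.
toy_grid = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
largest_fraction, weighted_mean_rest = percolation_stats(toy_grid)
```

    As the occupation threshold is lowered and clusters merge, the first statistic grows towards one while the second peaks near the percolation transition, which is why such quantities help locate the threshold.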

  6. Mining FDA drug labels using an unsupervised learning technique--topic modeling.

    Science.gov (United States)

    Bisgin, Halil; Liu, Zhichao; Fang, Hong; Xu, Xiaowei; Tong, Weida

    2011-10-18

    The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe this information is free text often containing ambiguous semantic descriptions, which poses a great challenge in retrieving useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on the manual reading of the full text by experts, which is time consuming and labor intensive. In this study, a novel text mining method with unsupervised learning in nature, called topic modeling, was applied to the drug labeling with a goal of discovering "topics" that group drugs with similar safety concerns and/or therapeutic uses together. A total of 794 FDA-approved drug labels were used in this study. First, the three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed by the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to the standard ADR terms. Next, the topic modeling approach with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on the probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of drugs. The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct context that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic application (e.g., antiinfectives for systemic use). We were also able to identify potential adverse events that might arise from specific

  7. Mining FDA drug labels using an unsupervised learning technique - topic modeling

    Science.gov (United States)

    2011-01-01

    Background: The Food and Drug Administration (FDA) approved drug labels contain a broad array of information, ranging from adverse drug reactions (ADRs) to drug efficacy, risk-benefit consideration, and more. However, the labeling language used to describe this information is free text often containing ambiguous semantic descriptions, which poses a great challenge in retrieving useful information from the labeling text in a consistent and accurate fashion for comparative analysis across drugs. Consequently, this task has largely relied on the manual reading of the full text by experts, which is time consuming and labor intensive. Method: In this study, a novel text mining method with unsupervised learning in nature, called topic modeling, was applied to the drug labeling with a goal of discovering “topics” that group drugs with similar safety concerns and/or therapeutic uses together. A total of 794 FDA-approved drug labels were used in this study. First, the three labeling sections (i.e., Boxed Warning, Warnings and Precautions, Adverse Reactions) of each drug label were processed by the Medical Dictionary for Regulatory Activities (MedDRA) to convert the free text of each label to the standard ADR terms. Next, the topic modeling approach with latent Dirichlet allocation (LDA) was applied to generate 100 topics, each associated with a set of drugs grouped together based on the probability analysis. Lastly, the efficacy of the topic modeling was evaluated based on known information about the therapeutic uses and safety data of drugs. Results: The results demonstrate that drugs grouped by topics are associated with the same safety concerns and/or therapeutic uses with statistical significance (P<0.05). The identified topics have distinct context that can be directly linked to specific adverse events (e.g., liver injury or kidney injury) or therapeutic application (e.g., antiinfectives for systemic use). We were also able to identify potential adverse events that

  8. Classification of high-resolution multi-swath hyperspectral data using Landsat 8 surface reflectance data as a calibration target and a novel histogram based unsupervised classification technique to determine natural classes from biophysically relevant fit parameters

    Science.gov (United States)

    McCann, C.; Repasky, K. S.; Morin, M.; Lawrence, R. L.; Powell, S. L.

    2016-12-01

    Compact, cost-effective, flight-based hyperspectral imaging systems can provide scientifically relevant data over large areas for a variety of applications such as ecosystem studies, precision agriculture, and land management. To fully realize this capability, unsupervised classification techniques based on radiometrically-calibrated data that cluster based on biophysical similarity rather than simply spectral similarity are needed. An automated technique to produce high-resolution, large-area, radiometrically-calibrated hyperspectral data sets based on the Landsat surface reflectance data product as a calibration target was developed and applied to three subsequent years of data covering approximately 1850 hectares. The radiometrically-calibrated data allows inter-comparison of the temporal series. Advantages of the radiometric calibration technique include the need for minimal site access, no ancillary instrumentation, and automated processing. Fitting the reflectance spectra of each pixel using a set of biophysically relevant basis functions reduces the data from 80 spectral bands to 9 parameters providing noise reduction and data compression. Examination of histograms of these parameters allows for determination of natural splitting into biophysical similar clusters. This method creates clusters that are similar in terms of biophysical parameters, not simply spectral proximity. Furthermore, this method can be applied to other data sets, such as urban scenes, by developing other physically meaningful basis functions. The ability to use hyperspectral imaging for a variety of important applications requires the development of data processing techniques that can be automated. The radiometric-calibration combined with the histogram based unsupervised classification technique presented here provide one potential avenue for managing big-data associated with hyperspectral imaging.

  9. Unsupervised consensus cluster analysis of [18F]-fluoroethyl-L-tyrosine positron emission tomography identified textural features for the diagnosis of pseudoprogression in high-grade glioma.

    Science.gov (United States)

    Kebir, Sied; Khurshid, Zain; Gaertner, Florian C; Essler, Markus; Hattingen, Elke; Fimmers, Rolf; Scheffler, Björn; Herrlinger, Ulrich; Bundschuh, Ralph A; Glas, Martin

    2017-01-31

    Timely detection of pseudoprogression (PSP) is crucial for the management of patients with high-grade glioma (HGG) but remains difficult. Textural features of O-(2-[18F]fluoroethyl)-L-tyrosine positron emission tomography (FET-PET) mirror tumor uptake heterogeneity; some of them may be associated with tumor progression. Fourteen patients with HGG and suspected of PSP underwent FET-PET imaging. A set of 19 conventional and textural FET-PET features were evaluated and subjected to unsupervised consensus clustering. The final diagnosis of true progression vs. PSP was based on follow-up MRI using RANO criteria. Three robust clusters have been identified based on 10 predominantly textural FET-PET features. None of the patients with PSP fell into cluster 2, which was associated with high values for textural FET-PET markers of uptake heterogeneity. Three out of 4 patients with PSP were assigned to cluster 3 that was largely associated with low values of textural FET-PET features. By comparison, tumor-to-normal brain ratio (TNRmax) at the optimal cutoff 2.1 was less predictive of PSP (negative predictive value 57% for detecting true progression, p=0.07 vs. 75% with cluster 3, p=0.04). Clustering based on textural O-(2-[18F]fluoroethyl)-L-tyrosine PET features may provide valuable information in assessing the elusive phenomenon of pseudoprogression.

  10. Unsupervised Learning and Generalization

    DEFF Research Database (Denmark)

    Hansen, Lars Kai; Larsen, Jan

    1996-01-01

    The concept of generalization is defined for a general class of unsupervised learning machines. The generalization error is a straightforward extension of the corresponding concept for supervised learning, and may be estimated empirically using a test set or by statistical means, in close analogy ...... with supervised learning. The empirical and analytical estimates are compared for principal component analysis and for K-means clustering based density estimation.

  11. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    Science.gov (United States)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting; it performs well in prediction and is able to give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises and, at the same time, achieve good overall accuracy. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating case information within the same clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the

  12. Event Streams Clustering Using Machine Learning Techniques

    Directory of Open Access Journals (Sweden)

    Hanen Bouali

    2015-10-01

    Data streams are usually of unbounded length, which pushes users to consider only recent observations by focusing on a time window and to ignore past data. However, in many real-world applications, past data must be taken into consideration to guarantee the efficiency and performance of decision making and to handle the evolution of data streams over time. In order to build a selective history to track changes in the underlying event streams, we opt for the continuous data of the sliding window, which increases the time window based on changes over historical data. In this paper, to provide access to historical data without requiring any significant storage or multiple passes over the data, we propose a new algorithm for clustering multiple data streams using an incremental support vector machine and a data representative points’ technique. The algorithm uses a sliding window model for the most recent clustering results and data representative points to model the old data clustering results. Our experimental results on electromyography signals show better clustering than other approaches present in the literature.

  13. Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

    Directory of Open Access Journals (Sweden)

    Yuchou Chang

    2008-02-01

    Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.

  14. Unsupervised Video Shot Detection Using Clustering Ensemble with a Color Global Scale-Invariant Feature Transform Descriptor

    Directory of Open Access Journals (Sweden)

    Hong Yi

    2008-01-01

    Scale-invariant feature transform (SIFT) transforms a grayscale image into scale-invariant coordinates of local features that are invariant to image scale, rotation, and changing viewpoints. Because of its scale-invariant properties, SIFT has been successfully used for object recognition and content-based image retrieval. The biggest drawback of SIFT is that it uses only grayscale information and misses important visual information regarding color. In this paper, we present the development of a novel color feature extraction algorithm that addresses this problem, and we also propose a new clustering strategy using clustering ensembles for video shot detection. Based on Fibonacci lattice-quantization, we develop a novel color global scale-invariant feature transform (CGSIFT) for better description of color contents in video frames for video shot detection. CGSIFT first quantizes a color image, representing it with a small number of color indices, and then uses SIFT to extract features from the quantized color index image. We also develop a new space description method using small image regions to represent global color features as the second step of CGSIFT. Clustering ensembles focusing on knowledge reuse are then applied to obtain better clustering results than using single clustering methods for video shot detection. Evaluation of the proposed feature extraction algorithm and the new clustering strategy using clustering ensembles reveals very promising results for video shot detection.

  15. Clustering microcalcifications techniques in digital mammograms

    Science.gov (United States)

    Díaz, Claudia. C.; Bosco, Paolo; Cerello, Piergiorgio

    2008-11-01

    Breast cancer has become a serious public health problem around the world. However, this pathology can be treated if it is detected in its early stages. This task falls to a radiologist, who must read a large number of mammograms per day, for either screening or diagnostic purposes. However, human factors can affect the diagnosis. Computer Aided Detection is an automatic system that can help specialists detect possible signs of malignancy in mammograms. Microcalcifications play an important role in early detection, so we focused on their study. The two mammographic features that indicate that microcalcifications are probably malignant are small size and clustered distribution. We worked with density techniques for automatic clustering, and we applied them on a mammography CAD prototype developed at INFN-Turin, Italy. An improvement in performance is achieved when analyzing images from a Perugia-Assisi Hospital, in Italy.

  16. Technique for fast and efficient hierarchical clustering

    Science.gov (United States)

    Stork, Christopher

    2013-10-08

    A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.

  17. Decomposition methods for unsupervised learning

    DEFF Research Database (Denmark)

    Mørup, Morten

    2008-01-01

    This thesis presents the application and development of decomposition methods for Unsupervised Learning. It covers topics from classical factor analysis based decomposition and its variants such as Independent Component Analysis, Non-negative Matrix Factorization and Sparse Coding...... methods and clustering problems is derived both in terms of classical point clustering but also in terms of community detection in complex networks. A guiding principle throughout this thesis is the principle of parsimony. Hence, the goal of Unsupervised Learning is here posed as striving for simplicity...... in the decompositions. Thus, it is demonstrated how a wide range of decomposition methods explicitly or implicitly strive to attain this goal. Applications of the derived decompositions are given ranging from multi-media analysis of image and sound data, analysis of biomedical data such as electroencephalography...

  18. Using Apparent Density of Paper from Hardwood Kraft Pulps to Predict Sheet Properties, based on Unsupervised Classification and Multivariable Regression Techniques

    Directory of Open Access Journals (Sweden)

    Ofélia Anjos

    2015-07-01

    Paper properties determine the product application potential and depend on the raw material, pulping conditions, and pulp refining. The aim of this study was to construct mathematical models that predict quantitative relations between the paper density and various mechanical and optical properties of the paper. A dataset of properties of paper handsheets produced with pulps of Acacia dealbata, Acacia melanoxylon, and Eucalyptus globulus beaten at 500, 2500, and 4500 revolutions was used. Unsupervised classification techniques were combined to assess the need to perform separated prediction models for each species, and multivariable regression techniques were used to establish such prediction models. It was possible to develop models with a high goodness of fit using paper density as the independent variable (or predictor) for all variables except tear index and zero-span tensile strength, both dry and wet.

  19. Generation of brain pseudo-CTs using an undersampled, single-acquisition UTE-mDixon pulse sequence and unsupervised clustering

    International Nuclear Information System (INIS)

    Su, Kuan-Hao; Hu, Lingzhi; Traughber, Melanie; Stehning, Christian; Helle, Michael; Qian, Pengjiang; Thompson, Cheryl L.; Pereira, Gisele C.; Traughber, Bryan J.; Jordan, David W.; Herrmann, Karin A.; Muzic, Raymond F.

    2015-01-01

    Purpose: MR-based pseudo-CT has an important role in MR-based radiation therapy planning and PET attenuation correction. The purpose of this study is to establish a clinically feasible approach, including image acquisition, correction, and CT formation, for pseudo-CT generation of the brain using a single-acquisition, undersampled ultrashort echo time (UTE)-mDixon pulse sequence. Methods: Nine patients were recruited for this study. For each patient, a 190-s, undersampled, single acquisition UTE-mDixon sequence of the brain was acquired (TE = 0.1, 1.5, and 2.8 ms). A novel method of retrospective trajectory correction of the free induction decay (FID) signal was performed based on point-spread functions of three external MR markers. Two-point Dixon images were reconstructed using the first and second echo data (TE = 1.5 and 2.8 ms). R2* images (1/T2*) were then estimated and were used to provide bone information. Three image features, i.e., Dixon-fat, Dixon-water, and R2*, were used for unsupervised clustering. Five tissue clusters, i.e., air, brain, fat, fluid, and bone, were estimated using the fuzzy c-means (FCM) algorithm. A two-step, automatic tissue-assignment approach was proposed and designed according to the prior information of the given feature space. Pseudo-CTs were generated by a voxelwise linear combination of the membership functions of the FCM. A low-dose CT was acquired for each patient and was used as the gold standard for comparison. Results: The contrast and sharpness of the FID images were improved after trajectory correction was applied. The mean of the estimated trajectory delay was 0.774 μs (max: 1.350 μs; min: 0.180 μs). The FCM-estimated centroids of different tissue types showed a distinguishable pattern for different tissues, and significant differences were found between the centroid locations of different tissue types. Pseudo-CT can provide additional skull detail and has low bias and absolute error of estimated CT

  20. Segmentation of fluorescence microscopy cell images using unsupervised mining.

    Science.gov (United States)

    Du, Xian; Dua, Sumeet

    2010-05-28

    The accurate measurement of cell and nuclei contours is critical for the sensitive and specific detection of changes in normal cells in several medical informatics disciplines. Within microscopy, this task is facilitated using fluorescence cell stains, and segmentation is often the first step in such approaches. Due to the complex nature of cell issues and problems inherent to microscopy, unsupervised mining approaches of clustering can be incorporated in the segmentation of cells. In this study, we have developed and evaluated the performance of multiple unsupervised data mining techniques in cell image segmentation. We adapt four distinctive, yet complementary, methods for unsupervised learning, including those based on k-means clustering, EM, Otsu's threshold, and GMAC. Validation measures are defined, and the performance of the techniques is evaluated both quantitatively and qualitatively using synthetic and recently published real data. Experimental results demonstrate that k-means, Otsu's threshold, and GMAC perform similarly, and have more precise segmentation results than EM. We report that EM has higher recall values and lower precision results from under-segmentation due to its Gaussian model assumption. We also demonstrate that these methods need spatial information to segment complex real cell images with a high degree of efficacy, as expected in many medical informatics applications.
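
    Of the four methods named above, Otsu's threshold is the simplest to illustrate. The following is a minimal sketch of the textbook algorithm, not the authors' implementation: it assumes integer gray levels in the range 0 to levels - 1, and the function name `otsu_threshold` and the sample intensities are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Assumes every value is an integer in [0, levels). Pixels <= t form one
    class (e.g. background) and pixels > t the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical intensities: a dark background and a bright stained region.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [202] * 50
threshold = otsu_threshold(pixels)
foreground = [p for p in pixels if p > threshold]
```

    The sketch works on a flat list of intensities and uses no spatial information, which is the limitation the abstract points out for complex real cell images.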

  1. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    Science.gov (United States)

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…

  2. An Improved Unsupervised Modeling Methodology For Detecting Fraud In Vendor Payment Transactions

    National Research Council Canada - National Science Library

    Rouillard, Gregory

    2003-01-01

    ...) vendor payment transactions through Unsupervised Modeling (cluster analysis). Clementine Data Mining software is used to construct unsupervised models of vendor payment data using the K-Means, Two Step, and Kohonen algorithms...

  3. Unsupervised classification of variable stars

    Science.gov (United States)

    Valenzuela, Lucas; Pichara, Karim

    2018-03-01

    During the past 10 years, a considerable amount of effort has been made to develop algorithms for automatic classification of variable stars. That has been primarily achieved by applying machine learning methods to photometric data sets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human effort. Every time data come from new surveys, the only available training instances are the ones that have a cross-match with previously labelled objects, consequently generating insufficient training sets compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem by proposing an untraditional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific for light curves, based on a novel data structure that allows scaling the search over the entire data set of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.

  4. A survey of text clustering techniques used for web mining

    Directory of Open Access Journals (Sweden)

    Dan MUNTEANU

    2005-12-01

    This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.

  5. Optimality Measures for Monotone Equivariant Cluster Techniques.

    Science.gov (United States)

    1980-09-01

    complete linkage, u-clustering (u = .3, .5, .7), uv-clustering (uv = (.2,.4), (.2,.6), (.4,.6)) as well as the UPGMA algorithm. The idea will be to...Table 15. Notice that these measures do indeed produce different verdicts. OP1 rates UPGMA as best with uv = (.2,.4) second. By OP2, UPGMA is best...By OP1, UPGMA and uv = (.4,.6) are tied for first place, while by OP2, UPGMA is best with uv = (.2,.6), uv = (.2,.4) and uv = (.4,.6) close behind

  6. Clustering: An Interactive Technique to Enhance Learning in Biology.

    Science.gov (United States)

    Ambron, Joanna

    1988-01-01

    Explains an interdisciplinary approach to biology and writing which increases students' mastery of vocabulary, scientific concepts, creativity, and expression. Describes modifications of the clustering technique used to summarize lectures, integrate reading and understand textbook material. (RT)

  7. A fuzzy clustering technique for calorimetric data reconstruction

    International Nuclear Information System (INIS)

    Sandhir, Radha Pyari; Muhuri, Sanjib; Nayak, Tapan K.

    2010-01-01

    In high energy physics experiments, electromagnetic calorimeters are used to measure shower particles produced in p-p or heavy-ion collisions. In order to extract information and reconstruct the characteristics of the various incoming particles, clustering is required to be performed on each of the calorimeter planes. Hard clustering techniques such as Local Maxima Search, Connected-cell Search and K-means Clustering simply assign a data point to a cluster. A data point either lies in a cluster or it does not, and so, overlapping clusters are hardly distinguishable. Fuzzy c-means clustering is a version of the k-means algorithm that incorporates fuzzy logic, so that each point has a weak or strong association to the cluster, determined by the inverse distance to the center of the cluster. The term fuzzy is used because an observation may in fact lie in more than one cluster simultaneously, though to different degrees called 'memberships', as is the case with many high energy physics applications. The centers obtained using the FCM algorithm are based on the geometric locations of the data points
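
    The inverse-distance membership rule described above can be sketched as follows. This is a minimal illustration of the standard fuzzy c-means membership formula under stated assumptions, not the authors' reconstruction code: Euclidean distance, the common fuzzifier value m = 2, and the hypothetical function name `fcm_memberships`.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of one point in each cluster.

    Memberships are proportional to distance ** (-2 / (m - 1)) and sum to 1,
    so a point close to a center is strongly associated with that cluster.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


# A hit one unit from the first center and two units from the second.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

    In a full FCM run this step alternates with recomputing each center as the membership-weighted mean of the data points until the centers stop moving.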

  8. Unsupervised land cover change detection: meaningful sequential time series analysis

    CSIR Research Space (South Africa)

    Salmon, BP

    2011-06-01

    An automated land cover change detection method is proposed that uses coarse spatial resolution hyper-temporal earth observation satellite time series data. The study compared three different unsupervised clustering approaches that operate on short...

  9. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Rapidly growing GPS (Global Positioning System) trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed to reduce local information loss of the trajectory and to avoid getting stuck in the local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning) method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM) clustering, which are used to maintain the stability and the robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method is validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  10. Unsupervised Classification of Surface Defects in Wire Rod Production Obtained by Eddy Current Sensors

    Directory of Open Access Journals (Sweden)

    Sergio Saludes-Rodil

    2015-04-01

    An unsupervised approach to classify surface defects in wire rod manufacturing is developed in this paper. The defects are extracted from an eddy current signal and classified using a clustering technique that uses the dynamic time warping distance as the dissimilarity measure. The new approach has been successfully tested using industrial data. It is shown that it outperforms other classification alternatives, such as the modified Fourier descriptors.
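
    The dynamic time warping (DTW) distance used as the dissimilarity measure can be sketched as below. This is the classic unconstrained dynamic-programming form with an absolute-difference local cost; whether the paper uses a warping window, a normalization, or a different local cost is not stated in the abstract, so those choices are assumptions, as is the function name `dtw_distance`.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier sample of b
                cost[i][j - 1],      # b[j-1] also matches an earlier sample of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Two hypothetical defect signatures with the same shape but different duration.
same_shape = dtw_distance([0, 1, 2], [0, 1, 1, 2])
different = dtw_distance([0, 0], [1, 1])
```

    Because DTW tolerates stretching along the time axis, signals of different lengths can be compared directly, and the resulting pairwise distance matrix can be handed to any clustering method that accepts precomputed dissimilarities.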

  11. Clustering economies based on multiple criteria decision making techniques

    Directory of Open Access Journals (Sweden)

    Mansour Momeni

    2011-10-01

    One of the primary concerns in many countries is to determine the important factors affecting economic growth. In this paper, we study some factors such as unemployment rate, inflation ratio, population growth, average annual income, etc. to cluster different countries. The proposed model of this paper uses the analytical hierarchy process (AHP) to prioritize the criteria and then uses a K-means technique to cluster 59 countries based on the ranked criteria into four groups. The first group includes countries with high standards such as Germany and Japan. In the second cluster, there are some developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster belongs to countries with faster rates of growth compared with the countries located in the second group such as China, India and Mexico. Finally, the fourth cluster includes countries with relatively very low rates of growth such as Jordan, Mali, Niger, etc.

  12. An extended k-means technique for clustering moving objects

    Directory of Open Access Journals (Sweden)

    Omnia Ossama

    2011-03-01

    The k-means algorithm is one of the basic clustering techniques used in many data mining applications. In this paper we present a novel pattern-based clustering algorithm that extends the k-means algorithm for clustering moving object trajectory data. The proposed algorithm uses a key feature of moving object trajectories, namely their direction, as a heuristic to determine the number of clusters for the k-means algorithm. In addition, we use the silhouette coefficient as a measure for the quality of our proposed approach. Finally, we present experimental results on both real and synthetic data that show the performance and accuracy of our proposed technique.
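
    The silhouette coefficient mentioned above can be sketched as follows. This is a minimal version of the standard definition for points in Euclidean space, not the authors' evaluation code; the function name `silhouette` and the toy points are hypothetical, and trajectories would need a suitable trajectory distance in place of `math.dist`.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (values lie in [-1, 1])."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, per cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or single-cluster labeling
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])   # clusters that cut across the groups
```

    A value near 1 indicates compact, well-separated clusters, so comparing the score across candidate cluster counts is one common way to judge a k-means result.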

  13. Microgrids Real-Time Pricing Based on Clustering Techniques

    Directory of Open Access Journals (Sweden)

    Hao Liu

    2018-05-01

    Microgrids are spreading widely in electricity markets worldwide. Besides the security and reliability concerns for these microgrids, their operators need to address consumers’ pricing. Considering the growth of smart grids and smart meter facilities, it is expected that microgrids will have some level of flexibility to determine real-time pricing for at least some consumers. As such, the key challenge is finding an optimal pricing model for consumers. This paper, accordingly, proposes a new pricing scheme in which microgrids are able to deploy clustering techniques in order to understand their consumers’ load profiles and then assign real-time prices based on their load profile patterns. An improved weighted fuzzy average k-means is proposed to cluster the load curves of consumers into an optimal number of clusters, through which the load profile of each cluster is determined. Having obtained the load profile of each cluster, real-time prices are given to each cluster, which is the best price given to all consumers in that cluster.

  14. Spatial Field Variability Mapping of Rice Crop using Clustering Technique from Space Borne Hyperspectral Data

    Science.gov (United States)

    Moharana, S.; Dutta, S.

    2015-12-01

    Precision farming refers to field-specific management of an agricultural crop at a spatial scale with the aim of getting the highest achievable yield; to achieve this, spatial information on field variability is essential. The difficulty in mapping the spatial variability occurring within an agricultural field can be addressed by employing spectral techniques on hyperspectral imagery rather than multispectral imagery. However, an advanced algorithm needs to be developed to make full use of the rich information content in hyperspectral data. In the present study, the potential of hyperspectral data acquired from a space platform was examined to map the field variation of paddy crop and its species discrimination. This high-dimensional data, comprising 242 narrow spectral bands at 30 m ground resolution from the Hyperion L1R product acquired for Assam, India (30th Sept and 3rd Oct, 2014), underwent the necessary pre-processing steps followed by geometric correction using the Hyperion L1GST product. Finally, an atmospherically corrected and spatially deduced image consisting of 112 bands was obtained. By employing an advanced clustering algorithm, 12 different clusters of spectral waveforms of the crop were generated from six paddy fields for each image. The findings showed that some clusters were well discriminated, representing specific rice genotypes, while some clusters were mixed and treated as a single rice genotype. As the vegetation index (VI) is the best indicator for vegetation mapping, three ratio-based VI maps were also generated and unsupervised classification was performed on them. The 12 clusters of paddy crop so obtained were mapped spatially to the derived VI maps. From these findings, the existence of heterogeneity was clearly captured in one of the 6 rice plots (rice plot no. 1) while heterogeneity was observed in the rest of the 5 rice plots. The degree of heterogeneity was found to be greater in rice plot no. 6 than in the other plots. Subsequently, spatial variability of paddy field was

  15. Software refactoring at the package level using clustering techniques

    KAUST Repository

    Alkhalid, A.

    2011-01-01

    Enhancing, modifying or adapting the software to new requirements increases the internal software complexity. Software with a high level of internal complexity is difficult to maintain. Software refactoring reduces software complexity and hence decreases the maintenance effort. However, software refactoring becomes quite a challenging task as the software evolves. The authors use clustering as a pattern recognition technique to assist in software refactoring activities at the package level. The approach presents computer-aided support for identifying ill-structured packages and provides suggestions for the software designer to balance between intra-package cohesion and inter-package coupling. A comparative study is conducted applying three different clustering techniques on different software systems. In addition, the application of refactoring at the package level using an adaptive k-nearest neighbour (A-KNN) algorithm is introduced. The authors compared the A-KNN technique with the other clustering techniques (viz. single linkage algorithm, complete linkage algorithm and weighted pair-group method using arithmetic averages). The new technique shows competitive performance with lower computational complexity. © 2011 The Institution of Engineering and Technology.

  16. Unsupervised Image Segmentation

    Czech Academy of Sciences Publication Activity Database

    Haindl, Michal; Mikeš, Stanislav

    2014-01-01

    Vol. 36, No. 4 (2014), p. 23. R&D Projects: GA ČR(CZ) GA14-10911S. Institutional support: RVO:67985556. Keywords: unsupervised image segmentation. Subject RIV: BD - Theory of Information. http://library.utia.cas.cz/separaty/2014/RO/haindl-0434412.pdf

  17. Class imbalance in unsupervised change detection - A diagnostic analysis from urban remote sensing

    Science.gov (United States)

    Leichtle, Tobias; Geiß, Christian; Lakes, Tobia; Taubenböck, Hannes

    2017-08-01

    Automatic monitoring of changes on the Earth's surface is an intrinsic capability and simultaneously a persistent methodological challenge in remote sensing, especially regarding imagery with very-high spatial resolution (VHR) and complex urban environments. In order to enable a high level of automatization, the change detection problem is solved in an unsupervised way to alleviate efforts associated with collection of properly encoded prior knowledge. In this context, this paper systematically investigates the nature and effects of class distribution and class imbalance in an unsupervised binary change detection application based on VHR imagery over urban areas. For this purpose, a diagnostic framework for sensitivity analysis of a large range of possible degrees of class imbalance is presented, which is of particular importance with respect to unsupervised approaches where the content of images and thus the occurrence and the distribution of classes are generally unknown a priori. Furthermore, this framework can serve as a general technique to evaluate model transferability in any two-class classification problem. The applied change detection approach is based on object-based difference features calculated from VHR imagery and subsequent unsupervised two-class clustering using k-means, genetic k-means and self-organizing map (SOM) clustering. The results from two test sites with different structural characteristics of the built environment demonstrated that classification performance is generally worse in imbalanced class distribution settings while best results were reached in balanced or close to balanced situations. Regarding suitable accuracy measures for evaluating model performance in imbalanced settings, this study revealed that the Kappa statistics show significant response to class distribution while the true skill statistic was widely insensitive to imbalanced classes. 
    In general, the genetic k-means clustering algorithm achieved the most robust results
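
    The contrast between the Kappa statistic and the true skill statistic (TSS) reported above can be reproduced with a small worked example. The sketch below is only illustrative: the detector's sensitivity (0.8) and specificity (0.9) and the two class distributions are assumed values, not results from the study.

```python
def kappa_and_tss(tp, fn, fp, tn):
    """Return (Cohen's kappa, true skill statistic) for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return kappa, sensitivity + specificity - 1


def confusion_for(n_change, n_no_change, sensitivity=0.8, specificity=0.9):
    """Confusion matrix of a detector with fixed (assumed) per-class accuracy."""
    tp = sensitivity * n_change
    tn = specificity * n_no_change
    return tp, n_change - tp, n_no_change - tn, tn


balanced = kappa_and_tss(*confusion_for(500, 500))    # 50% change objects
imbalanced = kappa_and_tss(*confusion_for(50, 950))   # 5% change objects
print("balanced   kappa=%.3f tss=%.3f" % balanced)
print("imbalanced kappa=%.3f tss=%.3f" % imbalanced)
```

    With per-class accuracy held fixed, TSS stays at 0.7 in both settings because it depends only on sensitivity and specificity, while Kappa drops in the imbalanced case because its chance-agreement term depends on the class proportions.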

  18. Marine data users clustering using data mining technique

    Directory of Open Access Journals (Sweden)

    Farnaz Ghiasi

    2015-09-01

    The objective of this research is to cluster marine data users using a data mining technique. Achieving this objective will enable marine organizations to understand their data and their users' requirements. In this research, the CRISP-DM standard model was used to implement the data mining technique. The required data were extracted from a database of 500 marine data user profiles of the Iranian National Institute for Oceanography and Atmospheric Sciences (INIOAS) from 1386 to 1393. The TwoStep algorithm was used for clustering. In this research, patterns were discovered for the first time, using clustering, between marine data users (such as student, organization and scientist) and their data requests (data source, data type, data set, parameter and geographic area). The most important clusters are: Student with International data source, Chemistry data type, “World Ocean Database” dataset, Persian Gulf geographic area and Organization with Nitrate parameter. Senior managers of marine organizations will be able to make correct decisions concerning their existing data and will be directed toward planning better data collection in the future. Data users will also be guided with respect to their requests. Finally, valuable suggestions were offered to improve the performance of marine organizations.

  19. Brain tumor segmentation based on a hybrid clustering technique

    Directory of Open Access Journals (Sweden)

    Eman Abdel-Maksoud

    2015-03-01

    This paper presents an efficient image segmentation approach using the K-means clustering technique integrated with the Fuzzy C-means algorithm. It is followed by thresholding and level set segmentation stages to provide accurate brain tumor detection. The proposed technique benefits from the minimal computation time of K-means clustering for image segmentation and from the accuracy of Fuzzy C-means. The performance of the proposed image segmentation approach was evaluated by comparing it with some state-of-the-art segmentation algorithms in terms of accuracy, processing time, and performance. The accuracy was evaluated by comparing the results with the ground truth of each processed image. The experimental results demonstrate the effectiveness of the proposed approach in dealing with a larger number of segmentation problems by improving segmentation quality and accuracy in minimal execution time.

  20. Characterizing Interference in Radio Astronomy Observations through Active and Unsupervised Learning

    Science.gov (United States)

    Doran, G.

    2013-01-01

    In the process of observing signals from astronomical sources, radio astronomers must mitigate the effects of manmade radio sources such as cell phones, satellites, aircraft, and observatory equipment. Radio frequency interference (RFI) often occurs as short bursts (active learning approach in which an astronomer labels events that are most confusing to a classifier, minimizing the human effort required for classification. We also explore the use of unsupervised clustering techniques, which automatically group events into classes without user input. We apply these techniques to data from the Parkes Multibeam Pulsar Survey to characterize several million detected RFI events from over a thousand hours of observation.

  1. Distributed cluster management techniques for unattended ground sensor networks

    Science.gov (United States)

    Essawy, Magdi A.; Stelzig, Chad A.; Bevington, James E.; Minor, Sharon

    2005-05-01

    Smart Sensor Networks are becoming important target detection and tracking tools. The challenging problems in such networks include the sensor fusion, data management and communication schemes. This work discusses techniques used to distribute sensor management and multi-target tracking responsibilities across an ad hoc, self-healing cluster of sensor nodes. Although miniaturized computing resources possess the ability to host complex tracking and data fusion algorithms, there still exist inherent bandwidth constraints on the RF channel. Therefore, special attention is placed on the reduction of node-to-node communications within the cluster by minimizing unsolicited messaging, and distributing the sensor fusion and tracking tasks onto local portions of the network. Several challenging problems are addressed in this work including track initialization and conflict resolution, track ownership handling, and communication control optimization. Emphasis is also placed on increasing the overall robustness of the sensor cluster through independent decision capabilities on all sensor nodes. Track initiation is performed using collaborative sensing within a neighborhood of sensor nodes, allowing each node to independently determine if initial track ownership should be assumed. This autonomous track initiation prevents the formation of duplicate tracks while eliminating the need for a central "management" node to assign tracking responsibilities. Track update is performed as an ownership node requests sensor reports from neighboring nodes based on track error covariance and the neighboring nodes geo-positional location. Track ownership is periodically recomputed using propagated track states to determine which sensing node provides the desired coverage characteristics. High fidelity multi-target simulation results are presented, indicating the distribution of sensor management and tracking capabilities to not only reduce communication bandwidth consumption, but to also

  2. Unsupervised EEG analysis for automated epileptic seizure detection

    Science.gov (United States)

    Birjandtalab, Javad; Pouyan, Maziyar Baran; Nourani, Mehrdad

    2016-07-01

    Epilepsy is a neurological disorder which can, if not controlled, potentially cause unexpected death. It is crucial to have accurate automatic pattern recognition and data mining techniques to detect the onset of seizures and inform care-givers so that they can help the patients. EEG signals are the preferred biosignals for diagnosis of epileptic patients. Most of the existing pattern recognition techniques used in EEG analysis rely on supervised machine learning algorithms. Since seizure data are heavily under-represented, such techniques are not always practical, particularly when sufficient labeled data are not available or when disease progression is rapid and the corresponding EEG footprint pattern will not be robust. Furthermore, EEG pattern change is highly individual dependent and requires experienced specialists to annotate the seizure and non-seizure events. In this work, we present an unsupervised technique to discriminate seizure and non-seizure events. We employ the power spectral density of EEG signals in different frequency bands as informative features to accurately cluster seizure and non-seizure events. The experiments conducted so far indicate more than 90% accuracy in clustering seizure and non-seizure events without any prior knowledge of the patient's history.
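
    The general idea of clustering segments by band-wise spectral power can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the band edges, the naive DFT periodogram, the plain 2-means clustering, and the synthetic sine-wave "segments" (which are not EEG recordings) are all assumptions made for the example.

```python
import cmath
import math

BANDS = [(1, 4), (4, 8), (8, 13), (13, 30)]  # assumed band edges in Hz


def band_power(signal, fs, low, high):
    """Mean periodogram power in [low, high) Hz, computed with a naive DFT."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2):
        if low <= k * fs / n < high:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            powers.append(abs(coeff) ** 2 / n)
    return sum(powers) / len(powers) if powers else 0.0


def psd_features(signal, fs):
    """Log band powers; the small constant avoids log10(0)."""
    return [math.log10(band_power(signal, fs, lo, hi) + 1e-12) for lo, hi in BANDS]


def two_means(points, iterations=20):
    """Plain 2-means; centroids start at the first point and the point farthest from it."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(points[0]), list(max(points, key=lambda p: dist2(p, points[0])))]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
                  for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def synthetic_segment(freq, amplitude, phase, fs=64, seconds=1):
    """Toy stand-in for a signal segment: a single sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq * t / fs + phase)
            for t in range(fs * seconds)]


# Four low-amplitude 10 Hz segments and four high-amplitude 4 Hz segments.
segments = [synthetic_segment(10, 1.0, 0.3 * i) for i in range(4)]
segments += [synthetic_segment(4, 5.0, 0.3 * i) for i in range(4)]
labels = two_means([psd_features(s, 64) for s in segments])
print(labels)
```

    No labels are used at any point; the two groups emerge only from differences in band power, which is the property the abstract relies on.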

  3. Cluster analysis of signal-intensity time course in dynamic breast MRI: does unsupervised vector quantization help to evaluate small mammographic lesions?

    Energy Technology Data Exchange (ETDEWEB)

    Leinsinger, Gerda; Schlossbauer, Thomas; Scherr, Michael; Lange, Oliver; Reiser, Maximilian; Wismueller, Axel [Institute for Clinical Radiology University of Munich, Munich (Germany)]

    2006-05-15

    We examined whether neural network clustering could support the characterization of diagnostically challenging breast lesions in dynamic magnetic resonance imaging (MRI). We examined 88 patients with 92 breast lesions (51 malignant, 41 benign). Lesions were detected by mammography and classified as Breast Imaging and Reporting Data System (BIRADS) III (median diameter 14 mm). MRI was performed with a dynamic T1-weighted gradient echo sequence (one precontrast and five postcontrast series). Lesions with an initial contrast enhancement ≥50% were selected with semiautomatic segmentation. For conventional analysis, we calculated the mean initial signal increase and postinitial course of all voxels included in a lesion. Secondly, all voxels within the lesions were divided into four clusters using minimal-free-energy vector quantization (VQ). With conventional analysis, maximum accuracy in detecting breast cancer was 71%. With VQ, a maximum accuracy of 75% was observed. The slight improvement using VQ was mainly achieved by an increase of sensitivity, especially in invasive lobular carcinoma and ductal carcinoma in situ (DCIS). For lesion size, a high correlation between different observers was found (R² = 0.98). VQ slightly improved the discrimination between malignant and benign indeterminate lesions (BIRADS III) in comparison with a standard evaluation method. (orig.)

  4. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  5. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  6. Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V.

    Directory of Open Access Journals (Sweden)

    Christos Iraklis Tsatsoulis

    2013-08-01

    We investigate the use of unsupervised text mining methods for the analysis of prose literature works, using Thomas Pynchon's novel 'V.' as a case study. Our results suggest that such methods may be employed to reveal meaningful information regarding the novel’s structure. We report results using a wide variety of clustering algorithms, several distinct distance functions, and different visualization techniques. The application of a simple topic model is also demonstrated. We discuss the meaningfulness of our results along with the limitations of our approach, and we suggest some possible paths for further study.

  7. Content Discovery from Composite Audio : An unsupervised approach

    NARCIS (Netherlands)

    Lu, L.

    2009-01-01

    In this thesis, we developed and assessed a novel robust and unsupervised framework for semantic inference from composite audio signals. We focused on the problem of detecting audio scenes and grouping them into meaningful clusters. Our approach addressed all major steps in a general process of

  8. An Alternative Approach to Mapping Thermophysical Units from Martian Thermal Inertia and Albedo Data Using a Combination of Unsupervised Classification Techniques

    Directory of Open Access Journals (Sweden)

    Eriita Jones

    2014-06-01

    Thermal inertia and albedo provide information on the distribution of surface materials on Mars. These parameters have been mapped globally on Mars by the Thermal Emission Spectrometer (TES) onboard the Mars Global Surveyor. Two-dimensional clusters of thermal inertia and albedo reflect the thermophysical attributes of the dominant materials on the surface. In this paper three automated, non-deterministic, algorithmic classification methods are employed for defining thermophysical units: Expectation Maximisation of a Gaussian Mixture Model; Iterative Self-Organizing Data Analysis Technique (ISODATA); and Maximum Likelihood. We analyse the behaviour of the thermophysical classes resulting from the three classifiers, operating on the 2007 TES thermal inertia and albedo datasets. Producing a rigorous mapping of thermophysical classes at ~3 km/pixel resolution remains important for constraining the geologic processes that have shaped the Martian surface on a regional scale, and for choosing appropriate landing sites. The results from applying these algorithms are compared to geologic maps, surface data from lander missions, features derived from imaging, and previous classifications of thermophysical units which utilized manual (and potentially more time consuming) classification methods. These comparisons comprise data suitable for validation of our classifications. Our work shows that a combination of the algorithms (ISODATA and Maximum Likelihood) optimises the sensitivity to the underlying dataspace, and that new information on Martian surface materials can be obtained by using these methods. We demonstrate that the algorithms used here can be applied to define a finer partitioning of albedo and thermal inertia for a more detailed mapping of surface materials, grain sizes and thermal behaviour of the Martian surface and shallow subsurface, at the ~3 km scale.

  9. An unsupervised strategy for biomedical image segmentation

    Directory of Open Access Journals (Sweden)

    Roberto Rodríguez

    2010-09-01

    Roberto Rodríguez (1), Rubén Hernández (2). (1) Digital Signal Processing Group, Institute of Cybernetics, Mathematics, and Physics, Havana, Cuba; (2) Interdisciplinary Professional Unit of Engineering and Advanced Technology, IPN, Mexico. Abstract: Many segmentation techniques have been published, and some of them have been widely used in different application problems. Most of these segmentation techniques have been motivated by specific application purposes. Unsupervised methods, which do not assume that any prior scene knowledge can be learned to help the segmentation process, are obviously more challenging than supervised ones. In this paper, we present an unsupervised strategy for biomedical image segmentation using an algorithm based on recursively applying mean shift filtering, where entropy is used as a stopping criterion. This strategy is tested on many real images, and a comparison is carried out with manual segmentation. With the proposed strategy, errors of less than 20% for false positives and 0% for false negatives are obtained. Keywords: segmentation, mean shift, unsupervised segmentation, entropy
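
    A heavily simplified sketch of recursive mean shift filtering with an entropy-based stopping rule is given below. It is not the authors' algorithm: it works on a flat list of gray levels with a flat kernel in the intensity domain only, and the bandwidth, the tolerance `eps`, and the toy pixel values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy (in bits) of the gray-level histogram."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def mean_shift_pass(pixels, bandwidth):
    """One filtering pass: replace each gray level by the mean of all levels within the bandwidth."""
    out = []
    for p in pixels:
        window = [q for q in pixels if abs(q - p) <= bandwidth]
        out.append(round(sum(window) / len(window)))
    return out


def recursive_mean_shift(pixels, bandwidth=20, eps=1e-3, max_passes=50):
    """Repeat filtering until the histogram entropy changes by less than eps."""
    current = list(pixels)
    previous_entropy = entropy(current)
    for passes in range(1, max_passes + 1):
        current = mean_shift_pass(current, bandwidth)
        current_entropy = entropy(current)
        if abs(previous_entropy - current_entropy) < eps:
            return current, passes
        previous_entropy = current_entropy
    return current, max_passes


# Toy "image": five dark and five bright pixels (assumed values).
toy_pixels = [10, 12, 14, 11, 13, 200, 205, 198, 202, 201]
segmented, passes_used = recursive_mean_shift(toy_pixels)
print(segmented, passes_used)
```

    Each pass collapses nearby gray levels toward a common value, which lowers the histogram entropy; once a further pass no longer changes the entropy, the filtered image is taken as the segmentation. A practical version would also use the spatial neighbourhood of each pixel.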

  10. Scalable Clustering of High-Dimensional Data Technique Using SPCM with Ant Colony Optimization Intelligence

    Directory of Open Access Journals (Sweden)

    Thenmozhi Srinivasan

    2015-01-01

    Techniques for clustering high-dimensional data are emerging in response to the challenges posed by noisy and poor-quality data. This paper clusters data using a high-dimensional similarity-based PCM (SPCM) with ant colony optimization intelligence, which is effective in clustering nonspatial data without requiring the number of clusters from the user. The PCM is made similarity-based by combining it with the mountain method. Although this clustering is efficient, it is further checked for optimization using an ant colony algorithm with swarm intelligence. A scalable clustering technique is thus obtained, and it is evaluated on synthetic datasets.

  11. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques with different approaches to cluster analysis, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading approach depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.

  12. Classification of protein profiles using fuzzy clustering techniques

    DEFF Research Database (Denmark)

    Karemore, Gopal; Mullick, Jhinuk B.; Sujatha, R.

    2010-01-01

    Present study has brought out a comparison of PCA and fuzzy clustering techniques in classifying protein profiles (chromatogram) of homogenates of different tissue origins: Ovarian, Cervix, Oral cancers, which were acquired using HPLC–LIF (High Performance Liquid Chromatography-Laser Induced Fluorescence) method developed in our laboratory. Study includes 11 chromatogram spectra each from oral, cervical, ovarian cancers as well as healthy volunteers. Generally multivariate analysis like PCA demands clear data that is devoid of day... PCA mapping in classifying various cancers from healthy spectra with classification rate up to 95% from 60%. Methods are validated using various clustering indexes and shows promising improvement in developing optical pathology like HPLC-LIF for early detection of various...

  13. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    The operational performance of container ports has received increasing attention in both academic and practitioner circles, and the performance evaluation and process improvement of container ports have also been the focus of several studies. In this paper, Data Envelopment Analysis (DEA), an effective tool for relative efficiency assessment, is used to measure and benchmark the performance of 77 world container ports in 2007. The approaches used in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity) and a single output (Container Throughput). The efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided. A cluster analysis technique is also used to select more appropriate targets for poorly performing ports to use as benchmarks.

  14. Unsupervised Classification Using Immune Algorithm

    OpenAIRE

    Al-Muallim, M. T.; El-Kouatly, R.

    2012-01-01

    Unsupervised classification algorithm based on clonal selection principle named Unsupervised Clonal Selection Classification (UCSC) is proposed in this paper. The new proposed algorithm is data driven and self-adaptive, it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed U...

  15. Unsupervised Language Acquisition

    Science.gov (United States)

    de Marcken, Carl

    1996-11-01

    This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the "content" of linguistic parameters from their representation. Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning vocabularies and grammars from unsegmented text and continuous speech, and mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same structure to utterances as humans do. This work has application to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.

  16. Supervised versus unsupervised categorization: two sides of the same coin?

    Science.gov (United States)

    Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

    2011-09-01

    Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. (2011; Pothos et al., 2008) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.

  17. Unsupervised text mining for assessing and augmenting GWAS results.

    Science.gov (United States)

    Ailem, Melissa; Role, François; Nadif, Mohamed; Demenais, Florence

    2016-04-01

    Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has contributed to the identification of many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Therefore, our objective is to leverage unsupervised text mining techniques, using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment the GWAS results. We propose a generic framework which we used to characterize the relationships between 10 genes reported as associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value<0.01). The clustering of observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes and thus contributed to the discovery of new candidate genes for asthma. Copyright © 2016 Elsevier Inc. All rights reserved.
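
    The comparison of candidate genes against randomly selected genes can be illustrated with a small Monte Carlo sketch. Everything in it is hypothetical: the gene names (`gene_a` and so on), the term weights, the sampling scheme, and the function names are invented for the example and are not taken from the study.

```python
import itertools
import math
import random


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(names, vectors):
    pairs = list(itertools.combinations(names, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def one_sided_p_value(candidates, vectors, n_samples=1000, seed=0):
    """Share of random gene sets that are at least as similar as the candidate set."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(candidates, vectors)
    pool = sorted(vectors)
    hits = sum(
        mean_pairwise_similarity(rng.sample(pool, len(candidates)), vectors) >= observed
        for _ in range(n_samples)
    )
    # Add-one correction keeps the estimate away from an exact zero.
    return observed, (1 + hits) / (1 + n_samples)


# Hypothetical text vectors: three related genes and nine unrelated ones.
gene_vectors = {
    "gene_a": {"airway": 2.0, "inflammation": 1.0},
    "gene_b": {"airway": 1.0, "inflammation": 2.0},
    "gene_c": {"airway": 1.0, "inflammation": 1.0, "immune": 1.0},
}
for i, term in enumerate(["kidney", "bone", "retina", "liver", "muscle",
                          "skin", "heart", "neuron", "thyroid"]):
    gene_vectors["background_%d" % i] = {term: 1.0}

observed_similarity, p_value = one_sided_p_value(["gene_a", "gene_b", "gene_c"], gene_vectors)
print(observed_similarity, p_value)
```

    A small p-value here only says that the candidate set is textually more coherent than random sets drawn from the same pool; interpreting that coherence biologically is a separate step.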

  18. Automatic microseismic event picking via unsupervised machine learning

    Science.gov (United States)

    Chen, Yangkang

    2018-01-01

    Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used arrival picking algorithms based on the short-term-average to long-term-average ratio (STA/LTA) are sensitive to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on a large volume of well-designed training data, I use an unsupervised machine learning algorithm to cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
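
    Splitting time samples into waveform and non-waveform groups with fuzzy clustering can be sketched as below. This is a minimal illustration under assumptions, not the author's implementation: it uses standard two-cluster fuzzy c-means on a single feature (absolute amplitude), and the toy trace, the fuzzifier `m = 2`, and the 0.5 membership threshold are choices made for the example.

```python
def fuzzy_c_means_1d(values, m=2.0, iterations=50):
    """Two-cluster fuzzy c-means on scalars.

    Returns the two cluster centers and, for each value, its membership
    in the second (higher-valued) cluster.
    """
    centers = [min(values), max(values)]
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            # Small constant avoids division by zero when v equals a center.
            d = [abs(v - c) + 1e-12 for c in centers]
            # Standard FCM membership: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
            u = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(2))
                 for j in range(2)]
            memberships.append(u)
        # Centers are membership-weighted means of the data.
        centers = [
            sum((u[j] ** m) * v for u, v in zip(memberships, values))
            / sum(u[j] ** m for u in memberships)
            for j in range(2)
        ]
    return centers, [u[1] for u in memberships]


# Toy trace: low-amplitude noise with a short high-amplitude event (assumed values).
trace = [0.05, -0.08, 0.10, -0.04, 0.90, -1.10, 1.00, -0.80, 0.06, -0.07]
centers, waveform_membership = fuzzy_c_means_1d([abs(x) for x in trace])
waveform_samples = [i for i, u in enumerate(waveform_membership) if u > 0.5]
first_arrival = waveform_samples[0]
print(waveform_samples, first_arrival)
```

    The first sample assigned to the waveform group serves as the picked arrival in this toy setting. On real data a richer per-sample feature than raw absolute amplitude would likely be needed.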

  19. COMPARISON AND EVALUATION OF CLUSTER BASED IMAGE SEGMENTATION TECHNIQUES

    OpenAIRE

    Hetangi D. Mehta*, Daxa Vekariya, Pratixa Badelia

    2017-01-01

    Image segmentation is the classification of an image into different groups. Numerous algorithms using different approaches have been proposed for image segmentation. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity. A review is done on different types of clustering methods used for image segmentation. Also a methodology is proposed to classify and quantify different clustering algorithms based on their consistency in different...

  20. An Efficient Optimization Method for Solving Unsupervised Data Classification Problems

    Directory of Open Access Journals (Sweden)

    Parvaneh Shabanzadeh

    2015-01-01

    Unsupervised data classification (or clustering analysis) is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research for novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the results were compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.

  1. SU(3) techniques for angular momentum projected matrix elements in multi-cluster problems

    International Nuclear Information System (INIS)

    Hecht, K.T.; Zahn, W.

    1978-01-01

    In the theory of integral transforms for the evaluation of the resonating group kernels needed for cluster model calculations, the evaluation of matrix elements in an angular momentum coupled basis has proved to be difficult for cluster problems involving more than two fragments. For multi-cluster wave functions, SU(3) coupling and recoupling techniques can furnish a tool for the practical evaluation of matrix elements in an angular momentum coupled basis if the several relative motion harmonic oscillator functions in Bargmann space have simple SU(3) coupling properties. The method is illustrated by a three-cluster problem, such as ¹²C = α + α + α, involving three 1S clusters. 2 references

  2. Supervised / unsupervised change detection

    OpenAIRE

    de Alwis Pitts, Dilkushi; De Vecchi, Daniele; Harb, Mostapha; So, Emily; Dell'Acqua, Fabio

    2014-01-01

    The aim of this deliverable is to provide an overview of the state of the art in change detection techniques and a critique of what could be programmed to derive SENSUM products. It is the product of the collaboration between UCAM and EUCENTRE. The document includes as a necessary requirement a discussion about a proposed technique for co-registration. Since change detection techniques require an assessment of a series of images and the basic process involves comparing and contrasting the sim...

  3. The k-means clustering technique: General considerations and implementation in Mathematica

    Directory of Open Access Journals (Sweden)

    Laurence Morissette

    2013-02-01

    Data clustering techniques are valuable tools for researchers working with large databases of multivariate data. In this tutorial, we present a simple yet powerful one: the k-means clustering technique, through three different algorithms: the Forgy/Lloyd algorithm, the MacQueen algorithm and the Hartigan and Wong algorithm. We then present an implementation in Mathematica and various examples of the different options available to illustrate the application of the technique.
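
    The tutorial's implementation is in Mathematica and is not reproduced here. As a rough illustration of the first of the three algorithms named above, the following is a minimal Python sketch of Lloyd-style batch k-means with Forgy initialization (k data points drawn at random as starting centroids). The function names, the fixed seed, and the toy data are assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Batch k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    # Forgy initialization: pick k distinct data points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = lloyd_kmeans(data, k=2)
```

    The MacQueen variant differs mainly in updating a centroid immediately after each single reassignment rather than once per full pass; that variant is not shown here.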

  4. Integrating the Supervised Information into Unsupervised Learning

    Directory of Open Access Journals (Sweden)

    Ping Ling

    2013-01-01

    This paper presents an assembling unsupervised learning framework that adopts information from the supervised learning process, and gives the corresponding implementation algorithm. The algorithm consists of two phases: first extracting and clustering data representatives (DRs) to obtain labeled training data, and then classifying non-DRs based on the labeled DRs. The implementation algorithm is called SDSN since it employs the tuning-scaled Support vector domain description to collect DRs, uses a spectrum-based method to cluster DRs, and adopts the nearest neighbor classifier to label non-DRs. The validity of the first-phase clustering procedure is analyzed theoretically. A new data-dependent metric is defined in the second phase to allow the nearest neighbor classifier to work with the informed information. A fast training approach for DR extraction is provided to improve efficiency. Experimental results on synthetic and real datasets verify the correctness and performance of the proposed idea and show that SDSN exhibits higher popularity in practice over the traditional pure clustering procedure.

  5. Unsupervised Condition Change Detection In Large Diesel Engines

    DEFF Research Database (Denmark)

    Pontoppidan, Niels Henrik; Larsen, Jan

    2003-01-01

    This paper presents a new method for unsupervised change detection which combines independent component modeling and probabilistic outlier detection. The method further provides a compact data representation, which is amenable to interpretation, i.e., the detected condition changes can be investigated further. The method is successfully applied to unsupervised condition change detection in large diesel engines from acoustical emission sensor signal and compared to more classical techniques based on principal component analysis and Gaussian mixture models.

  6. Techniques for Representation of Regional Clusters in Geographical Information Systems

    Directory of Open Access Journals (Sweden)

    Adriana REVEIU

    2011-01-01

    This paper provides an overview of visualization techniques adapted for regional clusters presentation in Geographic Information Systems. Clusters are groups of companies and institutions co-located in a specific geographic region and linked by interdependencies in providing a related group of products and services. The regional clusters can be visualized by projecting the data into two-dimensional space or using parallel coordinates. Cluster membership is usually represented by different colours or by dividing clusters into several panels of a grille display. Taking into consideration regional clusters requirements and the multilevel administrative division of Romania’s territory, I used two cartograms, NUTS2 regions and NUTS3 counties, to illustrate the tools for regional clusters representation.

  7. Measuring customer loyalty using an extended RFM and clustering technique

    Directory of Open Access Journals (Sweden)

    Zohre Zalaghi

    2014-05-01

    Today, the ability to identify profitable customers, create long-term loyalty in them, and expand existing relationships is considered a key competitive factor for a customer-oriented organization. The prerequisite for such a competitive factor is very powerful customer relationship management (CRM). The accurate evaluation of customers’ profitability is considered one of the fundamental factors that lead to successful customer relationship management. RFM is a method that scrutinizes three properties of each customer, namely recency, frequency and monetary value, and scores customers based on these properties. In this paper, a method is introduced that obtains the behavioral traits of customers using the extended RFM approach and the information related to the customers of an organization; it then classifies the customers using the K-means algorithm and finally scores the customers in terms of their loyalty in each cluster. In the suggested approach, the customers’ records are first clustered, and then the RFM model items are specified by selecting the properties that affect the customers’ loyalty rate using the multipurpose genetic algorithm. Next, the customers are scored in each cluster based on the effect that these properties have on the loyalty rate. The influence each property has on loyalty is calculated using Spearman’s correlation coefficient.
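
    The extended RFM model and the genetic-algorithm feature selection are not specified in the abstract above, so the sketch below only illustrates the basic RFM step it builds on: deriving recency, frequency and monetary values per customer from a transaction log. The record layout `(customer, date, amount)`, the function name `rfm_table`, and the sample data are hypothetical.

```python
from datetime import date


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    recency is measured in days between the last purchase and `today`.
    """
    table = {}
    for customer, day, amount in transactions:
        last, freq, money = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), freq + 1, money + amount)
    return {
        customer: ((today - last).days, freq, money)
        for customer, (last, freq, money) in table.items()
    }


# Hypothetical transaction log for two customers.
log = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2024, 1, 5), 20.0),
]
rfm = rfm_table(log, today=date(2024, 1, 11))
```

    The resulting three-component vectors are the kind of per-customer features that a clustering step such as K-means could then group, typically after rescaling so that no single property dominates the distance.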

  8. Using Machine Learning Techniques in the Analysis of Oceanographic Data

    Science.gov (United States)

    Falcinelli, K. E.; Abuomar, S.

    2017-12-01

    Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.

  9. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on Social Learning Networks is still not widely explored, especially when the network focuses on e-learning systems. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  10. Semi-supervised and unsupervised extreme learning machines.

    Science.gov (United States)

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

  11. Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Ling; Lee, Doris; Sim, Alex; Borgeson, Sam; Wu, Kesheng; Spurlock, C. Anna; Todd, Annika

    2017-03-21

    Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer’s suitability to various demand side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors’ knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems’ literature.

  12. Unsupervised Document Embedding With CNNs

    OpenAIRE

    Liu, Chundi; Zhao, Shunan; Volkovs, Maksims

    2017-01-01

    We propose a new model for unsupervised document embedding. Leading existing approaches either require complex inference or use recurrent neural networks (RNN) that are difficult to parallelize. We take a different route and develop a convolutional neural network (CNN) embedding model. Our CNN architecture is fully parallelizable, resulting in over 10x speedup in inference time over RNN models. The parallelizable architecture enables training deeper models where each successive layer has increasin...

  13. THE EFFECT OF CLUSTERING TECHNIQUE ON WRITING EXPOSITORY ESSAYS OF EFL STUDENTS

    Directory of Open Access Journals (Sweden)

    Sabarun Sabarun

    2013-03-01

    The study is aimed at investigating the effectiveness of using the clustering technique in writing expository essays. The aim of the study is to determine whether there is a significant difference in students’ writing achievement between writing with the clustering technique and writing without it. The study was an experimental study that applied a counterbalanced procedure to collect the data. It was conducted with fourth-semester English department students of Palangka Raya State Islamic College in the 2012/2013 academic year. The sample comprised 13 students. The study was restricted to two conditions: writing a composition with the clustering technique and writing one without it. Using the clustering technique to write an essay was one of the pre-writing strategies in the writing process. To answer the research problem, the t test for correlated samples was applied. The research findings showed that the t value was 10.554. It was also found that the df (degrees of freedom) of the distribution observed was 13 - 1 = 12. Based on the table of t values, if df was 12, the 5% of significant level of t value was at 1.782 and the 1% of significant level of t value was at 2.179. It meant that using clustering had a facilitative effect on the students’ essay writing performance. Keywords: reading comprehension, text, scaffolding
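
    For readers unfamiliar with the t test for correlated (paired) samples mentioned above, the statistic is the mean of the per-student score differences divided by its standard error, with n - 1 degrees of freedom. The sketch below shows that arithmetic; the scores are invented for illustration and are not data from the study, and the function name `paired_t` is an assumption.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Invented scores for four students under the two writing conditions.
without_clustering = [1, 2, 3, 4]
with_clustering = [2, 4, 5, 4]
t_value, df = paired_t(without_clustering, with_clustering)
```

    With 13 paired observations, as in the study, the same formula gives df = 12; the computed t value is then compared against the critical value for that df at the chosen significance level.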

  14. A new web-based system for unsupervised classification of satellite images from the Google Maps engine

    Science.gov (United States)

    Ferrán, Ángel; Bernabé, Sergio; García-Rodríguez, Pablo; Plaza, Antonio

    2012-10-01

    In this paper, we develop a new web-based system for unsupervised classification of satellite images available from the Google Maps engine. The system has been developed using the Google Maps API and incorporates functionalities such as unsupervised classification of image portions selected by the user (at the desired zoom level). For this purpose, we use a processing chain made up of the well-known ISODATA and k-means algorithms, followed by spatial post-processing based on majority voting. The system is currently hosted on a high performance server which performs the execution of classification algorithms and returns the obtained classification results in a very efficient way. The previous functionalities are necessary to use efficient techniques for the classification of images and the incorporation of content-based image retrieval (CBIR). Several experimental validation types of the classification results with the proposed system are performed by comparing the classification accuracy of the proposed chain by means of techniques available in the well-known Environment for Visualizing Images (ENVI) software package. The server has access to a cluster of commodity graphics processing units (GPUs), hence in future work we plan to perform the processing in parallel by taking advantage of the cluster.
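
    The spatial post-processing by majority voting mentioned above can be illustrated with a small sketch: after a per-pixel classifier such as k-means has produced a label map, each pixel takes the most frequent label in its neighbourhood. The square window, the tie rule (keep the original label), and the function name `majority_filter` are assumptions; the abstract does not describe the system's actual window or rules.

```python
from collections import Counter


def majority_filter(label_map, radius=1):
    """Relabel each pixel with the most common label in its square window.

    label_map: list of equal-length rows of hashable class labels.
    Windows are clipped at the image border. A pixel keeps its own label
    when that label is already among the most frequent ones.
    """
    rows, cols = len(label_map), len(label_map[0])
    out = [row[:] for row in label_map]  # do not modify the input
    for r in range(rows):
        for c in range(cols):
            window = [
                label_map[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            if counts[label_map[r][c]] < best:
                out[r][c] = counts.most_common(1)[0][0]
    return out


# An isolated misclassified pixel in the centre is smoothed away.
noisy = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
smoothed = majority_filter(noisy)
```

    A filter of this kind removes isolated "salt-and-pepper" labels at the cost of blurring thin structures, which is why the window radius is left as a parameter.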

  15. An Unsupervised Online Spike-Sorting Framework.

    Science.gov (United States)

    Knieling, Simeon; Sridharan, Kousik S; Belardinelli, Paolo; Naros, Georgios; Weiss, Daniel; Mormann, Florian; Gharabaghi, Alireza

    2016-08-01

    Extracellular neuronal microelectrode recordings can include action potentials from multiple neurons. To separate spikes from different neurons, they can be sorted according to their shape, a procedure referred to as spike-sorting. Several algorithms have been reported to solve this task. However, when clustering outcomes are unsatisfactory, most of them are difficult to adjust to achieve the desired results. We present an online spike-sorting framework that uses feature normalization and weighting to maximize the distinctiveness between different spike shapes. Furthermore, multiple criteria are applied to either facilitate or prevent cluster fusion, thereby enabling experimenters to fine-tune the sorting process. We compare our method to established unsupervised offline (Wave_Clus (WC)) and online (OSort (OS)) algorithms by examining their performance in sorting various test datasets using two different scoring systems (AMI and the Adamos metric). Furthermore, we evaluate sorting capabilities on intra-operative recordings using established quality metrics. Compared to WC and OS, our algorithm achieved comparable or higher scores on average and produced more convincing sorting results for intra-operative datasets. Thus, the presented framework is suitable for both online and offline analysis and could substantially improve the quality of microelectrode-based data evaluation for research and clinical application.

  16. K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system.

    Science.gov (United States)

    Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang

    2017-10-30

    In this work, we proposed two k-means-clustering-based algorithms to mitigate the fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signal, the training-sequence assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrated the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy and they are able to quickly find appropriate initial centroids and select correctly the centroids of the clusters to obtain the global optimal solutions for large k value. We measured the bit-error-ratio (BER) performance of 64-QAM signal with different launched powers into the 50-km single mode fiber and the proposed techniques can greatly mitigate the signal impairments caused by the amplified spontaneous emission noise and the fiber Kerr nonlinearity and improve the BER performance.

  17. Arrays of Size-Selected Metal Nanoparticles Formed by Cluster Ion Beam Technique

    DEFF Research Database (Denmark)

    Ceynowa, F. A.; Chirumamilla, Manohar; Zenin, Volodymyr

    2018-01-01

    Deposition of size-selected copper and silver nanoparticles (NPs) on polymers using cluster beam technique is studied. It is shown that ratio of particle embedment in the film can be controlled by simple thermal annealing. Combining electron beam lithography, cluster beam deposition, and heat...... with required configurations which can be applied for wave-guiding, resonators, in sensor technologies, and surface enhanced Raman scattering....

  18. LENR BEC Clusters on and below Wires through Cavitation and Related Techniques

    Science.gov (United States)

    Stringham, Roger; Stringham, Julie

    2011-03-01

    During the last two years I have been working on BEC cluster densities deposited just under the surface of wires, using cavitation, and other techniques. If I get the concentration high enough before the clusters dissipate, in addition to cold fusion related excess heat (and other effects, including helium-4 formation) I anticipate that it may be possible to initiate transient forms of superconductivity at room temperature.

  19. Poly(methyl methacrylate) Composites with Size-selected Silver Nanoparticles Fabricated Using Cluster Beam Technique

    DEFF Research Database (Denmark)

    Muhammad, Hanif; Juluri, Raghavendra R.; Chirumamilla, Manohar

    2016-01-01

    based on cluster beam technique allowing the formation of monocrystalline size-selected silver nanoparticles with a ±5–7% precision of diameter and controllable embedment into poly (methyl methacrylate). It is shown that the soft-landed silver clusters preserve almost spherical shape with a slight...... tendency to flattening upon impact. By controlling the polymer hardness (from viscous to soft state) prior the cluster deposition and annealing conditions after the deposition the degree of immersion of the nanoparticles into polymer can be tuned, thus, making it possible to create composites with either...

  20. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Aim: The properness of random assignment of compounds in training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative Structure-Activity Relationship model using Molecular Descriptors Family on Vertices was evaluated in terms of assignment of carboquinone derivatives in training and test sets during the leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied in order to investigate whether the assignment of compounds was proper. The Euclidean distance and maximization of the initial distance using a cross-validation with a v-fold of 10 was applied. Results: All five variables included in the analysis proved to have a statistically significant contribution to the identification of clusters. Three clusters were identified, each of them containing carboquinone derivatives belonging to the training as well as to the test set. The observed activity of carboquinone derivatives proved to be normally distributed on every. The presence of training and test sets in all clusters identified using generalized cluster analysis with the K-means algorithm and the distribution of observed activity within clusters sustain a proper assignment of compounds in training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method for assessing the random assignment of carboquinone derivatives in training and test sets.

  1. Focus-based filtering + clustering technique for power-law networks with small world phenomenon

    Science.gov (United States)

    Boutin, François; Thièvre, Jérôme; Hascoët, Mountaz

    2006-01-01

    Realistic interaction networks usually present two main properties: a power-law degree distribution and small world behavior. Few nodes are linked to many nodes, and adjacent nodes are likely to share common neighbors. Moreover, the graph structure usually presents a dense core that is difficult to explore with classical filtering and clustering techniques. In this paper, we propose a new filtering technique accounting for a user-focus. This technique extracts a tree-like graph that also has a power-law degree distribution and small world behavior. The resulting structure is easily drawn with classical force-directed drawing algorithms. It is also quickly clustered and displayed as a multi-level silhouette tree (MuSi-Tree) from any user-focus. We built a new graph filtering + clustering + drawing API and report a case study.

  2. Unsupervised spike sorting based on discriminative subspace learning.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  3. A novel unsupervised spike sorting algorithm for intracranial EEG.

    Science.gov (United States)

    Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R

    2011-01-01

    This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.

  4. Inferring hierarchical clustering structures by deterministic annealing

    International Nuclear Information System (INIS)

    Hofmann, T.; Buhmann, J.M.

    1996-01-01

    The unsupervised detection of hierarchical structures is a major topic in unsupervised learning and one of the key questions in data analysis and representation. We propose a novel algorithm for the problem of learning decision trees for data clustering and related problems. In contrast to many other methods based on successive tree growing and pruning, we propose an objective function for tree evaluation and we derive a non-greedy technique for tree growing. Applying the principles of maximum entropy and minimum cross entropy, a deterministic annealing algorithm is derived in a mean-field approximation. This technique allows us to canonically superimpose tree structures and to fit parameters to averaged or 'fuzzified' trees.

  5. Unsupervised classification of major depression using functional connectivity MRI.

    Science.gov (United States)

    Zeng, Ling-Li; Shen, Hui; Liu, Li; Hu, Dewen

    2014-04-01

    The current diagnosis of psychiatric disorders including major depressive disorder based largely on self-reported symptoms and clinical signs may be prone to patients' behaviors and psychiatrists' bias. This study aims at developing an unsupervised machine learning approach for the accurate identification of major depression based on single resting-state functional magnetic resonance imaging scans in the absence of clinical information. Twenty-four medication-naive patients with major depression and 29 demographically similar healthy individuals underwent resting-state functional magnetic resonance imaging. We first clustered the voxels within the perigenual cingulate cortex into two subregions, a subgenual region and a pregenual region, according to their distinct resting-state functional connectivity patterns and showed that a maximum margin clustering-based unsupervised machine learning approach extracted sufficient information from the subgenual cingulate functional connectivity map to differentiate depressed patients from healthy controls with a group-level clustering consistency of 92.5% and an individual-level classification consistency of 92.5%. It was also revealed that the subgenual cingulate functional connectivity network with the highest discriminative power primarily included the ventrolateral and ventromedial prefrontal cortex, superior temporal gyri and limbic areas, indicating that these connections may play critical roles in the pathophysiology of major depression. The current study suggests that subgenual cingulate functional connectivity network signatures may provide promising objective biomarkers for the diagnosis of major depression and that maximum margin clustering-based unsupervised machine learning approaches may have the potential to inform clinical practice and aid in research on psychiatric disorders. Copyright © 2013 Wiley Periodicals, Inc.

  6. IMPLEMENTATION OF IMPROVED NETWORK LIFETIME TECHNIQUE FOR WSN USING CLUSTER HEAD ROTATION AND SIMULTANEOUS RECEPTION

    Directory of Open Access Journals (Sweden)

    Arun Vasanaperumal

    2015-11-01

    There are a number of potential applications of Wireless Sensor Networks (WSNs), such as wild habitat monitoring, forest fire detection, and military surveillance. All these applications are constrained in power by a stand-alone battery power source, so it is of paramount importance to conserve the energy drawn from this source. A lot of effort has gone into this area recently, and it remains one of the hot research areas. In order to improve network lifetime and reduce average power consumption, this study proposes a novel cluster head selection algorithm. Clustering is the preferred architecture when the number of nodes is large because it results in considerable power savings for large networks compared to other architectures such as tree or star. Since the majority of applications generally involve more than 30 nodes, clustering has gained widespread importance and is the most used network architecture. The optimum number of clusters is first selected based on the number of nodes in the network. When the network is in operation, the cluster head in each cluster is rotated periodically based on the proposed cluster head selection algorithm to increase the network lifetime. A single-hop communication methodology is assumed throughout the network. This work will serve as an encouragement for further advances in low power techniques for implementing Wireless Sensor Networks (WSNs).

  7. A Comparison of Alternative Distributed Dynamic Cluster Formation Techniques for Industrial Wireless Sensor Networks.

    Science.gov (United States)

    Gholami, Mohammad; Brennan, Robert W

    2016-01-06

    In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.

  8. Application of unsupervised pattern recognition approaches for exploration of rare earth elements in Se-Chahun iron ore, central Iran

    Science.gov (United States)

    Sarparandeh, Mohammadali; Hezarkhani, Ardeshir

    2017-12-01

    The use of efficient methods for data processing has always been of interest to researchers in the field of earth sciences. Pattern recognition techniques are appropriate methods for high-dimensional data such as geochemical data. Evaluation of the geochemical distribution of rare earth elements (REEs) requires the use of such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is application of unsupervised pattern recognition approaches in evaluating geochemical distribution of REEs in the Kiruna type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit. In this study, 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all these 14 features, simultaneously. In addition to providing easy solutions, discovery of the hidden information and relations of data samples is the advantage of these methods. Therefore, four clustering methods (unsupervised pattern recognition) - including a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing map (SOM) - were applied and results were evaluated using the silhouette criterion. Samples were clustered in four types. Finally, the results of this study were validated with geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods have the best matches with reality, with experimental studies of samples and with field surveys. Since only the rare earth elements are used in this division, a good agreement of the results with lithology is considerable. It is concluded that the combination of the proposed

  9. Unsupervised image matching based on manifold alignment.

    Science.gov (United States)

    Pei, Yuru; Huang, Fengchun; Shi, Fuhao; Zha, Hongbin

    2012-08-01

    This paper addresses the problem of automatic matching between two image sets with similar intrinsic structures and different appearances, especially when there is no prior correspondence. An unsupervised manifold alignment framework is proposed to establish correspondence between data sets by a mapping function in the mutual embedding space. We introduce a local similarity metric based on parameterized distance curves to represent the connection of one point with the rest of the manifold. A small set of valid feature pairs can be found without manual interactions by matching the distance curve of one manifold with the curve cluster of the other manifold. To avoid potential confusions in image matching, we propose an extended affine transformation to solve the nonrigid alignment in the embedding space. The comparatively tight alignments and the structure preservation can be obtained simultaneously. The point pairs with the minimum distance after alignment are viewed as the matchings. We apply manifold alignment to image set matching problems. The correspondence between image sets of different poses, illuminations, and identities can be established effectively by our approach.

  10. Hybrid Clustering-GWO-NARX neural network technique in predicting stock price

    Science.gov (United States)

    Das, Debashish; Safa Sadiq, Ali; Mirjalili, Seyedali; Noraziah, A.

    2017-09-01

    Prediction of stock price is one of the most challenging tasks due to the nonlinear nature of stock data. Although numerous attempts have been made to predict stock prices by applying various techniques, the predicted price is not always accurate and the error rate can be high. Consequently, this paper endeavours to determine an efficient stock prediction strategy by implementing a combined method of Grey Wolf Optimizer (GWO), clustering and the Non Linear Autoregressive Exogenous (NARX) technique. The study uses stock data from prominent stock markets, i.e. the New York Stock Exchange (NYSE) and NASDAQ, and from emerging stock markets, i.e. the Malaysian Stock Market (Bursa Malaysia) and the Dhaka Stock Exchange (DSE). It applies the K-means clustering algorithm to determine the most promising cluster, then MGWO is used to determine the classification rate, and finally the stock price is predicted by applying the NARX neural network algorithm. The prediction performance obtained through experimentation is compared and assessed to guide investors in making investment decisions. The result obtained with this technique is promising, as it has shown almost precise prediction and an improved error rate. We have applied the hybrid Clustering-GWO-NARX neural network technique in predicting stock price. We intend to work on the effect of various factors in stock price movement and on the selection of parameters. We will further investigate the influence of company news, either positive or negative, on stock price movement. We would also be interested in predicting stock indices.

  11. The Application of Clustering Techniques to Citation Data. Research Reports Series B No. 6.

    Science.gov (United States)

    Arms, William Y.; Arms, Caroline

    This report describes research carried out as part of the Design of Information Systems in the Social Sciences (DISISS) project. Cluster analysis techniques were applied to a machine readable file of bibliographic data in the form of cited journal titles in order to identify groupings which could be used to structure bibliographic files. Practical…

  12. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    Science.gov (United States)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performances even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using as a test set, data in the range between 1996 August and 2010 December and as training set, data in the range between 1988 December and 1996 June. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to the one of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.
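
    The second step described above, Fuzzy C-Means applied to a one-dimensional regression outcome, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name fuzzy_c_means_1d, the fuzzifier m = 2, the hand-picked initial centers and the toy scores are all assumptions made for illustration. The code only uses the standard alternating membership and center updates of Fuzzy C-Means.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Standard fuzzy C-means on scalars (illustrative sketch, not the paper's code).

    values  : list of floats (here: hypothetical regression outputs)
    centers : initial cluster centers (assumed, chosen by hand)
    m       : fuzzifier > 1; m = 2 is a common default
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the values weighted by u^m
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships


# Made-up scores standing in for a regression outcome (not real flare data).
scores = [0.05, 0.10, 0.15, 0.80, 0.90, 0.95]
centers, memberships = fuzzy_c_means_1d(scores, centers=[0.0, 1.0])
# Harden the fuzzy partition by taking the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

    Each row of memberships sums to one, so a hard two-class decision can be read off by taking the largest membership, as in the last line; no skill score enters the partitioning itself.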

  13. Unsupervised Learning of Action Primitives

    DEFF Research Database (Denmark)

    Baby, Sanmohan; Krüger, Volker; Kragic, Danica

    2010-01-01

    Action representation is a key issue in imitation learning for humanoids. With the recent finding of mirror neurons there has been a growing interest in expressing actions as a combination of meaningful subparts called primitives. Primitives could be thought of as an alphabet for the human actions. In this paper we observe that human actions and objects can be seen as being intertwined: we can interpret actions from the way the body parts are moving, but also from their effect on the involved object. While human movements can look vastly different even under minor changes in location, orientation and scale, the use of the object can provide a strong invariant for the detection of motion primitives. In this paper we propose an unsupervised learning approach for action primitives that makes use of the human movements as well as the object state changes. We group actions according to the changes...

  14. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

    Directory of Open Access Journals (Sweden)

    Ujjwal Maulik

    Full Text Available Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request: sarkar@labri.fr.

  15. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    Directory of Open Access Journals (Sweden)

    Guoqi Qian

    2016-01-01

    Full Text Available Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
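
    The alternation between the unsupervised step (partitioning points by their best-fitting regression line) and the supervised step (refitting each line to its cluster) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure developed in the paper: it uses ordinary least squares with a single predictor, a fixed number of clusters and hand-picked starting lines, and the names fit_line and regression_clustering are hypothetical. The robust estimation and model selection parts discussed in the abstract are not covered.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a + b*x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def regression_clustering(points, initial_lines, max_iter=100):
    """Alternate between (1) assigning each point to the line with the
    smallest squared residual and (2) refitting each line to its points."""
    lines = list(initial_lines)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(lines)),
                key=lambda k: (y - (lines[k][0] + lines[k][1] * x)) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # partition is stable, so the fits will not change either
        labels = new_labels
        for k in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:  # keep the old line if a cluster is too small
                lines[k] = fit_line(members)
    return labels, lines


# Toy data: six noise-free points on y = 1 + 2x, then six on y = 10 - x.
xs = [0, 1, 2, 4, 5, 6]
points = [(x, 1 + 2 * x) for x in xs] + [(x, 10 - x) for x in xs]
# Rough starting guesses as (intercept, slope) pairs, chosen by hand.
labels, lines = regression_clustering(points, [(0.0, 1.5), (9.0, -0.5)])
```

    On this noise-free toy set the loop should settle on the two generating lines after a few passes. With noisy data the result depends on the starting lines, which is one reason a procedure for choosing the number of clusters and the initial fits matters in practice.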

  16. Statistical Techniques Applied to Aerial Radiometric Surveys (STAARS): cluster analysis. National Uranium Resource Evaluation

    International Nuclear Information System (INIS)

    Pirkle, F.L.; Stablein, N.K.; Howell, J.A.; Wecksung, G.W.; Duran, B.S.

    1982-11-01

    One objective of the aerial radiometric surveys flown as part of the US Department of Energy's National Uranium Resource Evaluation (NURE) program was to ascertain the regional distribution of near-surface radioelement abundances. Some method for identifying groups of observations with similar radioelement values was therefore required. It is shown in this report that cluster analysis can identify such groups even when no a priori knowledge of the geology of an area exists. A method of convergent k-means cluster analysis coupled with a hierarchical cluster analysis is used to classify 6991 observations (three radiometric variables at each observation location) from the Precambrian rocks of the Copper Mountain, Wyoming, area. Another method, one that combines a principal components analysis with a convergent k-means analysis, is applied to the same data. These two methods are compared with a convergent k-means analysis that utilizes available geologic knowledge. All three methods identify four clusters. Three of the clusters represent background values for the Precambrian rocks of the area, and one represents outliers (anomalously high ²¹⁴Bi). A segmentation of the data corresponding to geologic reality as discovered by other methods has been achieved based solely on analysis of aerial radiometric data. The techniques employed are composites of classical clustering methods designed to handle the special problems presented by large data sets. 20 figures, 7 tables

  17. Statistical uncertainty of extreme wind storms over Europe derived from a probabilistic clustering technique

    Science.gov (United States)

    Walz, Michael; Leckebusch, Gregor C.

    2016-04-01

    Extratropical wind storms pose one of the most dangerous and loss intensive natural hazards for Europe. However, due to only 50 years of high quality observational data, it is difficult to assess the statistical uncertainty of these sparse events just based on observations. Over the last decade seasonal ensemble forecasts have become indispensable in quantifying the uncertainty of weather prediction on seasonal timescales. In this study seasonal forecasts are used in a climatological context: By making use of the up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. In order to determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (either based on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered for the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal or vertical progression of the storm track) can be identified, each of which has its own particular features. Basic storm features like average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is to evaluate if the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared for different seasonal forecast products.

  18. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  19. A three-stage strategy for optimal price offering by a retailer based on clustering techniques

    International Nuclear Information System (INIS)

    Mahmoudi-Kohan, N.; Shayesteh, E.; Moghaddam, M. Parsa; Sheikh-El-Eslami, M.K.

    2010-01-01

    In this paper, an innovative strategy for optimal price offering to customers for maximizing the profit of a retailer is proposed. This strategy is based on load profile clustering techniques and includes three stages. For the purpose of clustering, an improved weighted fuzzy average K-means is proposed. Also, in this paper a new acceptance function for increasing the profit of the retailer is proposed. The new method is evaluated by implementation on a group of 300 customers of a 20 kV distribution network. (author)

  20. A three-stage strategy for optimal price offering by a retailer based on clustering techniques

    Energy Technology Data Exchange (ETDEWEB)

    Mahmoudi-Kohan, N.; Shayesteh, E. [Islamic Azad University (Garmsar Branch), Garmsar (Iran); Moghaddam, M. Parsa; Sheikh-El-Eslami, M.K. [Tarbiat Modares University, Tehran (Iran)

    2010-12-15

    In this paper, an innovative strategy for optimal price offering to customers for maximizing the profit of a retailer is proposed. This strategy is based on load profile clustering techniques and includes three stages. For the purpose of clustering, an improved weighted fuzzy average K-means is proposed. Also, in this paper a new acceptance function for increasing the profit of the retailer is proposed. The new method is evaluated by implementation on a group of 300 customers of a 20 kV distribution network. (author)

  1. Bayesian feature weighting for unsupervised learning, with application to object recognition

    OpenAIRE

    Carbonetto, Peter; De Freitas, Nando; Gustafson, Paul; Thompson, Natalie

    2003-01-01

    International audience; We present a method for variable selection/weighting in an unsupervised learning context using Bayesian shrinkage. The basis for the model parameters and cluster assignments can be computed simultaneously using an efficient EM algorithm. Applying our Bayesian shrinkage model to a complex problem in object recognition (Duygulu, Barnard, de Freitas and Forsyth 2002), our experiments yield good results.

  2. Application of Clustering Techniques for Lung Sounds to Improve Interpretability and Detection of Crackles

    Directory of Open Access Journals (Sweden)

    Germán D. Sosa

    2015-01-01

    Full Text Available Due to the subjectivity currently involved in the pulmonary auscultation process and in its use to diagnose the condition of the respiratory airways, this work aims to evaluate the performance of clustering algorithms such as k-means and DBSCAN in a computational analysis of lung sounds, with the goal of visualizing a representation of such sounds that highlights the presence of crackles and the energy associated with them. To achieve that goal, wavelet analysis techniques were used instead of traditional frequency analysis, given the similarity between the typical waveform of a crackle and the wavelet sym4. Once the lung sound signal with isolated crackles is obtained, the clustering process groups crackles in regions of high density and provides a visualization that might be useful for the diagnosis made by an expert. Evaluation suggests that k-means groups crackles more effectively than DBSCAN in terms of generated clusters.
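
    To make the idea of grouping crackles "in regions of high density" concrete, here is a minimal DBSCAN sketch in plain Python. It is an illustration only, not the study's pipeline: the two-dimensional toy features (imagined as time and energy of detected crackles), the eps and min_pts values and the function name dbscan are assumptions, and the wavelet-based crackle isolation that precedes clustering is not shown.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN for small 2-D point sets (quadratic neighbor search).

    Returns one label per point: 0, 1, ... for clusters and -1 for noise.
    A point counts itself among its neighbors, as in the usual definition.
    """
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: expand from it
                queue.extend(nbrs)
    return labels


# Hypothetical (time, energy) features: two dense groups and one stray event.
events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
labels = dbscan(events, eps=0.3, min_pts=3)
```

    Unlike k-means, this procedure does not need the number of clusters in advance and marks isolated events as noise, but its outcome depends on the choice of eps and min_pts.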

  3. A Comparison of Methods for Player Clustering via Behavioral Telemetry

    DEFF Research Database (Denmark)

    Drachen, Anders; Thurau, C.; Sifa, R.

    2013-01-01

    The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets... patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five...

  4. Cluster-cluster clustering

    International Nuclear Information System (INIS)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S. (Yale Univ., New Haven, CT; California Univ., Santa Barbara; Cambridge Univ., England; Sussex Univ., Brighton, England)

    1985-01-01

    The cluster correlation function xi sub c(r) is compared with the particle correlation function, xi(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, xi sub c and xi are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of xi sub c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), xi sub c is steeper than xi, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of xi sub c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references

  5. Performance of clustering techniques for solving multi depot vehicle routing problem

    Directory of Open Access Journals (Sweden)

    Eliana M. Toro-Ocampo

    2016-01-01

    Full Text Available The vehicle routing problem considering multiple depots (MDVRP) is classified as NP-hard. MDVRP determines simultaneously the routes of a set of vehicles and aims to meet a set of clients with a known demand. The objective function of the problem is to minimize the total distance traveled by the routes, given that all customers must be served considering capacity constraints in depots and vehicles. This paper presents a hybrid methodology that combines agglomerative clustering techniques to generate initial solutions with an iterated local search (ILS) algorithm to solve the problem. Although clustering methods have been proposed in previous studies as strategies to generate initial solutions, in this work the search is intensified on the information generated after applying the clustering technique. In addition, an extensive analysis of the performance of the techniques and their effect on the final solution is performed. The proposed methodology is feasible and effective for solving the problem with regard to the quality of the answers and the computational times obtained on the literature instances evaluated.

  6. An unsupervised text mining method for relation extraction from biomedical literature.

    Directory of Open Access Journals (Sweden)

    Changqin Quan

    Full Text Available The wealth of interaction information provided in biomedical articles motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing to deal with biomedical relation extraction. The pattern clustering algorithm is based on the Polynomial Kernel method, which identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein-protein interaction extraction, and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods. The three supervised methods are rule based, SVM based, and Kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation on gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence based method.

  7. Unsupervised daily routine and activity discovery in smart homes.

    Science.gov (United States)

    Jie Yin; Qing Zhang; Karunanithi, Mohan

    2015-08-01

    The ability to accurately recognize daily activities of residents is a core premise of smart homes to assist with remote health monitoring. Most of the existing methods rely on a supervised model trained from a preselected and manually labeled set of activities, which are often time-consuming and costly to obtain in practice. In contrast, this paper presents an unsupervised method for discovering daily routines and activities for smart home residents. Our proposed method first uses a Markov chain to model a resident's locomotion patterns at different times of day and discover clusters of daily routines at the macro level. For each routine cluster, it then drills down to further discover room-level activities at the micro level. The automatic identification of daily routines and activities is useful for understanding indicators of functional decline of elderly people and suggesting timely interventions.
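
    The first ingredient mentioned above, a Markov chain over a resident's locomotion, amounts to estimating transition probabilities between locations from an observed sequence. The sketch below shows only that counting step under simple assumptions: a first-order chain, a made-up room sequence, and a hypothetical helper named transition_matrix. How the paper conditions on time of day or clusters the resulting models is not reproduced here.

```python
from collections import Counter, defaultdict


def transition_matrix(sequence):
    """Estimate first-order Markov transition probabilities from a sequence
    of visited states by counting consecutive pairs and normalizing rows."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(following.values())
                for nxt, n in following.items()}
        for state, following in counts.items()
    }


# Made-up sensor trace of room visits (not data from the study).
rooms = ["bedroom", "kitchen", "living", "kitchen",
         "living", "bedroom", "kitchen"]
P = transition_matrix(rooms)
# P["living"] holds the estimated probabilities of the room visited next.
```

    One such matrix per time window (for example morning versus evening) would give a compact description of movement patterns that could then be compared or clustered across days.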

  8. A relevance vector machine technique for the automatic detection of clustered microcalcifications (Honorable Mention Poster Award)

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.

    2005-04-01

    Microcalcification (MC) clusters in mammograms can be important early signs of breast cancer in women. Accurate detection of MC clusters is an important but challenging problem. In this paper, we propose the use of a recently developed machine learning technique -- relevance vector machine (RVM) -- for automatic detection of MCs in digitized mammograms. RVM is based on Bayesian estimation theory, and as a feature it can yield a decision function that depends on only a very small number of so-called relevance vectors. We formulate MC detection as a supervised-learning problem, and use RVM to classify if an MC object is present or not at each location in a mammogram image. MC clusters are then identified by grouping the detected MC objects. The proposed method is tested using a database of 141 clinical mammograms, and compared with a support vector machine (SVM) classifier which we developed previously. The detection performance is evaluated using the free-response receiver operating characteristic (FROC) curves. It is demonstrated that the RVM classifier matches closely with the SVM classifier in detection performance, and does so with a much sparser kernel representation than the SVM classifier. Consequently, the RVM classifier greatly reduces the computational complexity, making it more suitable for real-time processing of MC clusters in mammograms.

  9. Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System

    Science.gov (United States)

    Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque

    2018-01-01

    There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements such that the mastication analysis is getting less subjective; however, all methods are still highly subjective, and the quality of the assessments depends much on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to those obtained with clinical analysis, showing no statistically significant difference between both methods. Moreover, we propose to use unsupervised paradigm approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen’s Self-Organizing Maps and K-Means Clustering. Both algorithms have excellent performances to process jaw-movements data, showing encouraging results and potential to bring a full assessment of the masticatory function. The proposed method can be applied in real-time providing relevant dynamic information for health-care professionals. PMID:29651365

  10. A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms.

    Directory of Open Access Journals (Sweden)

    Amir H Beiki

    Full Text Available Various methods have been used to identify cultivars of olive trees; herein we used different bioinformatics algorithms to propose new tools to classify 10 cultivars of olive based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset was fully able to cluster each olive cultivar into the right class. All trees (176) induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) were also able to predict the right class of olive cultivars with 100% accuracy. For the first time, our results showed that data mining techniques can be effectively used to distinguish between plant cultivars, and the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.

  11. Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System.

    Science.gov (United States)

    Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque; Rativa, Diego

    2018-01-01

    There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements and make mastication analysis less subjective; however, all methods are still highly subjective, and the quality of the assessments depends heavily on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to those obtained with clinical analysis, showing no statistically significant difference between the two methods. Moreover, we propose to use unsupervised approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen's Self-Organizing Maps and K-Means Clustering. Both algorithms perform excellently in processing jaw-movement data, showing encouraging results and potential to bring a full assessment of the masticatory function. The proposed method can be applied in real time, providing relevant dynamic information for healthcare professionals.

  12. Melodic pattern discovery by structural analysis via wavelets and clustering techniques

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present an automatic method to support melodic pattern discovery by structural analysis of symbolic representations by means of wavelet analysis and clustering techniques. In previous work, we used the method to recognize the parent works of melodic segments, or to classify tunes into tune......-means to cluster melodic segments into groups of measured similarity and obtain a ranking of the most prototypical melodic segments or patterns and their occurrences. We test the method on the JKU Patterns Development Database and evaluate it based on the ground truth defined by the MIREX 2013 Discovery of Repeated...... Themes & Sections task. We compare the results of our method to the output of geometric approaches. Finally, we discuss the relevance of our wavelet-based analysis in relation to structure, pattern discovery, similarity and variation, and comment on the considerations of the method when used...

  13. APPLICATION OF FUZZY C-MEANS CLUSTERING TECHNIQUE IN VEHICULAR POLLUTION

    Directory of Open Access Journals (Sweden)

    Samarjit Das

    2013-07-01

    At present, in most urban areas all over the world, vehicular pollution has become one of the key contributors to air pollution because of the exponential increase in traffic. As uncertainty prevails in the process of designating the level of pollution of a particular region, a fuzzy method can be applied to obtain the membership values of that region in a number of predefined clusters. Also, because vehicular pollution involves several different pollutants, the data used to represent it take the form of numerical vectors. In our work, we apply the fuzzy c-means technique of Bezdek to a dataset representing vehicular pollution to obtain the membership values of a city's vehicular-emission pollution in one or more predefined clusters. We also try to assess the benefits of adopting a fuzzy approach over the traditional way of determining the level of pollution of a particular region.
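
    The membership computation described in this abstract can be sketched as follows. This is a minimal illustration of the standard fuzzy c-means membership formula, assuming Euclidean distance and a fuzzifier m = 2. The function name, the cluster centers, and the pollutant reading are hypothetical values chosen for illustration, not data from the article.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one data vector in each cluster.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical predefined cluster centers (low, medium, high pollution),
# each a vector of three pollutant concentrations. Illustrative values only.
centers = [(1.0, 20.0, 30.0), (3.0, 60.0, 80.0), (6.0, 120.0, 150.0)]
reading = (2.5, 55.0, 70.0)  # hypothetical measurement for one city

u = fcm_memberships(reading, centers)
print([round(x, 3) for x in u])
```

    The memberships sum to one, so a region near the boundary between two pollution levels is reported as partly belonging to both rather than being forced into a single class.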

  14. Estimating extinction using unsupervised machine learning

    Science.gov (United States)

    Meingast, Stefan; Lombardi, Marco; Alves, João

    2017-05-01

    Dust extinction is the most robust tracer of the gas distribution in the interstellar medium, but measuring extinction is limited by the systematic uncertainties involved in estimating the intrinsic colors to background stars. In this paper we present a new technique, Pnicer, that estimates intrinsic colors and extinction for individual stars using unsupervised machine learning algorithms. This new method aims to be free from any priors with respect to the column density and intrinsic color distribution. It is applicable to any combination of parameters and works in arbitrary numbers of dimensions. Furthermore, it is not restricted to color space. Extinction toward single sources is determined by fitting Gaussian mixture models along the extinction vector to (extinction-free) control field observations. In this way it becomes possible to describe the extinction for observed sources with probability densities, rather than a single value. Pnicer effectively eliminates known biases found in similar methods and outperforms them in cases of deep observational data where the number of background galaxies is significant, or when a large number of parameters is used to break degeneracies in the intrinsic color distributions. This new method remains computationally competitive, making it possible to correctly de-redden millions of sources within a matter of seconds. With the ever-increasing number of large-scale high-sensitivity imaging surveys, Pnicer offers a fast and reliable way to efficiently calculate extinction for arbitrary parameter combinations without prior information on source characteristics. The Pnicer software package also offers access to the well-established Nicer technique in a simple unified interface and is capable of building extinction maps including the Nicest correction for cloud substructure. Pnicer is offered to the community as an open-source software solution and is entirely written in Python.

  15. Automated and unsupervised detection of malarial parasites in microscopic images

    Directory of Open Access Journals (Sweden)

    Purwar Yashasvi

    2011-12-01

    Background: Malaria is a serious infectious disease. According to the World Health Organization, it is responsible for nearly one million deaths each year. There are various techniques to diagnose malaria, of which manual microscopy is considered to be the gold standard. However, due to the number of steps required in manual assessment, this diagnostic method is time consuming (leading to late diagnosis) and prone to human error (leading to erroneous diagnosis), even in experienced hands. The focus of this study is to develop a robust, unsupervised and sensitive malaria screening technique with low material cost, one that has an advantage over other techniques in that it minimizes human reliance and is, therefore, more consistent in applying diagnostic criteria. Method: A method based on digital image processing of Giemsa-stained thin smear images is developed to facilitate the diagnostic process. The diagnosis procedure is divided into two parts: enumeration and identification. The image-based method presented here is designed to automate the process of enumeration and identification, with the main advantage being its ability to carry out the diagnosis in an unsupervised manner and yet have high sensitivity, thus reducing cases of false negatives. Results: The image-based method is tested on more than 500 images from two independent laboratories. The aim is to distinguish between positive and negative cases of malaria using thin smear blood slide images. Due to the unsupervised nature of the method, it requires minimal human intervention, thus speeding up the whole process of diagnosis. Overall sensitivity to capture cases of malaria is 100% and specificity ranges from 50-88% for all species of malaria parasites. Conclusion: The image-based screening method will speed up the whole process of diagnosis and is more advantageous than laboratory procedures that are prone to errors and where pathological expertise is minimal. Further this method
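
    For readers less familiar with the screening metrics quoted in this abstract, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical and chosen only so that the result falls in the range reported above; they are not the study's data.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive slides that the screen flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative slides that the screen flags as negative."""
    return tn / (tn + fp)


# Hypothetical counts (illustration only, not taken from the study).
tp, fn = 40, 0   # every infected slide is detected
tn, fp = 30, 10  # some healthy slides are flagged for review

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

    A screen tuned this way never misses an infection (no false negatives) at the cost of sending some healthy slides for manual confirmation.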

  16. An automatic taxonomy of galaxy morphology using unsupervised machine learning

    Science.gov (United States)

    Hocking, Alex; Geach, James E.; Sun, Yi; Davey, Neil

    2018-01-01

    We present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy we use no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. We demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS 0416.1-2403), we show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an 'early' or 'late' type galaxy is. We then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of approximately 60 000 classifications. We show how the automatic classification groups galaxies of similar morphological (and photometric) type and make the classifications public via a catalogue, a visual catalogue and galaxy similarity search. We compare the CANDELS machine-based classifications to human-classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping between Galaxy Zoo and our hierarchical labelling, we demonstrate a good level of concordance between human and machine classifications. Finally, we show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging.

  17. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... Author for correspondence (zh4403701@126.com). MS received 15 ... lic clusters using density functional theory (DFT)-GGA of the DMOL3 package. ... In the process of geometric optimization, convergence thresholds ..... and Postgraduate Research & Practice Innovation Program of. Jiangsu Province ...

  18. clusters

    Indian Academy of Sciences (India)

    environmental as well as technical problems during fuel gas utilization. ... adsorption on some alloys of Pd, namely PdAu, PdAg ... ried out on small neutral and charged Au24,26,27, Cu,28 ... study of Zanti et al.29 on Pdn (n = 1–9) clusters.

  19. Unsupervised Performance Evaluation of Image Segmentation

    Directory of Open Access Journals (Sweden)

    Chabrier Sebastien

    2006-01-01

    We present in this paper a study of unsupervised evaluation criteria that enable the quantification of the quality of an image segmentation result. These evaluation criteria compute some statistics for each region or class in a segmentation result. Such an evaluation criterion can be useful for different applications: the comparison of segmentation results, the automatic choice of the best-fitted parameters of a segmentation method for a given image, or the definition of new segmentation methods by optimization. We first present the state of the art of unsupervised evaluation, and then we compare six unsupervised evaluation criteria. For this comparative study, we use a database composed of 8400 synthetic gray-level images segmented in four different ways. Vinet's measure (correct classification rate) is used as an objective criterion to compare the behavior of the different criteria. Finally, we present the experimental results on the segmentation evaluation of a few gray-level natural images.
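
    As an illustration of what a region-statistics criterion of this kind can look like, the sketch below scores a segmentation by its size-weighted intra-region gray-level variance (lower is better). This is an assumed, deliberately simple criterion; the abstract does not say which six criteria were compared, and the tiny image and label maps are made up.

```python
from statistics import pvariance


def intra_region_variance(image, labels):
    """Size-weighted mean of the gray-level variance inside each region.

    `image` and `labels` are equally shaped 2-D lists; no ground truth
    is needed, which is what makes the criterion unsupervised.
    """
    regions = {}
    for img_row, lab_row in zip(image, labels):
        for gray, lab in zip(img_row, lab_row):
            regions.setdefault(lab, []).append(gray)
    n_pixels = sum(len(v) for v in regions.values())
    return sum(len(v) * pvariance(v) for v in regions.values()) / n_pixels


# Made-up 2x4 gray-level image: dark left half, bright right half.
image = [[10, 12, 200, 198],
         [11, 13, 201, 199]]
seg_left_right = [[0, 0, 1, 1],
                  [0, 0, 1, 1]]   # follows the intensity boundary
seg_top_bottom = [[0, 0, 0, 0],
                  [1, 1, 1, 1]]   # cuts across it

score_a = intra_region_variance(image, seg_left_right)
score_b = intra_region_variance(image, seg_top_bottom)
print(score_a, score_b)
```

    Because the score uses only the image and the label map, it can rank candidate segmentations or tune a segmentation parameter automatically, which are the applications listed in the abstract.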

  20. Unsupervised learning via self-organization a dynamic approach

    CERN Document Server

    Kyan, Matthew; Jarrah, Kambiz; Guan, Ling

    2014-01-01

    To aid in intelligent data mining, this book introduces a new family of unsupervised algorithms that have a basis in self-organization, yet are free from many of the constraints typical of other well known self-organizing architectures. It then moves through a series of pertinent real world applications with regards to the processing of multimedia data from its role in generic image processing techniques such as the automated modeling and removal of impulse noise in digital images, to problems in digital asset management, and its various roles in feature extraction, visual enhancement, segmentation, and analysis of microbiological image data.

  1. Clustering of commercial fish sauce products based on an e-panel technique

    Directory of Open Access Journals (Sweden)

    Mitsutoshi Nakano

    2018-02-01

    Fish sauce is a brownish liquid seasoning with a characteristic flavor that is produced in Asian countries and limited areas of Europe. The types of fish and shellfish and the fermentation process used in its production depend on the region from which it derives. Variations in ingredients and fermentation procedures yield end products with different smells, tastes, and colors. For this data article, we employed an electronic panel (e-panel) technique including an electronic nose (e-nose), electronic tongue (e-tongue), and electronic eye (e-eye), in which smell, taste, and color are evaluated by sensors instead of the human nose, tongue, and eye to avoid subjective error. The presented data comprise clustering of 46 commercially available fish sauce products based on separate e-nose, e-tongue, and e-eye test results. Sensory intensity data from the e-nose, e-tongue, and e-eye were separately classified by cluster analysis and are shown in dendrograms. The hierarchical cluster analysis indicates three major groups in the e-nose and e-tongue data, and four major groups in the e-eye data.
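
    The kind of hierarchical grouping described here can be sketched with a small agglomerative procedure. The abstract does not state the linkage rule or distance used, so single linkage with Euclidean distance is an assumption, and the six two-dimensional "sensor readings" are invented for illustration.

```python
import math


def agglomerate(points, n_groups):
    """Bottom-up clustering: repeatedly merge the two closest groups.

    Single linkage (closest pair of members) is an assumed choice.
    Returns a list of groups, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_groups:
        candidates = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(candidates)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented sensor-intensity vectors for six products (illustration only).
readings = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
groups = agglomerate(readings, n_groups=3)
print(groups)
```

    Recording the merge distance at each step is what a dendrogram plots; stopping at three or four groups corresponds to cutting that tree at a chosen height.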

  2. Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique

    Science.gov (United States)

    Steinley, Douglas

    2006-01-01

    Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate…
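
    The local-optimum behaviour that this record studies can be reproduced on a toy example. The sketch below is a plain Lloyd-style K-means written for illustration (it is not the author's diagnostic technique); the four points and the starting centers are chosen by hand so that one start converges to a poor partition and another to the best one.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm from given starting centers; returns (centers, SSE)."""
    centers = [tuple(float(x) for x in c) for c in init_centers]
    k = len(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Corners of a 4 x 1 rectangle: the natural split is left pair vs right pair.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_poor = kmeans(points, [(0, 0), (0, 1)])  # both starts on the left edge
_, sse_good = kmeans(points, [(0, 0), (4, 0)])  # one start on each side
print(sse_poor, sse_good)  # prints "16.0 1.0"

# Profiling restarts: collect the distinct SSE values reached from random starts.
rng = random.Random(0)
sse_values = [kmeans(points, rng.sample(points, 2))[1] for _ in range(20)]
print(sorted(set(sse_values)))
```

    Tallying how often each final SSE occurs across restarts gives a rough picture of how many local optima a data set has and how likely a single run is to land in a poor one.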

  3. Partitional clustering algorithms

    CERN Document Server

    2015-01-01

    This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...

  4. Unsupervised machine learning account of magnetic transitions in the Hubbard model

    Science.gov (United States)

    Ch'ng, Kelvin; Vazquez, Nick; Khatami, Ehsan

    2018-01-01

    We employ several unsupervised machine learning techniques, including autoencoders, random trees embedding, and t-distributed stochastic neighboring ensemble (t-SNE), to reduce the dimensionality of, and therefore classify, raw (auxiliary) spin configurations generated, through Monte Carlo simulations of small clusters, for the Ising and Fermi-Hubbard models at finite temperatures. Results from a convolutional autoencoder for the three-dimensional Ising model can be shown to produce the magnetization and the susceptibility as a function of temperature with a high degree of accuracy. Quantum fluctuations distort this picture and prevent us from making such connections between the output of the autoencoder and physical observables for the Hubbard model. However, we are able to define an indicator based on the output of the t-SNE algorithm that shows a near perfect agreement with the antiferromagnetic structure factor of the model in two and three spatial dimensions in the weak-coupling regime. t-SNE also predicts a transition to the canted antiferromagnetic phase for the three-dimensional model when a strong magnetic field is present. We show that these techniques cannot be expected to work away from half filling when the "sign problem" in quantum Monte Carlo simulations is present.

  5. Using intelligent clustering techniques to classify the energy performance of school buildings

    Energy Technology Data Exchange (ETDEWEB)

    Santamouris, M.; Sfakianaki, K.; Papaglastra, M.; Pavlou, C.; Doukas, P.; Geros, V.; Assimakopoulos, M.N.; Zerefos, S. [University of Athens, Department of Physics, Division of Applied Physics, Laboratory of Meteorology, Athens (Greece); Mihalakakou, G.; Gaitani, N. [University of Ioannina, Department of Environmental and Natural Resources Management, Agrinio (Greece); Patargias, P. [University of Peloponnesus, Faculty of Human Sciences and Cultural Studies, Department of History, Kalamata (Greece); Primikiri, E. [University of Patras, Department of Architecture, Patras (Greece); Mitoula, R. [Charokopion University of Athens, Athens (Greece)

    2007-07-01

    The present paper deals with the energy performance, energy classification and rating and the global environmental quality of school buildings. A new energy classification technique based on intelligent clustering methodologies is proposed. Energy rating of school buildings provides specific information on their energy consumption and efficiency relative to other buildings of a similar nature and permits better planning of interventions to improve their energy performance. The overall work reported in the present paper is carried out in three phases. During the first phase, energy consumption data have been collected through energy surveys performed in 320 schools in Greece. In the second phase, an innovative energy rating scheme based on fuzzy clustering techniques has been developed, while in the third phase, 10 schools have been selected and detailed measurements of their energy efficiency and performance as well as of the global environmental quality have been performed using a specific experimental protocol. The proposed energy rating method has been applied while the main environmental and energy problems have been identified. The potential for energy and environmental improvements has been assessed. (author)

  6. Efficient computation of the elastography inverse problem by combining variational mesh adaption and a clustering technique

    International Nuclear Information System (INIS)

    Arnold, Alexander; Bruhns, Otto T; Reichling, Stefan; Mosler, Joern

    2010-01-01

    This paper is concerned with an efficient implementation suitable for the elastography inverse problem. More precisely, the novel algorithm allows us to compute the unknown stiffness distribution in soft tissue by means of the measured displacement field by considerably reducing the numerical cost compared to previous approaches. This is realized by combining and further elaborating variational mesh adaption with a clustering technique similar to those known from digital image compression. Within the variational mesh adaption, the underlying finite element discretization is only locally refined if this leads to a considerable improvement of the numerical solution. Additionally, the numerical complexity is reduced by the aforementioned clustering technique, in which the parameters describing the stiffness of the respective soft tissue are sorted according to a predefined number of intervals. By doing so, the number of unknowns associated with the elastography inverse problem can be chosen explicitly. A positive side effect of this method is the reduction of artificial noise in the data (smoothing of the solution). The performance and the rate of convergence of the resulting numerical formulation are critically analyzed by numerical examples.
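
    The clustering step described in this abstract, sorting stiffness parameters into a predefined number of intervals so that the number of unknowns can be chosen explicitly, can be sketched as below. Equal-width intervals and the use of the interval mean as the shared value are assumptions made for illustration; the stiffness values are invented and the paper's actual scheme may differ.

```python
def cluster_into_intervals(values, n_intervals):
    """Group scalar parameters into equal-width intervals.

    Returns (labels, representatives): labels[i] is the interval index of
    values[i]; representatives maps each non-empty interval to the mean of
    its members, which then acts as one shared unknown.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    if width == 0:
        width = 1.0  # all values identical: everything lands in interval 0
    labels = [min(int((v - lo) / width), n_intervals - 1) for v in values]
    members = {}
    for v, lab in zip(values, labels):
        members.setdefault(lab, []).append(v)
    representatives = {lab: sum(vs) / len(vs) for lab, vs in members.items()}
    return labels, representatives


# Invented per-element stiffness estimates (arbitrary units).
stiffness = [1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.0]
labels, reps = cluster_into_intervals(stiffness, n_intervals=3)
smoothed = [reps[lab] for lab in labels]
print(labels)
print(len(stiffness), "unknowns reduced to", len(reps))
```

    Replacing each element's value by its interval representative also averages out small fluctuations, which corresponds to the smoothing side effect mentioned in the abstract.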

  7. Surface mapping via unsupervised classification of remote sensing: application to MESSENGER/MASCS and DAWN/VIRS data.

    Science.gov (United States)

    D'Amore, M.; Le Scaon, R.; Helbert, J.; Maturilli, A.

    2017-12-01

    Machine learning (ML) has achieved unprecedented results in high-dimensional data processing tasks with wide applications in various fields. Due to the growing number of complex nonlinear systems that have to be investigated in science and the sheer size of the data now available, ML offers the unique ability to extract knowledge, regardless of the specific application field. Examples are image segmentation, supervised/unsupervised/semi-supervised classification, feature extraction, and data dimensionality analysis/reduction. The MASCS instrument has mapped Mercury's surface in the 400-1145 nm wavelength range during orbital observations by the MESSENGER spacecraft. We have conducted k-means unsupervised hierarchical clustering to identify and characterize spectral units from MASCS observations. The results display a dichotomy: polar and equatorial units, possibly linked to compositional differences or weathering due to irradiation. To explore possible relations between composition and spectral behavior, we have compared the spectral provinces with elemental abundance maps derived from MESSENGER's X-Ray Spectrometer (XRS). For the Vesta application on DAWN Visible and Infrared Spectrometer (VIR) data, we explored several machine learning techniques: an image segmentation method, a stream algorithm and hierarchical clustering. The algorithm successfully separates the olivine outcrops around two craters on Vesta's surface [1]. New maps summarizing the spectral and chemical signature of the surface could be automatically produced. We conclude that instead of digging through data by hand, scientists could choose a subset of algorithms with well-known features (i.e. efficacy on the particular problem, speed, accuracy) and focus their effort on understanding what the important characteristics of the groups found in the data mean. [1] E Ammannito et al. "Olivine in an unexpected location on Vesta's surface". In: Nature 504.7478 (2013), pp. 122-125.

  8. An unsupervised adaptive strategy for constructing probabilistic roadmaps

    KAUST Repository

    Tapia, L.

    2009-05-01

    Since planning environments are complex and no single planner exists that is best for all problems, much work has been done to explore methods for selecting where and when to apply particular planners. However, these two questions have been difficult to answer, even when adaptive methods meant to facilitate a solution are applied. For example, adaptive solutions such as setting learning rates, hand-classifying spaces, and defining parameters for a library of planners have all been proposed. We demonstrate a strategy based on unsupervised learning methods that makes adaptive planning more practical. The unsupervised strategies require less user intervention, model the topology of the problem in a reasonable and efficient manner, can adapt the sampler depending on characteristics of the problem, and can easily accept new samplers as they become available. Through a series of experiments, we demonstrate that in a wide variety of environments, the regions automatically identified by our technique represent the planning space well both in number and placement. We also show that our technique has little overhead and that it out-performs two existing adaptive methods in all complex cases studied. © 2009 IEEE.

  9. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew [The Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101 (United States)

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  10. A hybrid method based on a new clustering technique and multilayer perceptron neural networks for hourly solar radiation forecasting

    International Nuclear Information System (INIS)

    Azimi, R.; Ghayekhloo, M.; Ghofrani, M.

    2016-01-01

    Highlights: • A novel clustering approach is proposed based on the data transformation approach. • A novel cluster selection method based on correlation analysis is presented. • The proposed hybrid clustering approach leads to deep learning for MLPNN. • A hybrid forecasting method is developed to predict solar radiations. • The evaluation results show superior performance of the proposed forecasting model. - Abstract: Accurate forecasting of renewable energy sources plays a key role in their integration into the grid. This paper proposes a hybrid solar irradiance forecasting framework using a Transformation based K-means algorithm, named TB K-means, to increase the forecast accuracy. The proposed clustering method is a combination of a new initialization technique, K-means algorithm and a new gradual data transformation approach. Unlike the other K-means based clustering methods which are not capable of providing a fixed and definitive answer due to the selection of different cluster centroids for each run, the proposed clustering provides constant results for different runs of the algorithm. The proposed clustering is combined with a time-series analysis, a novel cluster selection algorithm and a multilayer perceptron neural network (MLPNN) to develop the hybrid solar radiation forecasting method for different time horizons (1 h ahead, 2 h ahead, …, 48 h ahead). The performance of the proposed TB K-means clustering is evaluated using several different datasets and compared with different variants of K-means algorithm. Solar datasets with different solar radiation characteristics are also used to determine the accuracy and processing speed of the developed forecasting method with the proposed TB K-means and other clustering techniques. The results of direct comparison with other well-established forecasting models demonstrate the superior performance of the proposed hybrid forecasting method. Furthermore, a comparative analysis with the benchmark solar

  11. Machine learning in APOGEE. Unsupervised spectral classification with K-means

    Science.gov (United States)

    Garcia-Dias, Rafael; Allende Prieto, Carlos; Sánchez Almeida, Jorge; Ordovás-Pascual, Ignacio

    2018-05-01

    Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims: Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods: We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results: We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters' space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions: Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data. Full Tables B.1-B.4 are only available at the CDS via

  12. Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma.

    Science.gov (United States)

    Young, Jonathan D; Cai, Chunhui; Lu, Xinghua

    2017-10-03

    One approach to improving the personalized treatment of cancer is to understand the cellular signaling transduction pathways that cause cancer at the level of the individual patient. In this study, we used unsupervised deep learning to learn the hierarchical structure within cancer gene expression data. Deep learning is a group of machine learning algorithms that use multiple layers of hidden units to capture hierarchically related, alternative representations of the input data. We hypothesize that this hierarchical structure learned by deep learning will be related to the cellular signaling system. Robust deep learning model selection identified a network architecture that is biologically plausible. Our model selection results indicated that the 1st hidden layer of our deep learning model should contain about 1300 hidden units to most effectively capture the covariance structure of the input data. This agrees with the estimated number of human transcription factors, which is approximately 1400. This result lends support to our hypothesis that the 1st hidden layer of a deep learning model trained on gene expression data may represent signals related to transcription factor activation. Using the 3rd hidden layer representation of each tumor as learned by our unsupervised deep learning model, we performed consensus clustering on all tumor samples, leading to the discovery of clusters of glioblastoma multiforme with differential survival. One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input. We also found differentially expressed genes and well-known mutations (NF1, IDH1, EGFR) that were uniquely correlated with each of these clusters. Exploring these unique genes and mutations will allow us to

  13. A new clustering algorithm for scanning electron microscope images

    Science.gov (United States)

    Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad

    2016-04-01

    A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster scan pattern, and the beam's position is combined with the detected signal to produce an image. The most common configuration for an SEM produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may be produced with insufficient brightness, anomalous contrast, jagged edges, and poor quality due to low signal-to-noise ratio, grained topography and poor surface details. Segmenting SEM images is a challenging problem in the presence of these distortions. In this paper, we focus on clustering this type of image. To that end, we evaluate the performance of well-known unsupervised clustering and classification techniques such as connectivity-based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of image and compare its results against these standard techniques in terms of clustering validation metrics.
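
    The abstract does not specify the proposed spatial fuzzy clustering technique, so the following is only a minimal sketch of standard (non-spatial) fuzzy c-means on single-value-per-pixel intensities. The function name, the evenly spaced initial centers, and the toy intensity values are assumptions made for illustration; they are not taken from the paper.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means on 1-D intensities (illustrative, non-spatial)."""
    lo, hi = min(values), max(values)
    # Assumption: start with centers spread evenly over the intensity range.
    centers = [lo + (i + 0.5) * (hi - lo) / n_clusters for i in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0.0:
                # The value coincides with a center: full membership there.
                zeros = [d == 0.0 for d in dists]
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** -power for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        # Center update: mean of the values weighted by u_ik^m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical grayscale values: a dark region and a bright region.
pixels = [10, 12, 11, 13, 200, 205, 198, 202]
centers, memberships = fuzzy_c_means(pixels, n_clusters=2)
# Harden the fuzzy memberships by picking the most likely cluster per pixel.
hard_labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
print(centers, hard_labels)
```

    Each row of `memberships` sums to one, which is what distinguishes this soft assignment from the hard assignment of k-means. A spatial variant would additionally take neighboring pixels into account, which this sketch does not do.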

  14. Graph-based unsupervised segmentation algorithm for cultured neuronal networks' structure characterization and modeling.

    Science.gov (United States)

    de Santos-Sierra, Daniel; Sendiña-Nadal, Irene; Leyva, Inmaculada; Almendral, Juan A; Ayali, Amir; Anava, Sarit; Sánchez-Ávila, Carmen; Boccaletti, Stefano

    2015-06-01

    Large-scale phase-contrast images taken at high resolution through the life of a cultured neuronal network are analyzed by a graph-based unsupervised segmentation algorithm with a very low computational cost, scaling linearly with the image size. The processing automatically retrieves the whole network structure, an object whose mathematical representation is a matrix in which nodes are identified neurons or neurons' clusters, and links are the reconstructed connections between them. The algorithm is also able to extract any other relevant morphological information characterizing neurons and neurites. More importantly, and at variance with other segmentation methods that require fluorescence imaging from immunocytochemistry techniques, our non-invasive measures allow us to perform a longitudinal analysis during the maturation of a single culture. Such an analysis provides a way of identifying the main physical processes underlying the self-organization of the neurons' ensemble into a complex network, and drives the formulation of a phenomenological model able to describe qualitatively the overall scenario observed during the culture growth. © 2014 International Society for Advancement of Cytometry.

  15. An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

    Science.gov (United States)

    LIU, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

    2016-12-01

    Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessments and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal pattern or known anomalies). As such, it works for diverse data sets, and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on "interestingness" (e.g., rareness or total number of outliers) defined by users, and (4) enable interactive querying, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in the application of detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable to a diverse set of satellite data and will be made publicly available for scientists in early 2017.
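
    The abstract does not describe the clustering algorithm itself. As a rough illustration of the general idea of clustering-based anomaly detection, the sketch below groups observations that lie within a distance `eps` of each other and flags members of very small groups as anomalies. The function names, the `eps` and `min_size` parameters, and the toy points are all hypothetical and do not come from the framework described above.

```python
import math
from collections import Counter, deque


def cluster_by_proximity(points, eps):
    """Label the connected components of the graph linking points within eps."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j, q in enumerate(points):
                if labels[j] == -1 and math.dist(points[i], q) <= eps:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


def flag_anomalies(points, eps, min_size):
    """Flag every point whose cluster has fewer than min_size members."""
    labels = cluster_by_proximity(points, eps)
    sizes = Counter(labels)
    return [sizes[label] < min_size for label in labels]


# Hypothetical 2-D observations: one dense group and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (0.5, 0.5)]
flags = flag_anomalies(points, eps=1.5, min_size=3)
print(flags)
```

    No labels or expected patterns are supplied, which matches the unsupervised setting of the abstract. Ranking or grouping the flagged points into events would need further logic that is not sketched here.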

  16. Unsupervised heart-rate estimation in wearables with Liquid states and a probabilistic readout.

    Science.gov (United States)

    Das, Anup; Pradhapan, Paruthi; Groenendaal, Willemijn; Adiraju, Prathyusha; Rajan, Raj Thilak; Catthoor, Francky; Schaafsma, Siebren; Krichmar, Jeffrey L; Dutt, Nikil; Van Hoof, Chris

    2018-03-01

    Heart-rate estimation is a fundamental feature of modern wearable devices. In this paper we propose a machine learning technique to estimate heart-rate from electrocardiogram (ECG) data collected using wearable devices. The novelty of our approach lies in (1) encoding spatio-temporal properties of ECG signals directly into spike trains and using these to excite recurrently connected spiking neurons in a Liquid State Machine computation model; (2) a novel learning algorithm; and (3) an intelligently designed unsupervised readout based on Fuzzy c-Means clustering of spike responses from a subset of neurons (Liquid states), selected using particle swarm optimization. Our approach differs from existing works by learning directly from ECG signals (allowing personalization), without requiring costly data annotations. Additionally, our approach can be easily implemented on state-of-the-art spiking-based neuromorphic systems, offering high accuracy yet a significantly low energy footprint, leading to an extended battery-life of wearable devices. We validated our approach with CARLsim, a GPU accelerated spiking neural network simulator modeling Izhikevich spiking neurons with Spike Timing Dependent Plasticity (STDP) and homeostatic scaling. A range of subjects is considered from in-house clinical trials and public ECG databases. Results show high accuracy and low energy footprint in heart-rate estimation across subjects with and without cardiac irregularities, signifying the strong potential of this approach to be integrated in future wearable devices. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. Geodesic Flow Kernel Support Vector Machine for Hyperspectral Image Classification by Unsupervised Subspace Feature Transfer

    Directory of Open Access Journals (Sweden)

    Alim Samat

    2016-03-01

    In order to deal with scenarios where the training data, used to deduce a model, and the validation data have different statistical distributions, we study the problem of transformed subspace feature transfer for domain adaptation (DA) in the context of hyperspectral image classification via a geodesic Gaussian flow kernel based support vector machine (GFKSVM). To show the superior performance of the proposed approach, conventional support vector machines (SVMs) and state-of-the-art DA algorithms, including information-theoretical learning of discriminative cluster for domain adaptation (ITLDC), joint distribution adaptation (JDA), and joint transfer matching (JTM), are also considered. Additionally, unsupervised linear and nonlinear subspace feature transfer techniques including principal component analysis (PCA), randomized nonlinear principal component analysis (rPCA), factor analysis (FA) and non-negative matrix factorization (NNMF) are investigated and compared. Experiments on two real hyperspectral images show the cross-image classification performances of the GFKSVM, confirming its effectiveness and suitability when applied to hyperspectral images.

  18. Unsupervised learning of mixture models based on swarm intelligence and neural networks with optimal completion using incomplete data

    Directory of Open Access Journals (Sweden)

    Ahmed R. Abas

    2012-07-01

    In this paper, a new algorithm is presented for unsupervised learning of finite mixture models (FMMs) using data sets with missing values. This algorithm overcomes the local optima problem of the Expectation-Maximization (EM) algorithm by integrating the EM algorithm with Particle Swarm Optimization (PSO). In addition, the proposed algorithm overcomes the problem of biased estimation due to overlapping clusters in estimating missing values in the input data set by integrating locally-tuned general regression neural networks with the Optimal Completion Strategy (OCS). A comparison study shows the superiority of the proposed algorithm over other algorithms commonly used in the literature for unsupervised learning of FMM parameters: it yields the minimum mis-classification errors when used to cluster incomplete data sets generated from overlapping clusters that differ greatly in size.
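
    For readers unfamiliar with the EM algorithm that this work builds on, the following is a minimal sketch of plain EM for a one-dimensional Gaussian mixture on complete data. It does not include the PSO integration, the neural networks, or the missing-value handling described in the abstract. The function names, the starting means, and the toy data are assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, initial_means, n_iter=100, var_floor=1e-6):
    """Plain EM for a 1-D Gaussian mixture (complete data, no PSO)."""
    k = len(initial_means)
    means = list(initial_means)
    overall_mean = sum(data) / len(data)
    # Assumption: every component starts with the overall data variance.
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, mu, v)
                     for w, mu, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for i in range(k):
            n_i = sum(r[i] for r in resp)
            weights[i] = n_i / len(data)
            means[i] = sum(r[i] * x for r, x in zip(resp, data)) / n_i
            var_i = sum(r[i] * (x - means[i]) ** 2 for r, x in zip(resp, data)) / n_i
            variances[i] = max(var_floor, var_i)
    return weights, means, variances


# Hypothetical data drawn around two well-separated values.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, initial_means=[0.0, 6.0])
print(weights, means, variances)
```

    The result depends on `initial_means`, and a poor starting point can leave EM in a local optimum. That sensitivity is the weakness the abstract says the PSO integration is meant to address.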

  19. Application of unsupervised learning methods in high energy physics

    Energy Technology Data Exchange (ETDEWEB)

    Koevesarki, Peter; Nuncio Quiroz, Adriana Elizabeth; Brock, Ian C. [Physikalisches Institut, Universitaet Bonn, Bonn (Germany)]

    2011-07-01

    High energy physics is a home for a variety of multivariate techniques, mainly due to the fundamentally probabilistic behaviour of nature. These methods generally require training based on some theory, in order to discriminate a known signal from a background. Nevertheless, new physics can show itself in ways that previously no one thought about, and in these cases conventional methods give little or no help. A possible way to discriminate between known processes (like vector bosons or top-quark production) or look for new physics is using unsupervised machine learning to extract the features of the data. A technique was developed, based on the combination of neural networks and the method of principal curves, to find a parametrisation of the non-linear correlations of the data. The feasibility of the method is shown on ATLAS data.

  20. Perceptual approach for unsupervised digital color restoration of cinematographic archives

    Science.gov (United States)

    Chambah, Majed; Rizzi, Alessandro; Gatta, Carlo; Besserer, Bernard; Marini, Daniele

    2003-01-01

    The cinematographic archives represent an important part of our collective memory. We present in this paper some advances in automating the color fading restoration process, especially with regard to the automatic color correction technique. The proposed color correction method is based on the ACE model, an unsupervised color equalization algorithm based on a perceptual approach and inspired by some adaptation mechanisms of the human visual system, in particular lightness constancy and color constancy. A perceptual approach has some advantages, mainly its robustness and its local filtering properties, which lead to more effective results. The resulting technique is not just an application of ACE on movie images, but an enhancement of ACE principles to meet the requirements of the digital film restoration field. The presented preliminary results are satisfying and promising.

  1. Characterization-Based Molecular Design of Bio-Fuel Additives Using Chemometric and Property Clustering Techniques

    International Nuclear Information System (INIS)

    Hada, Subin; Solvason, Charles C.; Eden, Mario R.

    2014-01-01

    In this work, multivariate characterization data such as infrared spectroscopy was used as a source of descriptor data involving information on molecular architecture for designing structured molecules with tailored properties. Application of multivariate statistical techniques such as principal component analysis allowed capturing important features of the molecular architecture from an enormous amount of complex data to build appropriate latent variable models. Combining the property clustering techniques and group contribution methods based on characterization (cGCM) data in a reverse problem formulation enabled identifying candidate components by combining or mixing molecular fragments until the resulting properties match the targets. The developed methodology is demonstrated using the molecular design of a biodiesel additive, which when mixed with off-spec biodiesel produces biodiesel that meets the desired fuel specifications. The contribution of this work is that the complex structures and orientations of the molecule can be included in the design, thereby allowing enumeration of all feasible candidate molecules that match the identified target but were not part of the original training set of molecules.

  2. Characterization-Based Molecular Design of Bio-Fuel Additives Using Chemometric and Property Clustering Techniques

    Energy Technology Data Exchange (ETDEWEB)

    Hada, Subin; Solvason, Charles C.; Eden, Mario R., E-mail: edenmar@auburn.edu [Department of Chemical Engineering, Auburn University, Auburn, AL (United States)

    2014-06-10

    In this work, multivariate characterization data such as infrared spectroscopy was used as a source of descriptor data involving information on molecular architecture for designing structured molecules with tailored properties. Application of multivariate statistical techniques such as principal component analysis allowed capturing important features of the molecular architecture from an enormous amount of complex data to build appropriate latent variable models. Combining the property clustering techniques and group contribution methods based on characterization (cGCM) data in a reverse problem formulation enabled identifying candidate components by combining or mixing molecular fragments until the resulting properties match the targets. The developed methodology is demonstrated using the molecular design of a biodiesel additive, which when mixed with off-spec biodiesel produces biodiesel that meets the desired fuel specifications. The contribution of this work is that the complex structures and orientations of the molecule can be included in the design, thereby allowing enumeration of all feasible candidate molecules that match the identified target but were not part of the original training set of molecules.

  3. Characterization-Based Molecular Design of Biofuel Additives Using Chemometric and Property Clustering Techniques

    Directory of Open Access Journals (Sweden)

    Subin Hada

    2014-06-01

    In this work, multivariate characterization data such as infrared (IR) spectroscopy was used as a source of descriptor data involving information on molecular architecture for designing structured molecules with tailored properties. Application of multivariate statistical techniques such as principal component analysis (PCA) allowed capturing important features of the molecular architecture from complex data to build appropriate latent variable models. Combining the property clustering techniques and group contribution methods (GCM) based on characterization data in a reverse problem formulation enabled identifying candidate components by combining or mixing molecular fragments until the resulting properties match the targets. The developed methodology is demonstrated using the molecular design of a biodiesel additive, which when mixed with off-spec biodiesel produces biodiesel that meets the desired fuel specifications. The contribution of this work is that the complex structures and orientations of the molecule can be included in the design, thereby allowing enumeration of all feasible candidate molecules that match the identified target but were not part of the original training set of molecules.

  4. A competition in unsupervised color image segmentation

    Czech Academy of Sciences Publication Activity Database

    Haindl, Michal; Mikeš, Stanislav

    2016-01-01

    Vol. 57, No. 9 (2016), pp. 136-151. ISSN 0031-3203. R&D Projects: GA ČR(CZ) GA14-10911S. Institutional support: RVO:67985556. Keywords: Unsupervised image segmentation; Segmentation contest; Texture analysis. Subject RIV: BD - Theory of Information. Impact factor: 4.582, year: 2016. http://library.utia.cas.cz/separaty/2016/RO/haindl-0459179.pdf

  5. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, describe a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
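
    The first algorithm, as summarized above, can be sketched as agglomerative clustering in which two clusters may merge only if they are linked in a graph built on the sample coordinates. The sketch below rests on several assumptions that the abstract does not confirm: the graph links samples closer than a fixed `radius`, the merge criterion is the difference between cluster mean values of a single variable, and the function name and toy data are invented for illustration.

```python
import math


def spatial_agglomerative(coords, values, n_clusters, radius):
    """Merge graph-adjacent clusters with the closest mean values."""
    n = len(coords)
    clusters = {i: [i] for i in range(n)}
    # Proximity graph in coordinate space (assumed: fixed-radius neighbors).
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                neighbors[i].add(j)
                neighbors[j].add(i)

    def cluster_mean(c):
        return sum(values[k] for k in clusters[c]) / len(clusters[c])

    while len(clusters) > n_clusters:
        best = None
        for a in clusters:
            for b in neighbors[a]:
                if a < b:
                    gap = abs(cluster_mean(a) - cluster_mean(b))
                    if best is None or gap < best[0]:
                        best = (gap, a, b)
        if best is None:
            break  # no adjacent clusters are left to merge
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
        # Cluster a inherits the graph links of the absorbed cluster b.
        for c in neighbors.pop(b):
            neighbors[c].discard(b)
            if c != a:
                neighbors[c].add(a)
                neighbors[a].add(c)

    labels = [0] * n
    for label, members in enumerate(clusters.values()):
        for k in members:
            labels[k] = label
    return labels


# Hypothetical samples along a transect; the last one resembles the first
# three in value but is spatially separated from them.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [1.0, 1.1, 0.9, 5.0, 5.2, 1.0]
labels = spatial_agglomerative(coords, values, n_clusters=3, radius=1.0)
print(labels)
```

    In this toy case the last sample stays in its own class even though its value matches the first group, because no graph edge connects them. A clustering that ignores coordinates would group it with the first three samples, which is the loss of spatial coherence the abstract warns about.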

  6. Cluster analysis technique for assessing variability in cowpea (Vigna unguiculata (L.) Walp) accessions from Nigeria

    Directory of Open Access Journals (Sweden)

    Ajayi Abiola Toyin

    2013-01-01

    The genetic variability among 10 accessions of cowpea, Vigna unguiculata (L.) Walp, was studied using 13 qualitative and 13 quantitative traits. For the qualitative traits, the dendrogram grouped the 10 accessions into two major clusters, 1 and 2. Cluster 1 had 3 accessions and cluster 2 had 2 sub-clusters (I and II), with 2 accessions in sub-cluster I and 5 accessions in sub-cluster II. For the quantitative data, the dendrogram also revealed two major clusters, 1 and 2, for the 10 accessions. At distances of 4 and 6, cluster 1 had two sub-clusters (I and II), with sub-cluster I having 5 accessions and sub-cluster II having 4 accessions, while cluster 2 had only 1 accession. This study observed that identifying the right agro-morphological traits of high discriminating capacity is essential before embarking on any genetic diversity study, as some traits discriminated more efficiently among the accessions than others. A group of accessions, namely NGSA1, NGSA2, NGSA3, NGSA4, NGSA7, NGSA9 and NGSA10, was identified as being different from the others for number of seeds per pod, pod length, plant height, peduncle length, seed weight and number of pods per plant. These accessions may be good for cowpea improvement programs.

  7. Objective Classification of Rainfall in Northern Europe for Online Operation of Urban Water Systems Based on Clustering Techniques

    DEFF Research Database (Denmark)

    Löwe, Roland; Madsen, Henrik; McSharry, Patrick

    2016-01-01

    operators to change modes of control of their facilities. A k-means clustering technique was applied to group events retrospectively and was able to distinguish events with clearly different temporal and spatial correlation properties. For online applications, techniques based on k-means clustering...... and quadratic discriminant analysis both provided a fast and reliable identification of rain events of "high" variability, while the k-means provided the smallest number of rain events falsely identified as being of "high" variability (false hits). A simple classification method based on a threshold...

  8. Automated three-dimensional morphology-based clustering of human erythrocytes with regular shapes: stomatocytes, discocytes, and echinocytes

    Science.gov (United States)

    Ahmadzadeh, Ezat; Jaferzadeh, Keyvan; Lee, Jieun; Moon, Inkyu

    2017-07-01

    We present unsupervised clustering methods for automatic grouping of human red blood cells (RBCs) extracted from RBC quantitative phase images obtained by digital holographic microscopy into three RBC clusters with regular shapes: biconcave, stomatocyte, and sphero-echinocyte. We select features related to the RBC profile and morphology, such as RBC average thickness, sphericity coefficient, and mean corpuscular volume, and apply clustering methods, including density-based spatial clustering of applications with noise (DBSCAN), k-medoids, and k-means, to this set of morphological features. The clustering results of RBCs using a set of three-dimensional features are compared against a set of two-dimensional features. Our experimental results indicate that by utilizing the introduced set of features, two groups of biconcave RBCs and old RBCs (suffering from the sphero-echinocyte process) can be perfectly clustered. In addition, by increasing the number of clusters, the three RBC types can be effectively clustered in an automated unsupervised manner with high accuracy. The performance evaluation of the clustering techniques reveals that they can assist hematologists in further diagnosis.
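
    Of the clustering methods named above, k-medoids is perhaps the least familiar, so a minimal sketch of its alternating variant follows. The two features per cell (average thickness and sphericity coefficient), the numeric values, the function name, and the starting medoids are all hypothetical. In practice the features would also need to be put on comparable scales first, which this sketch skips.

```python
import math


def k_medoids(points, initial_medoids, n_iter=20):
    """Alternating k-medoids: assign to the nearest medoid, then re-pick medoids."""
    medoids = list(initial_medoids)  # indices into points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [
            min(range(len(medoids)),
                key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points
        ]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, label in enumerate(labels) if label == c]
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, labels


# Hypothetical (average thickness, sphericity coefficient) pairs for six cells.
cells = [(2.0, 0.60), (2.1, 0.62), (1.9, 0.58),
         (3.0, 0.90), (3.1, 0.92), (2.9, 0.88)]
medoids, labels = k_medoids(cells, initial_medoids=[0, 5])
print(medoids, labels)
```

    Unlike k-means, each cluster representative here is an actual observed cell, which can make the result easier to inspect.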

  9. Electric field measurements on Cluster: comparing the double-probe and electron drift techniques

    Directory of Open Access Journals (Sweden)

    A. I. Eriksson

    2006-03-01

    The four Cluster satellites each carry two instruments designed for measuring the electric field: a double-probe instrument (EFW) and an electron drift instrument (EDI). We compare data from the two instruments in a representative sample of plasma regions. The complementary merits and weaknesses of the two techniques are illustrated. EDI operations are confined to regions of magnetic fields above 30 nT and where wave activity and keV electron fluxes are not too high, while EFW can provide data everywhere, and can go far higher in sampling frequency than EDI. On the other hand, the EDI technique is immune to variations in the low energy plasma, while EFW sometimes detects significant nongeophysical electric fields, particularly in regions with drifting plasma, with ion energy (in eV) below the spacecraft potential (in volts). We show that the polar cap is a particularly intricate region for the double-probe technique, where large nongeophysical fields regularly contaminate EFW measurements of the DC electric field. We present a model explaining this in terms of enhanced cold plasma wake effects appearing when the ion flow energy is higher than the thermal energy but below the spacecraft potential multiplied by the ion charge. We suggest that these conditions, which are typical of the polar wind and occur sporadically in other regions containing a significant low energy ion population, cause a large cold plasma wake behind the spacecraft, resulting in spurious electric fields in EFW data. This interpretation is supported by an analysis of the direction of the spurious electric field, and by showing that use of active potential control alleviates the situation.

  10. Growth of CdTe on Si(100) surface by ionized cluster beam technique: Experimental and molecular dynamics simulation

    Energy Technology Data Exchange (ETDEWEB)

    Araghi, Houshang, E-mail: araghi@aut.ac.ir [Department of Physics, Amirkabir University of Technology, Tehran (Iran, Islamic Republic of); Zabihi, Zabiholah [Department of Physics, Amirkabir University of Technology, Tehran (Iran, Islamic Republic of); Nayebi, Payman [Department of Physics, College of Technical and Engineering, Saveh Branch, Islamic Azad University, Saveh (Iran, Islamic Republic of); Ehsani, Mohammad Mahdi [Department of Physics, Amirkabir University of Technology, Tehran (Iran, Islamic Republic of)

    2016-10-15

    II–VI semiconductor CdTe was grown on the Si(100) substrate surface by the ionized cluster beam (ICB) technique. In the ICB method, when vapors of solid materials such as CdTe were ejected through a nozzle of a heated crucible into a vacuum region, nanoclusters were created by an adiabatic expansion phenomenon. The clusters thus obtained were partially ionized by electron bombardment and then accelerated onto the silicon substrate at 473 K by high potentials. The cluster size was determined using a retarding field energy analyzer. The results of X-ray diffraction measurements indicate the cubic zinc blende (ZB) crystalline structure of the CdTe thin film on the silicon substrate. The CdTe thin film prepared by the ICB method had high crystalline quality. The microscopic processes involved in the ICB deposition technique, such as impact and coalescence processes, have been studied in detail by molecular dynamics (MD) simulation.

  11. Towards automating the discovery of certain innovative design principles through a clustering-based optimization technique

    Science.gov (United States)

    Bandaru, Sunith; Deb, Kalyanmoy

    2011-09-01

    In this article, a methodology is proposed for automatically extracting innovative design principles which make a system or process (subject to conflicting objectives) optimal using its Pareto-optimal dataset. Such 'higher knowledge' would not only help designers to execute the system better, but also enable them to predict how changes in one variable would affect other variables if the system has to retain its optimal behaviour. This in turn would help solve other similar systems with different parameter settings easily without the need to perform a fresh optimization task. The proposed methodology uses a clustering-based optimization technique and is capable of discovering hidden functional relationships between the variables, objective and constraint functions and any other function that the designer wishes to include as a 'basis function'. A number of engineering design problems are considered for which the mathematical structure of these explicit relationships exists and has been revealed by a previous study. A comparison with the multivariate adaptive regression splines (MARS) approach reveals the practicality of the proposed approach due to its ability to find meaningful design principles. The success of this procedure for automated innovization is highly encouraging and indicates its suitability for further development in tackling more complex design scenarios.

  12. Development and validation of the European Cluster Assimilation Techniques run libraries

    Science.gov (United States)

    Facskó, G.; Gordeev, E.; Palmroth, M.; Honkonen, I.; Janhunen, P.; Sergeev, V.; Kauristie, K.; Milan, S.

    2012-04-01

    The European Commission funded the European Cluster Assimilation Techniques (ECLAT) project as a collaboration of five leading European universities and research institutes. A main contribution of the Finnish Meteorological Institute (FMI) is to provide a wide range of global MHD runs with the Grand Unified Magnetosphere Ionosphere Coupling simulation (GUMICS). The runs are divided into two categories: synthetic runs investigating the extent of solar wind drivers that can influence magnetospheric dynamics, and dynamic runs using measured solar wind data as input. Here we consider the first set of runs with synthetic solar wind input. The solar wind density, velocity and the interplanetary magnetic field had different magnitudes and orientations; furthermore, two F10.7 flux values were selected for solar radiation minimum and maximum values. The solar wind parameter values were held constant so that a constant, stable solution was achieved. All configurations were run several times with three different tilt angles (-15°, 0°, +15°) in the GSE X-Z plane. The results of the 192 simulations, the so-called "synthetic run library", were visualized and uploaded to the homepage of the FMI after validation. Here we present details of these runs.

  13. The Effect of Roundtable and Clustering Teaching Techniques and Students’ Personal Traits on Students’ Achievement in Descriptive Writing

    Directory of Open Access Journals (Sweden)

    Megawati Sinaga

    2017-12-01

    The objective of this experimental research was to investigate the effect of Roundtable and Clustering teaching techniques and students’ personal traits on students’ achievement in descriptive writing. The grade IX students of SMP Negeri 2 Pancurbatu in the 2016/2017 academic year were chosen as the population of this research. The research used an experimental 2x2 factorial design. The students were divided into two groups: the experimental group was taught using the Roundtable teaching technique and the control group was taught using the Clustering teaching technique. The students were classified into introvert and extrovert personal traits by means of a questionnaire, and the students’ achievement in descriptive writing was measured with a writing test scored using the ‘Analytic Scoring’ of Weigle. The data were analyzed by applying two-way analysis of variance (ANOVA) at the level of significance α = 0.05. The results reveal that (1) students’ achievement in descriptive writing taught using the Roundtable teaching technique was higher than that taught using the Clustering teaching technique, with Fobs = 4.59 > Ftable = 3.97; (2) students’ achievement in descriptive writing with the introvert personal trait was higher than that with the extrovert personal trait, with Fobs = 4.90 > Ftable = 3.97; and (3) there is an interaction between teaching techniques and personal traits on students’ achievement in descriptive writing, with Fobs = 6.58 > Ftable = 3.97. After computing the Tukey test, the results showed that introvert students got higher achievement if they were taught using the Roundtable teaching technique, while extrovert students got higher achievement if they were taught using the Clustering teaching technique.
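
    The decision rule behind the three reported results is the usual one for an F test: an effect is declared significant at the chosen level when the observed F statistic exceeds the tabulated critical value. The short sketch below only replays that comparison with the numbers quoted in the abstract. The dictionary keys and the function name are illustrative, and the critical value is taken as reported rather than recomputed.

```python
# Critical value reported in the abstract for alpha = 0.05.
F_CRITICAL = 3.97

# Observed F statistics quoted in the abstract for the 2x2 factorial ANOVA.
f_observed = {
    "teaching technique": 4.59,
    "personal trait": 4.90,
    "technique x trait interaction": 6.58,
}


def significant_effects(observed, critical):
    """Return, per effect, whether the observed F exceeds the critical value."""
    return {effect: f > critical for effect, f in observed.items()}


decisions = significant_effects(f_observed, F_CRITICAL)
for effect, is_significant in decisions.items():
    verdict = "significant" if is_significant else "not significant"
    print(f"{effect}: F = {f_observed[effect]:.2f} -> {verdict}")
```

    All three statistics exceed 3.97, which agrees with the abstract's reading that both main effects and the interaction are significant at α = 0.05.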

  14. Unsupervised Anomaly Detection for Liquid-Fueled Rocket Prop...

    Data.gov (United States)

    National Aeronautics and Space Administration — Title: Unsupervised Anomaly Detection for Liquid-Fueled Rocket Propulsion Health Monitoring. Abstract: This article describes the results of applying four...

  15. Unsupervised Power Profiling for Mobile Devices

    DEFF Research Database (Denmark)

    Kjærgaard, Mikkel Baun; Blunck, Henrik

    Today, power consumption is a main limitation for mobile phones. To minimize the power consumption of popular and traditionally power-hungry location-based services requires knowledge of how individual phone features consume power, so that those features can be utilized intelligently for optimal...... power savings while at the same time maintaining good quality of service. This paper proposes an unsupervised API-level method for power profiling mobile phones based on genetic algorithms. The method enables accurate profiling of the power consumption of devices and thereby provides the information...

  16. Unsupervised Power Profiling for Mobile Devices

    DEFF Research Database (Denmark)

    Kjærgaard, Mikkel Baun; Blunck, Henrik

    2011-01-01

    Today, power consumption is a main limitation for mobile phones. Minimizing the power consumption of popular and traditionally power-hungry location-based services requires knowledge of how individual phone features consume power, so that those features can be utilized intelligently for optimal power savings while at the same time maintaining good quality of service. This paper proposes an unsupervised API-level method for power profiling mobile phones based on genetic algorithms. The method enables accurate profiling of the power consumption of devices and thereby provides the information...

  17. Unsupervised information extraction by text segmentation

    CERN Document Server

    Cortez, Eli

    2013-01-01

    A new unsupervised approach to the problem of Information Extraction by Text Segmentation (IETS) is proposed, implemented and evaluated herein. The authors' approach relies on information available on pre-existing data to learn how to associate segments in the input string with attributes of a given domain relying on a very effective set of content-based features. The effectiveness of the content-based features is also exploited to directly learn from test data structure-based features, with no previous human-driven training, a feature unique to the presented approach. Based on the approach, a

  18. The Effect of Buzz Group Technique and Clustering Technique in Teaching Writing at the First Class of SMA HKBP I Tarutung

    Science.gov (United States)

    Pangaribuan, Tagor; Manik, Sondang

    2018-01-01

    This research was held at SMA HKBP 1 Tarutung, North Sumatra, on the test results of XI[superscript 2] and XI[superscript 2] students after they received treatment in teaching recount-text writing using the buzz group and clustering techniques. The average score (X) was 67.7 and the total score buzz group the average score (X) was 77.2 and in…

  19. Unsupervised learning of facial emotion decoding skills

    Directory of Open Access Journals (Sweden)

    Jan Oliver Huelle

    2014-02-01

    Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practise without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear and sadness) was shown in each clip. Although no external information about the correctness of the participant’s response or the sender’s true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practise effects often observed in cognitive tasks.

  20. Unsupervised learning of facial emotion decoding skills.

    Science.gov (United States)

    Huelle, Jan O; Sack, Benjamin; Broer, Katja; Komlewa, Irina; Anders, Silke

    2014-01-01

    Research on the mechanisms underlying human facial emotion recognition has long focussed on genetically determined neural algorithms and often neglected the question of how these algorithms might be tuned by social learning. Here we show that facial emotion decoding skills can be significantly and sustainably improved by practice without an external teaching signal. Participants saw video clips of dynamic facial expressions of five different women and were asked to decide which of four possible emotions (anger, disgust, fear, and sadness) was shown in each clip. Although no external information about the correctness of the participant's response or the sender's true affective state was provided, participants showed a significant increase of facial emotion recognition accuracy both within and across two training sessions two days to several weeks apart. We discuss several similarities and differences between the unsupervised improvement of facial decoding skills observed in the current study, unsupervised perceptual learning of simple stimuli described in previous studies and practice effects often observed in cognitive tasks.

  1. Unsupervised action classification using space-time link analysis

    DEFF Research Database (Denmark)

    Liu, Haowei; Feris, Rogerio; Krüger, Volker

    2010-01-01

    In this paper we address the problem of unsupervised discovery of action classes in video data. Different from all existing methods thus far proposed for this task, we present a space-time link analysis approach which matches the performance of traditional unsupervised action categorization metho...

  2. Genetic diversity of wheat grain quality and determination the best clustering technique and data type for diversity assessment

    Directory of Open Access Journals (Sweden)

    Khodadadi Mostafa

    2014-01-01

    Wheat is an important staple in human nutrition, and improvement of its grain quality characters will have a high impact on population health. The objectives of this study were to assess the variation of some grain quality characteristics in Iranian wheat genotypes and to identify the best type of data and clustering method for grouping genotypes. In this study, 30 spring wheat genotypes were cultivated in a randomized complete block design with three replications in 2009 and 2010. There were highly significant differences among genotypes for all traits except sulfate, K, Br and Cl content, while the differences between the two years' means were not significant for any trait. Meanwhile, there was a significant interaction between year and genotype for all traits except sulfate and F content. Mean values for crude protein, Zn, Fe and Ca were highest in the Mahdavi, Falat, Star and Sistan genotypes. The Ca and Br content showed the highest and the lowest broad-sense heritability, respectively. The study indicated that the Root Mean Square Standard Deviation criterion is more efficient than R Squared, and R Squared more efficient than Semi Partial R Squared, for determining the best clustering technique. The Ward method and canonical scores were identified as the best clustering method and data type for grouping genotypes, respectively. Genotypes were grouped into six completely separate clusters, and the Roshan, Niknejad and Star genotypes, from the fourth, fifth and sixth clusters, had high overall grain quality.

  3. Unsupervised neural networks for solving Troesch's problem

    International Nuclear Information System (INIS)

    Raja Muhammad Asif Zahoor

    2014-01-01

    In this study, stochastic computational intelligence techniques are presented for the solution of Troesch's boundary value problem. The proposed stochastic solvers use the competency of a feed-forward artificial neural network for mathematical modeling of the problem in an unsupervised manner, whereas the learning of unknown parameters is made with local and global optimization methods as well as their combinations. Genetic algorithm (GA) and pattern search (PS) techniques are used as the global search methods and the interior point method (IPM) is used for an efficient local search. Combinations of techniques, such as GA hybridized with IPM (GA-IPM) and PS hybridized with IPM (PS-IPM), are also applied to solve different forms of the equation. The results obtained from GA, PS, IPM, PS-IPM and GA-IPM are compared with the standard solutions, including the well-known analytic techniques of the Adomian decomposition method, the variational iteration method and the homotopy perturbation method. The reliability and effectiveness of the proposed schemes, in terms of accuracy and convergence, are evaluated from the results of statistical analysis based on a sufficiently large number of independent runs. (interdisciplinary physics and related areas of science and technology)

  4. A new hybrid imperialist competitive algorithm on data clustering

    Indian Academy of Sciences (India)

    Modified imperialist competitive algorithm; simulated annealing; ... Clustering is one of the unsupervised learning branches where a set of patterns, usually vectors ..... machine classification is based on design, operation, and/or purpose.

  5. Canonical PSO Based K-Means Clustering Approach for Real Datasets.

    Science.gov (United States)

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    "Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.

  6. Unsupervised learning of a steerable basis for invariant image representations

    Science.gov (United States)

    Bethge, Matthias; Gerwinn, Sebastian; Macke, Jakob H.

    2007-02-01

    There are two aspects to unsupervised learning of invariant representations of images: First, we can reduce the dimensionality of the representation by finding an optimal trade-off between temporal stability and informativeness. We show that the answer to this optimization problem is generally not unique so that there is still considerable freedom in choosing a suitable basis. Which of the many optimal representations should be selected? Here, we focus on this second aspect, and seek to find representations that are invariant under geometrical transformations occurring in sequences of natural images. We utilize ideas of 'steerability' and Lie groups, which have been developed in the context of filter design. In particular, we show how an anti-symmetric version of canonical correlation analysis can be used to learn a full-rank image basis which is steerable with respect to rotations. We provide a geometric interpretation of this algorithm by showing that it finds the two-dimensional eigensubspaces of the average bivector. For data which exhibits a variety of transformations, we develop a bivector clustering algorithm, which we use to learn a basis of generalized quadrature pairs (i.e. 'complex cells') from sequences of natural images.

  7. Vibration impact acoustic emission technique for identification and analysis of defects in carbon steel tubes: Part B Cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Halim, Zakiah Abd [Universiti Teknikal Malaysia Melaka (Malaysia); Jamaludin, Nordin; Junaidi, Syarif [Faculty of Engineering and Built, Universiti Kebangsaan Malaysia, Bangi (Malaysia); Yahya, Syed Yusainee Syed [Universiti Teknologi MARA, Shah Alam (Malaysia)

    2015-04-15

    Current steel tube inspection techniques are invasive, and the interpretation and evaluation of inspection results are done manually by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems known as the vibration impact acoustic emission (VIAE) technique. AE signals have been introduced into a series of ASTM A179 seamless steel tubes using the impact hammer. Specifically, a good steel tube as the reference tube and four steel tubes with through-hole artificial defects at different locations were used in this study. The AE propagation was captured using a high frequency sensor of the AE system. The present study explores the cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results from the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method in classifying the AE signals into natural groups. This approach has successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.

  8. Vibration impact acoustic emission technique for identification and analysis of defects in carbon steel tubes: Part B Cluster analysis

    International Nuclear Information System (INIS)

    Halim, Zakiah Abd; Jamaludin, Nordin; Junaidi, Syarif; Yahya, Syed Yusainee Syed

    2015-01-01

    Current steel tube inspection techniques are invasive, and the interpretation and evaluation of inspection results are done manually by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems known as the vibration impact acoustic emission (VIAE) technique. AE signals have been introduced into a series of ASTM A179 seamless steel tubes using the impact hammer. Specifically, a good steel tube as the reference tube and four steel tubes with through-hole artificial defects at different locations were used in this study. The AE propagation was captured using a high frequency sensor of the AE system. The present study explores the cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results from the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method in classifying the AE signals into natural groups. This approach has successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.
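
    As a rough illustration of how AR coefficients can serve as features for cluster analysis, the sketch below estimates the coefficients of an AR model from a signal with the Yule-Walker equations (solved by the Levinson-Durbin recursion) and compares two signals by the Euclidean distance between their coefficient vectors. This is an assumption about one common way to do it, not the authors' procedure; the function names are hypothetical, and a hierarchical clustering routine would still be needed on top of the pairwise distances to produce a dendrogram.

```python
import math


def autocovariance(signal, lag):
    """Unnormalised autocovariance of the signal at the given lag."""
    mean = sum(signal) / len(signal)
    return sum(
        (signal[t] - mean) * (signal[t + lag] - mean)
        for t in range(len(signal) - lag)
    )


def ar_coefficients(signal, order):
    """Yule-Walker estimate of a_1..a_order in x_t ~ sum_i a_i * x_(t-i).

    Uses the Levinson-Durbin recursion. Assumes the signal is not constant,
    otherwise the prediction error would be zero and the division would fail.
    """
    r = [autocovariance(signal, k) for k in range(order + 1)]
    coeffs = []
    error = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(coeffs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        # Update the lower-order coefficients, then append the new one.
        coeffs = [
            coeffs[j] - reflection * coeffs[k - 2 - j] for j in range(k - 1)
        ] + [reflection]
        error *= 1.0 - reflection ** 2
    return coeffs


def feature_distance(a, b):
    """Euclidean distance between two AR coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

    Signals whose coefficient vectors lie close together would then be merged early by an agglomerative clustering step, which is what a dendrogram of AE signals would display.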

  9. POLYMER COMPOSITE FILMS WITH SIZE-SELECTED METAL NANOPARTICLES FABRICATED BY CLUSTER BEAM TECHNIQUE

    DEFF Research Database (Denmark)

    Ceynowa, F. A.; Chirumamilla, Manohar; Popok, Vladimir

    2017-01-01

    Formation of polymer films with size-selected silver and copper nanoparticles (NPs) is studied. Polymers are prepared by spin coating while NPs are fabricated and deposited utilizing a magnetron sputtering cluster apparatus. The particle embedding into the films is provided by thermal annealing after the deposition. The degree of immersion can be controlled by the annealing temperature and time. Together with control of cluster coverage, the described approach represents an efficient method for the synthesis of thin polymer composite layers with either partially or fully embedded metal NPs. Combining electron beam lithography, cluster beam deposition and thermal annealing allows ordered arrays of metal NPs to be formed on polymer films. Plasticity and flexibility of the polymer host and specific properties added by coinage metal NPs open a way for different applications of such composite materials...

  10. Unsupervised Tensor Mining for Big Data Practitioners.

    Science.gov (United States)

    Papalexakis, Evangelos E; Faloutsos, Christos

    2016-09-01

    Multiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated to each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data. In this article, our goal is to popularize tensors and tensor decompositions to Big Data practitioners by demonstrating their effectiveness, outlining challenges that pertain to their application in Big Data scenarios, and presenting our recent work that tackles those challenges. We view this work as a step toward a fully automated, unsupervised tensor mining tool that can be easily and broadly adopted by practitioners in academia and industry.

  11. Dimensionality reduction with unsupervised nearest neighbors

    CERN Document Server

    Kramer, Oliver

    2013-01-01

    This book is devoted to a novel approach for dimensionality reduction based on the famous nearest neighbor method that is a powerful classification and regression approach. It starts with an introduction to machine learning concepts and a real-world application from the energy domain. Then, unsupervised nearest neighbors (UNN) is introduced as an efficient iterative method for dimensionality reduction. Various UNN models are developed step by step, ranging from a simple iterative strategy for discrete latent spaces to a stochastic kernel-based algorithm for learning submanifolds with independent parameterizations. Extensions that allow the embedding of incomplete and noisy patterns are introduced. Various optimization approaches are compared, from evolutionary to swarm-based heuristics. Experimental comparisons to related methodologies taking into account artificial test data sets and also real-world data demonstrate the behavior of UNN in practical scenarios. The book contains numerous color figures to illustr...

  12. Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.

    Science.gov (United States)

    Yang, Bin; Peng, Yu; Leung, Henry Chi-Ming; Yiu, Siu-Ming; Chen, Jing-Chi; Chin, Francis Yuk-Lun

    2010-04-16

    With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are being gradually replaced by metagenomics, which is also known as environmental genomics. The first step of metagenomics, which is still a major bottleneck, is the taxonomic characterization of DNA fragments (reads) resulting from sequencing a sample of mixed species. This step is usually referred to as "binning". Existing binning methods are based on supervised or semi-supervised approaches which rely heavily on reference genomes of known microorganisms and phylogenetic marker genes. Due to the limited availability of reference genomes and the bias and instability of marker genes, existing binning methods may not be applicable in many cases. In this paper, we present an unsupervised binning method based on the distribution of a carefully selected set of l-mers (substrings of length l in DNA fragments). From our experiments, we show that our method can accurately bin DNA fragments with various lengths and relative species abundance ratios without using any reference and training datasets. Another feature of our method is its error robustness. The binning accuracy decreases by less than 1% when the sequencing error rate increases from 0% to 5%. Note that the typical sequencing error rate of existing commercial sequencing platforms is less than 2%. We provide a new and effective tool to solve the metagenome binning problem without using any reference datasets or marker information of any known reference genomes (species). The source code of our software tool, the reference genomes of the species for generating the test datasets and the corresponding test datasets are available at http://i.cs.hku.hk/~alse/MetaCluster/.
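
    The notion of an l-mer distribution can be made concrete with a short sketch. It computes the relative frequency of every possible l-mer over the alphabet A, C, G, T in a single DNA fragment, which is the kind of feature vector an unsupervised binning method could cluster. The function name is hypothetical, and the sketch deliberately omits the paper's selection of an error-robust subset of l-mers, which the abstract does not specify.

```python
from collections import Counter
from itertools import product


def lmer_frequencies(fragment, l):
    """Relative frequency of each l-mer over ACGT in one DNA fragment.

    The vector has 4**l entries, ordered lexicographically (AA, AC, AG, ...).
    Windows containing other symbols (for example N) are ignored.
    """
    fragment = fragment.upper()
    counts = Counter(fragment[i:i + l] for i in range(len(fragment) - l + 1))
    lmers = ["".join(p) for p in product("ACGT", repeat=l)]
    total = sum(counts[m] for m in lmers)
    if total == 0:
        return [0.0] * len(lmers)
    return [counts[m] / total for m in lmers]
```

    Fragments from the same genome tend to have similar frequency vectors, so any distance-based clustering algorithm can be applied to these vectors without reference genomes.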

  13. Using Clustering Techniques To Detect Usage Patterns in a Web-based Information System.

    Science.gov (United States)

    Chen, Hui-Min; Cooper, Michael D.

    2001-01-01

    This study developed an analytical approach to detecting groups with homogenous usage patterns in a Web-based information system. Principal component analysis was used for data reduction, cluster analysis for categorizing usage into groups. The methodology was demonstrated and tested using two independent samples of user sessions from the…

  14. Comparison of cluster and principal component analysis techniques to derive dietary patterns in Irish adults.

    Science.gov (United States)

    Hearty, Aine P; Gibney, Michael J

    2009-02-01

    The aims of the present study were to examine and compare dietary patterns in adults using cluster and factor analyses and to examine the format of the dietary variables on the pattern solutions (i.e. expressed as grams/day (g/d) of each food group or as the percentage contribution to total energy intake). Food intake data were derived from the North/South Ireland Food Consumption Survey 1997-9, which was a randomised cross-sectional study of 7 d recorded food and nutrient intakes of a representative sample of 1379 Irish adults aged 18-64 years. Cluster analysis was performed using the k-means algorithm and principal component analysis (PCA) was used to extract dietary factors. Food data were reduced to thirty-three food groups. For cluster analysis, the most suitable format of the food-group variable was found to be the percentage contribution to energy intake, which produced six clusters: 'Traditional Irish'; 'Continental'; 'Unhealthy foods'; 'Light-meal foods & low-fat milk'; 'Healthy foods'; 'Wholemeal bread & desserts'. For PCA, food groups in the format of g/d were found to be the most suitable format, and this revealed four dietary patterns: 'Unhealthy foods & high alcohol'; 'Traditional Irish'; 'Healthy foods'; 'Sweet convenience foods & low alcohol'. In summary, cluster and PCA identified similar dietary patterns when presented with the same dataset. However, the two dietary pattern methods required a different format of the food-group variable, and the most appropriate format of the input variable should be considered in future studies.

  15. Fuzzy clustering-based segmented attenuation correction in whole-body PET

    CERN Document Server

    Zaidi, H; Boudraa, A; Slosman, DO

    2001-01-01

    Segmented-based attenuation correction is now a widely accepted technique to reduce noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography. This reduces the noise on the correction maps while still correcting for differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation such as air, the lungs and soft tissue, preceded by a median filtering procedure. The reconstructed transmission image voxels are therefore segmented into populations of uniform attenuation based on the human anatomy. The clustering procedure starts with an over-specified number of clusters followed by a merging process to group clusters with similar properties and remove some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...
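
    The core of the Fuzzy C-Means algorithm mentioned above can be sketched for one-dimensional intensity values. Each value receives a membership in every cluster, and cluster centres are updated as membership-weighted means. The function names, the fuzziness exponent m = 2 and the toy intensities are assumptions for illustration; the median filtering, the over-specified cluster count and the anatomy-based merging described in the abstract are not implemented here.

```python
def fcm_memberships(values, centers, m=2.0, eps=1e-9):
    """Membership of each value in each cluster; every row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        # eps avoids division by zero when a value coincides with a centre.
        distances = [abs(x - c) + eps for c in centers]
        rows.append([
            1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances
        ])
    return rows


def fuzzy_c_means(values, initial_centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of steps."""
    centers = list(initial_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(values, centers, m)
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, fcm_memberships(values, centers, m)
```

    A hard segmentation can be obtained afterwards by assigning each voxel to the cluster in which its membership is largest.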

  16. WORD SENSE DISAMBIGUATION FOR TAMIL LANGUAGE USING PART-OF-SPEECH AND CLUSTERING TECHNIQUE

    Directory of Open Access Journals (Sweden)

    P. ISWARYA

    2017-09-01

    Word sense disambiguation is an important task in Natural Language Processing (NLP), and this paper concentrates on the problem of target word selection in machine translation. The proposed method, called enhanced Word Sense Disambiguation with Part-of-Speech and Clustering based Sense-collocation (WSDPCS), consists of two steps, namely (i) a Part-of-Speech (POS) tagger for disambiguating word senses and (ii) enhanced clustering and sense-collocation dictionary based disambiguation. In the first step, ambiguous Tamil words are disambiguated using Tamil and English POS taggers. If a word has the same type of POS category labels, it is passed to the next step. In the second step, ambiguity is resolved using the sense-collocation dictionary. The experimental analysis shows that the proposed WSDPCS method achieves a 1.86% improvement in accuracy over an existing method.

  17. LOD-based clustering techniques for efficient large-scale terrain storage and visualization

    Science.gov (United States)

    Bao, Xiaohong; Pajarola, Renato

    2003-05-01

    Large multi-resolution terrain data sets are usually stored out-of-core. To visualize terrain data at interactive frame rates, the data needs to be organized on disk, loaded into main memory part by part, then rendered efficiently. Many main-memory algorithms have been proposed for efficient vertex selection and mesh construction. Organization of terrain data on disk is quite difficult because the error, the triangulation dependency and the spatial location of each vertex all need to be considered. Previous terrain clustering algorithms did not consider the per-vertex approximation error of individual terrain data sets. Therefore, the vertex sequences on disk are exactly the same for any terrain. In this paper, we propose a novel clustering algorithm which introduces the level-of-detail (LOD) information to terrain data organization to map multi-resolution terrain data to external memory. In our approach the LOD parameters of the terrain elevation points are reflected during clustering. The experiments show that dynamic loading and paging of terrain data at varying LOD is very efficient and minimizes page faults. Additionally, the preprocessing of this algorithm is very fast and works from out-of-core.

  18. Hybrid image classification technique for land-cover mapping in the Arctic tundra, North Slope, Alaska

    Science.gov (United States)

    Chaudhuri, Debasish

    Remotely sensed image classification techniques are very useful for understanding vegetation patterns and species combinations in the vast and mostly inaccessible arctic region. Previous research on mapping land cover and vegetation in the remote areas of northern Alaska has achieved considerably lower accuracies than in other biomes. The unique arctic tundra environment, with its short growing season, cloud cover, low sun angles, and snow and ice cover, hinders the effectiveness of remote sensing studies. The majority of image classification research done in this area, as reported in the literature, used a traditional unsupervised clustering technique with Landsat MSS data. Previous researchers also emphasized that SPOT/HRV-XS data lacked the spectral resolution to identify the small arctic tundra vegetation parcels. Thus, there is a motivation and research need to apply a new classification technique to develop an updated, detailed and accurate vegetation map at a higher spatial resolution, i.e. from SPOT-5 data. Traditional classification techniques in remotely sensed image interpretation are based on spectral reflectance values, with an assumption that the training data are normally distributed. Hence it is difficult to add ancillary data in classification procedures to improve accuracy. The purpose of this dissertation was to develop a hybrid image classification approach that effectively integrates ancillary information into the classification process and combines ISODATA clustering, a rule-based classifier and the Multilayer Perceptron (MLP) classifier, which uses an artificial neural network (ANN). The main goal was to find the best possible combination or sequence of classifiers for classifying tundra-type vegetation that yields higher accuracy than the existing classified vegetation map from SPOT data. Unsupervised ISODATA clustering and rule-based classification techniques were combined to produce an intermediate classified map which was

  19. Proximity gettering technology for advanced CMOS image sensors using carbon cluster ion-implantation technique. A review

    Energy Technology Data Exchange (ETDEWEB)

    Kurita, Kazunari; Kadono, Takeshi; Okuyama, Ryousuke; Shigemastu, Satoshi; Hirose, Ryo; Onaka-Masada, Ayumi; Koga, Yoshihiro; Okuda, Hidehiko [SUMCO Corporation, Saga (Japan)

    2017-07-15

    A new technique is described for manufacturing advanced silicon wafers with the highest capability yet reported for gettering transition metallic, oxygen, and hydrogen impurities in CMOS image sensor fabrication processes. Carbon and hydrogen elements are localized in the projection range of the silicon wafer by implantation of ion clusters from a hydrocarbon molecular gas source. Furthermore, these wafers can getter oxygen impurities out-diffused to device active regions from a Czochralski grown silicon wafer substrate to the carbon cluster ion projection range during heat treatment. Therefore, they can reduce the formation of transition metals and oxygen-related defects in the device active regions and improve electrical performance characteristics, such as the dark current, white spot defects, pn-junction leakage current, and image lag characteristics. The new technique enables the formation of high-gettering-capability sinks for transition metals, oxygen, and hydrogen impurities under device active regions of CMOS image sensors. The wafers formed by this technique have the potential to significantly improve electrical devices performance characteristics in advanced CMOS image sensors. (copyright 2017 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim)

  20. Concept formation knowledge and experience in unsupervised learning

    CERN Document Server

    Fisher, Douglas H; Langley, Pat

    1991-01-01

    Concept Formation: Knowledge and Experience in Unsupervised Learning presents the interdisciplinary interaction between machine learning and cognitive psychology on unsupervised incremental methods. This book focuses on measures of similarity, strategies for robust incremental learning, and the psychological consistency of various approaches. Organized into three parts encompassing 15 chapters, this book begins with an overview of inductive concept learning in machine learning and psychology, with emphasis on issues that distinguish concept formation from more prevalent supervised methods and f

  1. A comparative study of dimensionality reduction techniques to enhance trace clustering performances

    NARCIS (Netherlands)

    Song, M.S.; Yang, H.; Siadat, S.H.; Pechenizkiy, M.

    2013-01-01

    Process mining techniques have been used to analyze event logs from information systems in order to derive useful patterns. However, in the big data era, real-life event logs are huge, unstructured, and complex so that traditional process mining techniques have difficulties in the analysis of big

  2. Unsupervised grammar induction of clinical report sublanguage.

    Science.gov (United States)

    Kate, Rohit J

    2012-10-05

    Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage for a scientific or a technical domain. Different genres of clinical reports use different sublanguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult as it would require expensive annotation effort to adapt to every type of clinical text. In this paper, we present an unsupervised method which automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains 60.5% F-measure for parse-bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.
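
    For readers unfamiliar with the parse-bracketing F-measure reported above, the following sketch shows the usual definition: each parse is reduced to a set of (start, end) word spans, and precision and recall are computed over the spans shared by the predicted and gold parses. The function name and the span representation are assumptions, and the paper's exact evaluation conventions (for example whether trivial brackets are counted) are not given in the abstract.

```python
def bracketing_f_measure(predicted, gold):
    """F-measure between two collections of (start, end) bracket spans."""
    predicted, gold = set(predicted), set(gold)
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    # Harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall)
```

    With three predicted brackets of which two also appear among three gold brackets, precision and recall are both 2/3 and so is the F-measure.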

  3. Unsupervised Retinal Vessel Segmentation Using Combined Filters.

    Directory of Open Access Journals (Sweden)

    Wendeson S Oliveira

    Image segmentation of retinal blood vessels is a process that can help to predict and diagnose cardiovascular-related diseases, such as hypertension and diabetes, which are known to affect the retinal blood vessels' appearance. This work proposes an unsupervised method for the segmentation of retinal vessel images using a combined matched filter, Frangi's filter and Gabor Wavelet filter to enhance the images. The combination of these three filters in order to improve the segmentation is the main motivation of this work. We investigate two approaches to perform the filter combination: weighted mean and median ranking. Segmentation methods are tested after the vessel enhancement. Enhanced images with median ranking are segmented using a simple threshold criterion. Two segmentation procedures are applied when considering enhanced retinal images using the weighted mean approach. The first method is based on deformable models and the second uses fuzzy C-means for the image segmentation. The procedure is evaluated using two public image databases, Drive and Stare. The experimental results demonstrate that the proposed methods perform well for vessel segmentation in comparison with state-of-the-art methods.
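
    The two combination strategies named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each filter response is taken to be a flat list of per-pixel values, "median ranking" is interpreted as taking the per-pixel median of the ranks each filter assigns, and all function names are hypothetical.

```python
from statistics import median


def normalize(values):
    """Rescale a filter response to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_mean_combine(responses, weights):
    """Per-pixel weighted mean of the normalized filter responses."""
    normed = [normalize(r) for r in responses]
    total = sum(weights)
    return [sum(w * r[i] for w, r in zip(weights, normed)) / total
            for i in range(len(normed[0]))]


def ranks(values):
    """Rank of each pixel within one response (0 = weakest response)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def median_rank_combine(responses):
    """Per-pixel median of the ranks assigned by each filter (assumed reading)."""
    ranked = [ranks(r) for r in responses]
    return [median(col) for col in zip(*ranked)]


def threshold(values, t):
    """Simple threshold criterion: 1 marks a vessel pixel, 0 background."""
    return [1 if v > t else 0 for v in values]


# Three hypothetical filter responses over three pixels.
responses = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4], [0.0, 1.0, 0.7]]
print(weighted_mean_combine(responses, [1, 1, 1]))
print(threshold(median_rank_combine(responses), 1))
```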

  4. Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction.

    Science.gov (United States)

    Nie, Feiping; Xu, Dong; Tsang, Ivor Wai-Hung; Zhang, Changshui

    2010-07-01

    We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X) and the regression residue F(0) = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F(0). Our Semi-Supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), making it better cope with the data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate the significant improvement over existing dimension reduction algorithms.

  5. Evaluating unsupervised methods to size and classify suspended particles using digital in-line holography

    Science.gov (United States)

    Davies, Emlyn J.; Buscombe, Daniel D.; Graham, George W.; Nimmo-Smith, W. Alex M.

    2015-01-01

    Substantial information can be gained from digital in-line holography of marine particles, eliminating depth-of-field and focusing errors associated with standard lens-based imaging methods. However, for the technique to reach its full potential in oceanographic research, fully unsupervised (automated) methods are required for focusing, segmentation, sizing and classification of particles. These computational challenges are the subject of this paper, in which we draw upon data collected using a variety of holographic systems developed at Plymouth University, UK, from a significant range of particle types, sizes and shapes. A new method for noise reduction in reconstructed planes is found to be successful in aiding particle segmentation and sizing. The performance of an automated routine for deriving particle characteristics (and subsequent size distributions) is evaluated against equivalent size metrics obtained by a trained operative measuring grain axes on screen. The unsupervised method is found to be reliable, despite some errors resulting from over-segmentation of particles. A simple unsupervised particle classification system is developed, and is capable of successfully differentiating sand grains, bubbles and diatoms from within the surf-zone. Avoiding miscounting bubbles and biological particles as sand grains enables more accurate estimates of sand concentrations, and is especially important in deployments of particle monitoring instrumentation in aerated water. Perhaps the greatest potential for further development in the computational aspects of particle holography is in the area of unsupervised particle classification. The simple method proposed here provides a foundation upon which further development could lead to reliable identification of more complex particle populations, such as those containing phytoplankton, zooplankton, flocculated cohesive sediments and oil droplets.

  6. Unsupervised learning of binary vectors: A Gaussian scenario

    International Nuclear Information System (INIS)

    Copelli, Mauro; Van den Broeck, Christian

    2000-01-01

    We study a model of unsupervised learning where the real-valued data vectors are isotropically distributed, except for a single symmetry-breaking binary direction B ∈ {-1,+1}^N, onto which the projections have a Gaussian distribution. We show that a candidate vector J undergoing Gibbs learning in this discrete space approaches the perfect match J=B exponentially. In addition to the second-order "retarded learning" phase transition for unbiased distributions, we show that first-order transitions can also occur. Extending the known result that the center of mass of the Gibbs ensemble has Bayes-optimal performance, we show that taking the sign of the components of this vector (clipping) leads to the vector with optimal performance in the binary space. These upper bounds are shown generally not to be saturated with the technique of transforming the components of a special continuous vector, except in asymptotic limits and in a special linear case. Simulations are presented which are in excellent agreement with the theoretical results. (c) 2000 The American Physical Society

  7. Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning.

    Science.gov (United States)

    Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing

    2018-01-11

    By introducing the methods of machine learning into the density functional theory, we made a detour for the construction of the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of the universal functional, the vital core of density functional theory, the most probable cluster numbers and the corresponding cluster boundaries in a system under study can be simultaneously and automatically determined, and the plausibility rests on the Hohenberg-Kohn theorems. For method validation and pragmatic applications, interdisciplinary problems from physical to biological systems were enumerated. The amalgamation of uncharged atomic clusters validated the unsupervised searching process of the cluster numbers, and the corresponding cluster boundaries were exhibited likewise. Highly accurate clustering results on Fisher's iris dataset showed the feasibility and the flexibility of the proposed scheme. Brain tumor detections from low-dimensional magnetic resonance imaging datasets and segmentations of high-dimensional neural network imageries in the Brainbow system were also used to inspect the practicality of the method. The experimental results exhibit the successful connection between the physical theory and the machine learning methods and will benefit clinical diagnoses.

  8. CLUSTERING TECHNIQUES IN FINANCIAL DATA ANALYSIS APPLICATIONS ON THE U.S. FINANCIAL MARKET

    Directory of Open Access Journals (Sweden)

    ALEXANDRU BOGEANU

    2013-08-01

    In economic and financial analysis, the need to classify companies in terms of categories, the delimitation of which has to be clear and natural, occurs frequently. The differentiation of companies by categories is performed according to the economic and financial indicators which are associated to the above. The clustering algorithms are a very powerful tool in identifying the classes of companies based on the information provided by the indicators associated to them. The last decade imposed to the economic and financial practice the use of economic value added as an indicator of synthesis of the entire activity of a company. Our study uses a sample of 106 companies in four different fields of activity; each company is identified by: Economic Value Added, Net Income, Current Sales, Equity and Stock Price. Using the ascending hierarchical classification methods and the partitioning classification methods, as well as Ward’s method and k-means algorithm, we identified on the considered sample an information structure consisting of 5 rating classes.
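
    As a rough illustration of the partitioning step described in this abstract, the sketch below standardizes a small table of company indicators and runs plain k-means (Lloyd's algorithm). It is a sketch under assumptions: the indicator values are invented, the z-score scaling is an assumed preprocessing choice, and the helper names are hypothetical. Ward's hierarchical method is not shown.

```python
import random


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a constant column falls back to 1.0.
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm with k initial centers sampled from the data."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Hypothetical rows: (economic value added, net income, sales, equity, stock price).
companies = [
    [1.2, 0.8, 10.0, 5.0, 20.0],
    [1.1, 0.9, 11.0, 5.5, 22.0],
    [9.5, 7.0, 95.0, 60.0, 140.0],
    [9.9, 7.4, 99.0, 62.0, 150.0],
]
labels, _ = kmeans(standardize(companies), k=2)
print(labels)
```

    The study reports 5 rating classes on 106 companies; the toy call uses k=2 only because the invented table has four rows.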

  9. Application of multi-element clustering techniques of five Egyptian industrial sugar products

    International Nuclear Information System (INIS)

    Awadallah, R.M.; Mohamed, A.E.

    1995-01-01

    The concentration of 18 elements in different cane sugar products, i.e., cane sugar plants, crude and syrup juices, molasses, and the end products of the consumer sugar, were analyzed and processed. The samples were collected from five cities, i.e., Kom Ombo, Edfu, Armant, Deshna and Naga Hammady in Upper Egypt where the main Egyptian sugar industry factories are located. INAA was applied for the determination of Al, Ca, Cl, Co, Cr, Fe, Mg, Mn, Na, and Sc, while Cu, Li, P, Sn, V and Zn were determined by ICP-AES and Pb and As were determined by AAS. These three analytical methods were applied to optimize the sensitivity and the accuracy of the measurements in order to provide a sound basis for the obtention of reliable clustering results. (author). 5 refs., 8 figs., 3 tabs

  10. Ultra-Wideband Geo-Regioning: A Novel Clustering and Localization Technique

    Directory of Open Access Journals (Sweden)

    Armin Wittneben

    2007-12-01

    Ultra-wideband (UWB) technology enables a high temporal resolution of the propagation channel. Consequently, a channel impulse response between transmitter and receiver can be interpreted as signature for their relative positions. If the position of the receiver is known, the channel impulse response indicates the position of the transmitter and vice versa. This work introduces UWB geo-regioning as a clustering and localization method based on channel impulse response fingerprinting, develops a theoretical framework for performance analysis, and evaluates this approach by means of performance results based on measured channel impulse responses. Complexity issues are discussed and performance dependencies on signal-to-noise ratio, a priori knowledge, observation window, and system bandwidth are investigated.

  11. Unsupervised grammar induction of clinical report sublanguage

    Directory of Open Access Journals (Sweden)

    Kate Rohit J

    2012-10-01

    Background: Clinical reports are written using a subset of natural language while employing many domain-specific terms; such a language is also known as a sublanguage for a scientific or a technical domain. Different genres of clinical reports use different sublanguages, and in addition, different medical facilities use different medical language conventions. This makes supervised training of a parser for clinical sentences very difficult as it would require expensive annotation effort to adapt to every type of clinical text. Methods: In this paper, we present an unsupervised method which automatically induces a grammar and a parser for the sublanguage of a given genre of clinical reports from a corpus with no annotations. In order to capture sentence structures specific to clinical domains, the grammar is induced in terms of semantic classes of clinical terms in addition to part-of-speech tags. Our method induces grammar by minimizing the combined encoding cost of the grammar and the corresponding sentence derivations. The probabilities for the productions of the induced grammar are then learned from the unannotated corpus using an instance of the expectation-maximization algorithm. Results: Our experiments show that the induced grammar is able to parse novel sentences. Using a dataset of discharge summary sentences with no annotations, our method obtains 60.5% F-measure for parse-bracketing on sentences of maximum length 10. By varying a parameter, the method can induce a range of grammars, from very specific to very general, and obtains the best performance in between the two extremes.

  12. Reduction of the uncertainty due to fissile clusters in radioactive waste characterization with the Differential Die-away Technique

    Science.gov (United States)

    Antoni, R.; Passard, C.; Perot, B.; Guillaumin, F.; Mazy, C.; Batifol, M.; Grassi, G.

    2018-07-01

    AREVA NC is preparing to process, characterize and compact old used fuel metallic waste stored at La Hague reprocessing plant in view of their future storage ("Haute Activité Oxyde" HAO project). For a large part of these historical wastes, the packaging is planned in CSD-C canisters ("Colis Standard de Déchets Compactés") in the ACC hulls and nozzles compaction facility ("Atelier de Compactage des Coques et embouts"). This paper presents a new method to take into account the possible presence of fissile material clusters, which may have a significant impact in the active neutron interrogation (Differential Die-away Technique) measurement of the CSD-C canisters, in the industrial neutron measurement station "P2-2". A matrix effect correction has already been investigated to predict the prompt fission neutron calibration coefficient (which provides the fissile mass) from an internal "drum flux monitor" signal provided during the active measurement by a boron-coated proportional counter located in the measurement cavity, and from a "drum transmission signal" recorded in passive mode by the detection blocks, in presence of an AmBe point source in the measurement cell. Up to now, the relationship between the calibration coefficient and these signals was obtained from a factorial design that did not consider the potential for occurrence of fissile material clusters. The interrogative neutron self-shielding in these clusters was treated separately and resulted in a penalty coefficient larger than 20% to prevent an underestimation of the fissile mass within the drum. In this work, we have shown that the incorporation of a new parameter in the factorial design, representing the fissile mass fraction in these clusters, provides an alternative to the penalty coefficient. This new approach finally does not degrade the uncertainty of the original prediction, which was calculated without taking into consideration the possible presence of clusters. Consequently, the

  13. Statistical mechanics of semi-supervised clustering in sparse graphs

    International Nuclear Information System (INIS)

    Ver Steeg, Greg; Galstyan, Aram; Allahverdyan, Armen E

    2011-01-01

    We theoretically study semi-supervised clustering in sparse graphs in the presence of pair-wise constraints on the cluster assignments of nodes. We focus on bi-cluster graphs and study the impact of semi-supervision for varying constraint density and overlap between the clusters. Recent results for unsupervised clustering in sparse graphs indicate that there is a critical ratio of within-cluster and between-cluster connectivities below which clusters cannot be recovered with better than random accuracy. The goal of this paper is to examine the impact of pair-wise constraints on the clustering accuracy. Our results suggest that the addition of constraints does not provide automatic improvement over the unsupervised case. When the density of the constraints is sufficiently small, their only impact is to shift the detection threshold while preserving the criticality. Conversely, if the density of (hard) constraints is above the percolation threshold, the criticality is suppressed and the detection threshold disappears.

  14. UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

    Energy Technology Data Exchange (ETDEWEB)

    Sanders, N. E.; Soderberg, A. M. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Betancourt, M., E-mail: nsanders@cfa.harvard.edu [Department of Statistics, University of Warwick, Coventry CV4 7AL (United Kingdom)

    2015-02-10

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST.

  15. UNSUPERVISED TRANSIENT LIGHT CURVE ANALYSIS VIA HIERARCHICAL BAYESIAN INFERENCE

    International Nuclear Information System (INIS)

    Sanders, N. E.; Soderberg, A. M.; Betancourt, M.

    2015-01-01

    Historically, light curve studies of supernovae (SNe) and other transient classes have focused on individual objects with copious and high signal-to-noise observations. In the nascent era of wide field transient searches, objects with detailed observations are decreasing as a fraction of the overall known SN population, and this strategy sacrifices the majority of the information contained in the data about the underlying population of transients. A population level modeling approach, simultaneously fitting all available observations of objects in a transient sub-class of interest, fully mines the data to infer the properties of the population and avoids certain systematic biases. We present a novel hierarchical Bayesian statistical model for population level modeling of transient light curves, and discuss its implementation using an efficient Hamiltonian Monte Carlo technique. As a test case, we apply this model to the Type IIP SN sample from the Pan-STARRS1 Medium Deep Survey, consisting of 18,837 photometric observations of 76 SNe, corresponding to a joint posterior distribution with 9176 parameters under our model. Our hierarchical model fits provide improved constraints on light curve parameters relevant to the physical properties of their progenitor stars relative to modeling individual light curves alone. Moreover, we directly evaluate the probability for occurrence rates of unseen light curve characteristics from the model hyperparameters, addressing observational biases in survey methodology. We view this modeling framework as an unsupervised machine learning technique with the ability to maximize scientific returns from data to be collected by future wide field transient searches like LSST

  16. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    Science.gov (United States)

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%) and clusters G1 and G2 combined contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2 the optimal number of axes was 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  17. Updated teaching techniques improve CPR performance measures: a cluster randomized, controlled trial.

    Science.gov (United States)

    Ettl, Florian; Testori, Christoph; Weiser, Christoph; Fleischhackl, Sabine; Mayer-Stickler, Monika; Herkner, Harald; Schreiber, Wolfgang; Fleischhackl, Roman

    2011-06-01

    The first-aid training necessary for obtaining a driver's license in Austria has a regulated and predefined curriculum but has been targeted for the implementation of a new course structure with less theoretical input, repetitive training in cardiopulmonary resuscitation (CPR) and structured presentations using innovative media. The standard and a new course design were compared with a prospective, participant- and observer-blinded, cluster-randomized controlled study. Six months after the initial training, we evaluated the confidence of the 66 participants in their skills, CPR effectiveness parameters and correctness of their actions. The median self-confidence was significantly higher in the interventional group [IG, visual analogue scale (VAS: "0" not confident at all, "100" highly confident): 57] than in the control group (CG, VAS: 41). The mean chest compression rate in the IG (98/min) was closer to the recommended 100 bpm than in the CG (110/min). The time to the first chest compression (IG: 25 s, CG: 36 s) and time to first defibrillator shock (IG: 86 s, CG: 92 s) were significantly shorter in the IG. Furthermore, the IG participants were safer in their handling of the defibrillator and started with countermeasures against developing shock more often. The management of an unconscious person and of heavy bleeding did not show a difference between the two groups even after shortening the lecture time. Motivation and self-confidence as well as skill retention after six months were shown to be dependent on the teaching methods and the time for practical training. Courses may be reorganized and content rescheduled, even within predefined curricula, to improve course outcomes. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  18. Regionalizing Aquatic Ecosystems Based on the River Subbasin Taxonomy Concept and Spatial Clustering Techniques

    Directory of Open Access Journals (Sweden)

    Jiahu Zhao

    2011-11-01

    Aquatic ecoregions are increasingly used as spatial units for aquatic ecosystem management at the watershed scale. In this paper, the principle of including land area, comprehensiveness and dominance, conjugation and hierarchy were selected as regionalizing principles. Elevation and drainage density were selected as the regionalizing indicators for the delineation of level I aquatic ecoregions, and percent of construction land area, percent of cultivated land area, soil type and slope for the level II. With the support of GIS technology, the spatial distribution maps of the two indicators for level I and the four indicators for level II aquatic ecoregion delineation were generated from the raster data based on the 1,107 subwatersheds. River subbasin taxonomy concept, two-step spatial clustering analysis approach and manual-assisted method were used to regionalize aquatic ecosystems in the Taihu Lake watershed. Then the Taihu Lake watershed was divided into two level I aquatic ecoregions, including Ecoregion I1 and Ecoregion I2, and five level II aquatic subecoregions, including Subecoregion II11, Subecoregion II12, Subecoregion II21, Subecoregion II22 and Subecoregion II23. Moreover, the characteristics of the two level I aquatic ecoregions and five level II aquatic subecoregions in the Taihu Lake watershed were summarized, showing that there were significant differences in topography, socio-economic development, water quality and aquatic ecology, etc. The results of quantitative comparison of aquatic life also indicated that the dominant species of fish, benthic density, biomass, dominant species, Shannon-Wiener diversity index, Margalef species richness index, Pielou evenness index and ecological dominance showed great spatial variability between the two level I aquatic ecoregions and five level II aquatic subecoregions. It reflected the spatial heterogeneities and the uneven natures of aquatic ecosystems in the Taihu Lake watershed.
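
    The three community indices named in this abstract have standard textbook definitions, sketched below for a list of per-species abundance counts: Shannon-Wiener H' = -Σ p_i ln p_i, Margalef D = (S - 1) / ln N, and Pielou J = H' / ln S, where S is the number of species present and N the total number of individuals. The natural logarithm is assumed here (the abstract does not state the base), and the sample counts are invented.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def margalef(counts):
    """Margalef richness D = (S - 1) / ln N."""
    s = sum(1 for c in counts if c > 0)
    n = sum(counts)
    if n <= 1:
        raise ValueError("Margalef index needs more than one individual")
    return (s - 1) / math.log(n)


def pielou(counts):
    """Pielou evenness J = H' / ln S; undefined for a single species."""
    s = sum(1 for c in counts if c > 0)
    if s <= 1:
        raise ValueError("Pielou evenness needs at least two species")
    return shannon_wiener(counts) / math.log(s)


# Hypothetical benthic sample: individuals counted per species.
sample = [34, 12, 7, 3]
print(shannon_wiener(sample), margalef(sample), pielou(sample))
```

    A perfectly even community gives J = 1, which is a quick sanity check on any implementation.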

  19. Investigation of Cu(In,Ga)Se2 using Monte Carlo and the cluster expansion technique

    Energy Technology Data Exchange (ETDEWEB)

    Ludwig, Christian D.R.; Gruhn, Thomas; Felser, Claudia [Institute of Inorganic and Analytical Chemistry, Johannes Gutenberg-University, Mainz (Germany); Windeln, Johannes [IBM Germany, Mgr. Technology Center ISC EMEA, Mainz (Germany)

    2010-07-01

    CIGS based solar cells are among the most promising thin-film techniques for cheap, yet efficient modules. They have been investigated for many years, but the full potential of CIGS cells has not yet been exhausted and many effects are not understood. For instance, the band gap of the absorber material Cu(In,Ga)Se2 varies with Ga content. The question why solar cells with high Ga content have low efficiencies, despite the fact that the band gap should have the optimum value, is still unanswered. We are using Monte Carlo simulations in combination with a cluster expansion to investigate the homogeneity of the In-Ga distribution as a possible cause of the low efficiency of cells with high Ga content. The cluster expansion is created by a fit to ab initio electronic structure energies. The results we found are crucial for the processing of solar cells, shed light on structural properties and give hints on how to significantly improve solar cell performance. Above the transition temperature from the separated to the mixed phase, we observe different sizes of the In and Ga domains for a given temperature. The In domains in the Ga-rich compound are smaller and less abundant than the Ga domains in the In-rich compound. This translates into the Ga-rich material being less homogeneous.

  20. Histopathological Breast Cancer Image Classification by Deep Neural Network Techniques Guided by Local Clustering.

    Science.gov (United States)

    Nahid, Abdullah-Al; Mehrabi, Mohamad Ali; Kong, Yinan

    2018-01-01

    Breast Cancer is a serious threat and one of the largest causes of death of women throughout the world. The identification of cancer largely depends on digital biomedical photography analysis such as histopathological images by doctors and physicians. Analyzing histopathological images is a nontrivial task, and decisions from investigation of these kinds of images always require specialised knowledge. However, Computer Aided Diagnosis (CAD) techniques can help the doctor make more reliable decisions. The state-of-the-art Deep Neural Network (DNN) has been recently introduced for biomedical image analysis. Normally each image contains structural and statistical information. This paper classifies a set of biomedical breast cancer images (BreakHis dataset) using novel DNN techniques guided by structural and statistical information derived from the images. Specifically a Convolutional Neural Network (CNN), a Long-Short-Term-Memory (LSTM), and a combination of CNN and LSTM are proposed for breast cancer image classification. Softmax and Support Vector Machine (SVM) layers have been used for the decision-making stage after extracting features utilising the proposed novel DNN models. In this experiment the best Accuracy value of 91.00% is achieved on the 200x dataset, the best Precision value 96.00% is achieved on the 40x dataset, and the best F-Measure value is achieved on both the 40x and 100x datasets.

  1. Evaluation of primary immunization coverage of infants under universal immunization programme in an urban area of Bangalore city using cluster sampling and lot quality assurance sampling techniques

    Directory of Open Access Journals (Sweden)

    Punith K

    2008-01-01

    Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and Proportions, Chi square Test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area.

  2. Evaluation of primary immunization coverage of infants under universal immunization programme in an urban area of bangalore city using cluster sampling and lot quality assurance sampling techniques.

    Science.gov (United States)

    K, Punith; K, Lalitha; G, Suman; Bs, Pradeep; Kumar K, Jayanth

    2008-07-01

    Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Population-based cross-sectional study. Areas under Mathikere Urban Health Center. Children aged 12 months to 23 months. 220 in cluster sampling, 76 in lot quality assurance sampling. Percentages and Proportions, Chi square Test. (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area.
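
    The chi-square comparison mentioned in this abstract can be sketched as follows. This is only an illustration, not the authors' analysis: the counts are an assumption, reconstructed by multiplying the reported percentages by the reported sample sizes (220 and 76), and with expected counts this small an exact test might be preferable in practice.

```python
import math

# Counts reconstructed from the reported percentages and sample sizes.
# This is an assumption: the abstract reports percentages only.
# Columns: completely immunized, partially immunized, unimmunized.
observed = [
    [185, 31, 4],  # cluster sampling, n = 220
    [70, 5, 1],    # lot quality assurance sampling, n = 76
]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_square(observed)
# With exactly 2 degrees of freedom the chi-square survival function is exp(-x / 2),
# so no statistics library is needed for this particular table shape.
p_value = math.exp(-stat / 2) if dof == 2 else None
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.2f}")
```

    Under these reconstructed counts the statistic stays below the 5% critical value for 2 degrees of freedom (about 5.99), which is in line with the abstract's statement that the two coverage estimates were not statistically different.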

  3. An Extension of the Fuzzy Possibilistic Clustering Algorithm Using Type-2 Fuzzy Logic Techniques

    Directory of Open Access Journals (Sweden)

    Elid Rubio

    2017-01-01

    In this work an extension of the Fuzzy Possibilistic C-Means (FPCM) algorithm using Type-2 Fuzzy Logic Techniques is presented, and this is done in order to improve the efficiency of the FPCM algorithm. With the purpose of observing the performance of the proposal against the Interval Type-2 Fuzzy C-Means algorithm, several experiments were made using both algorithms with well-known datasets, such as Wine, WDBC, Iris Flower, Ionosphere, Abalone, and Cover type. In addition, some experiments were performed using another set of test images to observe the behavior of both of the above-mentioned algorithms in image preprocessing. Some comparisons are performed between the proposed algorithm and the Interval Type-2 Fuzzy C-Means (IT2FCM) algorithm to observe if the proposed approach has better performance than this algorithm.

  4. Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists.

    Science.gov (United States)

    Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco

    2013-01-01

    Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphic cards (graphic processor units) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.

  5. Deep unsupervised learning on a desktop PC: A primer for cognitive scientists

    Directory of Open Access Journals (Sweden)

    Alberto Testolin

    2013-05-01

    Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphic cards (GPUs) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.

  6. GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.

    Science.gov (United States)

    Wagner, Florian

    2015-01-01

    Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimen. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.

  7. Deep Unsupervised Learning on a Desktop PC: A Primer for Cognitive Scientists

    Science.gov (United States)

    Testolin, Alberto; Stoianov, Ivilin; De Filippo De Grazia, Michele; Zorzi, Marco

    2013-01-01

    Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires big datasets and it can involve millions of connection weights, which implies that simulations on standard computers are unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphic cards (graphic processor units) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time and with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior. PMID:23653617

  8. Shadow detection and removal in RGB VHR images for land use unsupervised classification

    Science.gov (United States)

    Movia, A.; Beinat, A.; Crosilla, F.

    2016-09-01

    Nowadays, high resolution aerial images are widely available thanks to the diffusion of advanced technologies such as UAVs (Unmanned Aerial Vehicles) and new satellite missions. Although these developments offer new opportunities for accurate land use analysis and change detection, cloud and terrain shadows actually limit benefits and possibilities of modern sensors. Focusing on the problem of shadow detection and removal in VHR color images, the paper proposes new solutions and analyses how they can enhance common unsupervised classification procedures for identifying land use classes related to the CO2 absorption. To this aim, an improved fully automatic procedure has been developed for detecting image shadows using exclusively RGB color information, and avoiding user interaction. Results show a significant accuracy enhancement with respect to similar methods using RGB based indexes. Furthermore, novel solutions derived from Procrustes analysis have been applied to remove shadows and restore brightness in the images. In particular, two methods implementing the so called "anisotropic Procrustes" and the "not-centered oblique Procrustes" algorithms have been developed and compared with the linear correlation correction method based on the Cholesky decomposition. To assess how shadow removal can enhance unsupervised classifications, results obtained with classical methods such as k-means, maximum likelihood, and self-organizing maps, have been compared to each other and with a supervised clustering procedure.

  9. Automated age-related macular degeneration classification in OCT using unsupervised feature learning

    Science.gov (United States)

    Venhuizen, Freerk G.; van Ginneken, Bram; Bloemen, Bart; van Grinsven, Mark J. J. P.; Philipsen, Rick; Hoyng, Carel; Theelen, Thomas; Sánchez, Clara I.

    2015-03-01

    Age-related Macular Degeneration (AMD) is a common eye disorder with high prevalence in elderly people. The disease mainly affects the central part of the retina, and could ultimately lead to permanent vision loss. Optical Coherence Tomography (OCT) is becoming the standard imaging modality in diagnosis of AMD and the assessment of its progression. However, the evaluation of the obtained volumetric scan is time consuming and expensive, and the signs of early AMD are easy to miss. In this paper we propose a classification method to automatically distinguish AMD patients from healthy subjects with high accuracy. The method is based on an unsupervised feature learning approach, and processes the complete image without the need for an accurate pre-segmentation of the retina. The method can be divided into two steps: an unsupervised clustering stage that extracts a set of small descriptive image patches from the training data, and a supervised training stage that uses these patches to create a patch occurrence histogram for every image, on which a random forest classifier is trained. Experiments using 384 volume scans show that the proposed method is capable of identifying AMD patients with high accuracy, obtaining an area under the Receiver Operating Characteristic curve of 0.984. Our method allows for a quick and reliable assessment of the presence of AMD pathology in OCT volume scans without the need for accurate layer segmentation algorithms.

  10. Unsupervised process monitoring and fault diagnosis with machine learning methods

    CERN Document Server

    Aldrich, Chris

    2013-01-01

    This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data

  11. Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network

    Directory of Open Access Journals (Sweden)

    Jun He

    2017-07-01

    Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and the genetic algorithm is used to optimize the structural parameters of the network. Compared to the conventional AI methods, the proposed method can adaptively exploit robust features related to the faults by unsupervised feature learning, thus requires less prior knowledge about signal processing techniques and diagnostic expertise. Besides, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., back propagation neural network (BPNN) and support vector machine (SVM). The fault classification accuracies are 99.26% for rolling bearings and 100% for gearbox when using the proposed method, which are much higher than that of the other two methods.

  12. Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network.

    Science.gov (United States)

    He, Jun; Yang, Shixi; Gan, Chunbiao

    2017-07-04

    Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and the genetic algorithm is used to optimize the structural parameters of the network. Compared to the conventional AI methods, the proposed method can adaptively exploit robust features related to the faults by unsupervised feature learning, thus requires less prior knowledge about signal processing techniques and diagnostic expertise. Besides, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., back propagation neural network (BPNN) and support vector machine (SVM). The fault classification accuracies are 99.26% for rolling bearings and 100% for gearbox when using the proposed method, which are much higher than that of the other two methods.

  13. Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

    Science.gov (United States)

    Salman, S. S.; Abbas, W. A.

    2018-05-01

    The goal of this study is to enhance image resolution and to examine the effect of that enhancement on classification methods applied to spectral band information. We introduce a method to enhance the resolution of Landsat 8 imagery by combining the 30-meter multispectral bands with the 15-meter panchromatic band 8, because multispectral imagery is important for extracting land cover. Classification methods are used in this study to classify several land covers recorded in OLI 8 imagery. Data mining methods can be classified as either supervised or unsupervised. In supervised methods there is a particular predefined target, meaning that the algorithm learns which values of the target are associated with which values of the predictor sample. The K-nearest neighbors and maximum likelihood algorithms are examined in this work as supervised methods. In unsupervised methods, on the other hand, no sample is identified as a target; the algorithm searches for structure and patterns among all the variables. The fuzzy C-means clustering method represents the unsupervised methods here. The NDVI vegetation index is used to compare the results of the classification methods; for the percentage of dense vegetation, the maximum likelihood method gives the best results.

  14. K-MEANS CLUSTERING FOR HIDDEN MARKOV MODEL

    NARCIS (Netherlands)

    Perrone, M.P.; Connell, S.D.

    2004-01-01

    An unsupervised k-means clustering algorithm for hidden Markov models is described and applied to the task of generating subclass models for individual handwritten character classes. The algorithm is compared to a related clustering method and shown to give a relative change in the error rate of as

  15. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in ... 518501, India; Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur College of Engineering, Anantapur 515002, India ...

  16. Specialization processes in on-line unsupervised learning

    NARCIS (Netherlands)

    Biehl, M.; Freking, A.; Reents, G.; Schlösser, E.

    1998-01-01

    From the recent analysis of supervised learning by on-line gradient descent in multilayered neural networks it is known that the necessary process of student specialization can be delayed significantly. We demonstrate that this phenomenon also occurs in various models of unsupervised learning. A

  17. Unsupervised Assessment of Subcutaneous and Visceral Fat by MRI

    DEFF Research Database (Denmark)

    Jørgensen, Peter Stanley; Larsen, Rasmus; Wraae, Kristian

    2009-01-01

    This paper presents a method for unsupervised assessment of visceral and subcutaneous adipose tissue in the abdominal region by MRI. The identification of the subcutaneous and the visceral regions was achieved by dynamic programming constrained by points acquired from an active shape model...

  18. Modeling Visit Behaviour in Smart Homes using Unsupervised Learning

    NARCIS (Netherlands)

    Nait Aicha, A.; Englebienne, G.; Kröse, B.

    2014-01-01

    Many algorithms on health monitoring from ambient sensor networks assume that only a single person is present in the home. We present an unsupervised method that models visit behaviour. A Markov modulated multidimensional non-homogeneous Poisson process (M3P2) is described that allows us to model

  19. Bilingual Lexical Interactions in an Unsupervised Neural Network Model

    Science.gov (United States)

    Zhao, Xiaowei; Li, Ping

    2010-01-01

    In this paper we present an unsupervised neural network model of bilingual lexical development and interaction. We focus on how the representational structures of the bilingual lexicons can emerge, develop, and interact with each other as a function of the learning history. The results show that: (1) distinct representations for the two lexicons…

  20. Teacher and learner: Supervised and unsupervised learning in communities.

    Science.gov (United States)

    Shafto, Michael G; Seifert, Colleen M

    2015-01-01

    How far can teaching methods go to enhance learning? Optimal methods of teaching have been considered in research on supervised and unsupervised learning. Locally optimal methods are usually hybrids of teaching and self-directed approaches. The costs and benefits of specific methods have been shown to depend on the structure of the learning task, the learners, the teachers, and the environment.

  1. Unsupervised detection and removal of muscle artifacts from scalp EEG recordings using canonical correlation analysis, wavelets and random forests.

    Science.gov (United States)

    Anastasiadou, Maria N; Christodoulakis, Manolis; Papathanasiou, Eleftherios S; Papacostas, Savvas S; Mitsis, Georgios D

    2017-09-01

    This paper proposes supervised and unsupervised algorithms for automatic muscle artifact detection and removal from long-term EEG recordings, which combine canonical correlation analysis (CCA) and wavelets with random forests (RF). The proposed algorithms first perform CCA and continuous wavelet transform of the canonical components to generate a number of features which include component autocorrelation values and wavelet coefficient magnitude values. A subset of the most important features is subsequently selected using RF and labelled observations (supervised case) or synthetic data constructed from the original observations (unsupervised case). The proposed algorithms are evaluated using realistic simulation data as well as 30min epochs of non-invasive EEG recordings obtained from ten patients with epilepsy. We assessed the performance of the proposed algorithms using classification performance and goodness-of-fit values for noisy and noise-free signal windows. In the simulation study, where the ground truth was known, the proposed algorithms yielded almost perfect performance. In the case of experimental data, where expert marking was performed, the results suggest that both the supervised and unsupervised algorithm versions were able to remove artifacts without affecting noise-free channels considerably, outperforming standard CCA, independent component analysis (ICA) and Lagged Auto-Mutual Information Clustering (LAMIC). The proposed algorithms achieved excellent performance for both simulation and experimental data. Importantly, for the first time to our knowledge, we were able to perform entirely unsupervised artifact removal, i.e. without using already marked noisy data segments, achieving performance that is comparable to the supervised case. Overall, the results suggest that the proposed algorithms yield significant future potential for improving EEG signal quality in research or clinical settings without the need for marking by expert

  2. Unsupervised segmentation of lung fields in chest radiographs using multiresolution fractal feature vector and deformable models.

    Science.gov (United States)

    Lee, Wen-Li; Chang, Koyin; Hsieh, Kai-Sheng

    2016-09-01

    Segmenting lung fields in a chest radiograph is essential for automatically analyzing an image. We present an unsupervised method based on multiresolution fractal feature vector. The feature vector characterizes the lung field region effectively. A fuzzy c-means clustering algorithm is then applied to obtain a satisfactory initial contour. The final contour is obtained by deformable models. The results show the feasibility and high performance of the proposed method. Furthermore, based on the segmentation of lung fields, the cardiothoracic ratio (CTR) can be measured. The CTR is a simple index for evaluating cardiac hypertrophy. After identifying a suspicious symptom based on the estimated CTR, a physician can suggest that the patient undergoes additional extensive tests before a treatment plan is finalized.
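
    The fuzzy c-means step named in this abstract can be illustrated with a minimal sketch. It is plain fuzzy c-means on made-up two-dimensional feature vectors, not the authors' pipeline: the multiresolution fractal features and the deformable-model refinement are not reproduced, and the function name, parameters and toy data are hypothetical.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=50, seed=0):
    """Plain fuzzy c-means on a list of equal-length feature tuples."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(n_iter):
        # Centre update: membership-weighted means, with weights u**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(len(points))]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(len(points))) / total
                for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [max(sum((p[d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                     for c in centers]
            for k in range(n_clusters):
                u[i][k] = 1.0 / sum((dists[k] / dj) ** (2.0 / (m - 1.0)) for dj in dists)
    return centers, u


# Made-up feature vectors standing in for "lung" and "background" pixels.
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(features)
# Harden the soft memberships by taking the most likely cluster per point.
labels = [max(range(len(row)), key=lambda k: row[k]) for row in memberships]
```

    Unlike hard k-means, every point keeps a graded membership in each cluster, which is what makes a thresholded membership map usable as an initial contour.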

  3. Gastric cancer differentiation using Fourier transform near-infrared spectroscopy with unsupervised pattern recognition

    Science.gov (United States)

    Yi, Wei-song; Cui, Dian-sheng; Li, Zhi; Wu, Lan-lan; Shen, Ai-guo; Hu, Ji-ming

    2013-01-01

    This study investigated the application of near-infrared (NIR) spectroscopy for differentiating gastric cancer. Ninety spectra from cancerous and normal tissues were collected from a total of 30 surgical specimens using Fourier transform near-infrared spectroscopy (FT-NIR) equipped with a fiber-optic probe. Major spectral differences were observed in the CH-stretching second overtone (9000-7000 cm⁻¹), CH-stretching first overtone (6000-5200 cm⁻¹), and CH-stretching combination (4500-4000 cm⁻¹) regions. By use of unsupervised pattern recognition, such as principal component analysis (PCA) and cluster analysis (CA), all spectra were classified into cancerous and normal tissue groups with accuracy up to 81.1%. The sensitivity and specificity were 100% and 68.2%, respectively. These results indicate that the CH-stretching first overtone, combination band and second overtone regions can serve as diagnostic markers for gastric cancer.

  4. Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

    Science.gov (United States)

    Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

    2017-12-01

    The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3

  5. Individualized unsupervised exercise programs and chest physiotherapy in children with cystic fibrosis

    Directory of Open Access Journals (Sweden)

    Bogdan ALMĂJAN-GUȚĂ

    2013-12-01

    Traditionally, physiotherapy for cystic fibrosis focused mainly on airway clearance (clearing mucus from the lungs). This still makes up a large part of daily treatment, but the role of the physiotherapist in cystic fibrosis has expanded to include daily exercise, inhalation therapy, posture awareness and, for some, the management of urinary incontinence. The purpose of this study is to demonstrate the necessity and the efficiency of various methods of chest physiotherapy and an individualized unsupervised exercise program in the improvement of body composition and physical performance. This study included 12 children with cystic fibrosis, aged 8-13 years. Each subject was evaluated in terms of body composition, effort capacity and lower body muscular performance at the beginning of the study and after 12 months. The intervention consisted of classic respiratory clearance and physiotherapy techniques (5 times a week) and an individualized unsupervised exercise program (3 times a week). After 12 months we noticed a significant improvement of the measured parameters: body weight increased from 32.25±5.5 to 33.53±5.4 kg (p<0.001), skeletal muscle mass increased from a mean of 16.04±4.1 to 17.01±4.2 (p<0.001), the fitness score increased from a mean of 71±3.8 points to 73±3.8 (p<0.001), and power and force also registered positive evolutions (from 19.3±2.68 to 21.65±2.4 W/kg and from 19.68±2.689 to 20.81±2.98 N/kg, respectively). The association between physiotherapy procedures and an individualized (after a proper clinical assessment) unsupervised exercise program proved to be an effective, relatively simple and accessible (regardless of social class) intervention.

  6. The Immersive Virtual Reality Experience: A Typology of Users Revealed Through Multiple Correspondence Analysis Combined with Cluster Analysis Technique.

    Science.gov (United States)

    Rosa, Pedro J; Morais, Diogo; Gamito, Pedro; Oliveira, Jorge; Saraiva, Tomaz

    2016-03-01

    Immersive virtual reality is thought to be advantageous by leading to higher levels of presence. However, and despite users getting actively involved in immersive three-dimensional virtual environments that incorporate sound and motion, there are individual factors, such as age, video game knowledge, and the predisposition to immersion, that may be associated with the quality of virtual reality experience. Moreover, one particular concern for users engaged in immersive virtual reality environments (VREs) is the possibility of side effects, such as cybersickness. The literature suggests that at least 60% of virtual reality users report having felt symptoms of cybersickness, which reduces the quality of the virtual reality experience. The aim of this study was thus to profile the right user to be involved in a VRE through head-mounted display. To examine which user characteristics are associated with the most effective virtual reality experience (lower cybersickness), a multiple correspondence analysis combined with cluster analysis technique was performed. Results revealed three distinct profiles, showing that the PC gamer profile is more associated with higher levels of virtual reality effectiveness, that is, higher predisposition to be immersed and reduced cybersickness symptoms in the VRE than console gamer and nongamer. These findings can be a useful orientation in clinical practice and future research as they help identify which users are more predisposed to benefit from immersive VREs.

  7. Computerized detection method for asymptomatic white matter lesions in brain screening MR images using a clustering technique

    International Nuclear Information System (INIS)

    Kunieda, Takuya; Uchiyama, Yoshikazu; Hara, Takeshi

    2008-01-01

    Asymptomatic white matter lesions are frequently identified by the screening system known as Brain Dock, which is intended for the detection of asymptomatic brain diseases. The detection of asymptomatic white matter lesions is important because their presence is associated with an increased risk of stroke. Therefore, we have developed a computerized method for the detection of asymptomatic white matter lesions in order to assist radiologists in image interpretation as a "second opinion". Our database consisted of T1- and T2-weighted images obtained from 73 patients. The locations of the white matter lesions were determined by an experienced neuroradiologist. In order to restrict the area to be searched for white matter lesions, we first segmented the cerebral region in T1-weighted images by applying thresholding and region-growing techniques. To identify the initial candidate lesions, k-means clustering with pixel values in T1- and T2-weighted images was applied to the segmented cerebral region. To eliminate false positives (FPs), we determined the features, such as location, size, and circularity, of each of the initial candidate lesions. Finally, a rule-based scheme and a quadratic discriminant analysis with these features were employed to distinguish between white matter lesions and FPs. The results showed that the sensitivity for the detection of white matter lesions was 93.2%, with 4.3 FPs per image, suggesting that our computerized method may be useful for the detection of asymptomatic white matter lesions in T1- and T2-weighted images. (author)
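
    The candidate-detection step, k-means on paired T1 and T2 pixel values, might look roughly like the sketch below. It is an illustration under stated assumptions: each pixel is treated as a (T1, T2) intensity pair, the intensities are invented, and the cluster count and function name are hypothetical rather than taken from the paper.

```python
import random


def k_means(pixels, k, n_iter=20, seed=0):
    """Lloyd's k-means on a list of (T1, T2) intensity pairs."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # initial centres drawn from the data
    labels = [0] * len(pixels)
    for _ in range(n_iter):
        # Assignment step: each pixel goes to its nearest centre.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in pixels
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels


# Invented intensities: similar T1 values, but one group is bright on T2.
pixels = [(100, 100), (102, 98), (98, 101), (100, 200), (103, 205), (99, 198)]
centers, labels = k_means(pixels, k=2)
# The cluster with the higher mean T2 value would be the candidate-lesion cluster.
candidate_cluster = max(range(2), key=lambda c: centers[c][1])
```

    In the described method such candidates are only a first pass; the feature-based rules and discriminant analysis that follow are what remove false positives.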

  8. The Effect of Roundtable and Clustering Teaching Techniques and Students' Personal Traits on Students' Achievement in Descriptive Writing

    Science.gov (United States)

    Sinaga, Megawati

    2017-01-01

    The objective of this experimental study was to investigate the effect of Roundtable and Clustering teaching techniques and students' personal traits on students' achievement in descriptive writing. The students in grade IX of SMP Negeri 2 Pancurbatu in the 2016/2017 academic year were chosen as the population of this research. The…

  9. Enhancement of ELM by Clustering Discrimination Manifold Regularization and Multiobjective FOA for Semisupervised Classification

    OpenAIRE

    Qing Ye; Hao Pan; Changhua Liu

    2015-01-01

    A novel semisupervised extreme learning machine (ELM) with a clustering discrimination manifold regularization (CDMR) framework, named CDMR-ELM, is proposed for semisupervised classification. By using an unsupervised fuzzy clustering method, the CDMR framework integrates clustering discrimination of both labeled and unlabeled data with twinning constraints regularization. Aiming at further improving the classification accuracy and efficiency, a new multiobjective fruit fly optimization algorithm (MOFOA)...

  10. A comparative study on full diagonalization of Hessian matrix and Gradient-only technique to trace out reaction path in doped noble gas clusters using stochastic optimization

    International Nuclear Information System (INIS)

    Biring, Shyamal Kumar; Chaudhury, Pinaki

    2012-01-01

    Highlights: ► Estimation of critical points in Noble-gas clusters. ► Evaluation of first order saddle point or transition states. ► Construction of reaction path for structural change in clusters. ► Use of Monte-Carlo Simulated Annealing to study structural changes. - Abstract: This paper proposes Simulated Annealing based search to locate critical points in mixed noble gas clusters where Ne and Xe are individually doped in Ar-clusters. Using Lennard–Jones (LJ) atomic interaction we try to explore the search process of transformation through Minimum Energy Path (MEP) from one minimum energy geometry to another via first order saddle point on the potential energy surface of the clusters. Here we compare the results based on diagonalization of the full Hessian all through the search and quasi-gradient only technique to search saddle points and construction of reaction path (RP) for three sizes of doped Ar-clusters, (Ar)19Ne/Xe, (Ar)24Ne/Xe and (Ar)29Ne/Xe.

  11. Towards unsupervised ontology learning from data

    CSIR Research Space (South Africa)

    Klarman, S

    2015-07-01

    Full Text Available from facts [Shapiro, 1981], finite automata descriptions from observations [Pitt, 1989], logic programs from interpretations [De Raedt and Lavrač, 1993; De Raedt, 1994]. In the area of DLs, a few learning scenarios have been formally addressed..., concerned largely with learning concept descriptions via different learning operators [Straccia and Mucci, 2015; Lehmann and Hitzler, 2008; Fanizzi et al., 2008; Cohen and Hirsh, 1994] and applications of formal concept analysis techniques to automated...

  12. An Unsupervised Approach to Activity Recognition and Segmentation based on Object-Use Fingerprints

    DEFF Research Database (Denmark)

    Gu, Tao; Chen, Shaxun; Tao, Xianping

    2010-01-01

    Human activity recognition is an important task which has many potential applications. In recent years, researchers from pervasive computing are interested in deploying on-body sensors to collect observations and applying machine learning techniques to model and recognize activities. Supervised...... machine learning techniques typically require an appropriate training process in which training data need to be labeled manually. In this paper, we propose an unsupervised approach based on object-use fingerprints to recognize activities without human labeling. We show how to build our activity models...... a trace and detect the boundary of any two adjacent activities. We develop a wearable RFID system and conduct a real-world trace collection done by seven volunteers in a smart home over a period of 2 weeks. We conduct comprehensive experimental evaluations and comparison study. The results show that our...

  13. Subspace K-means clustering.

    Science.gov (United States)

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  14. Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

    OpenAIRE

    Aldarmaki, Hanan; Mohan, Mahesh; Diab, Mona

    2017-01-01

    Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector ...

  15. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    Science.gov (United States)

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  16. Information-Based Approach to Unsupervised Machine Learning

    Science.gov (United States)

    2013-06-19

    samples with large fitting error. The above optimization problem can be reduced to a quadratic program (Mangasarian & Musicant, 2000), which can be...recognition. Technicheskaya Kibernetica, 3. in Russian. Mallows, C. L. (1973). Some comments on CP. Technometrics, 15, 661–675. Mangasarian, O. L., & Musicant...to find correspondence between two sets of objects in different domains in an unsupervised way. Photo album summarization is a typical application

  17. Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks

    OpenAIRE

    Zhelezniak, Vitalii; Busbridge, Dan; Shen, April; Smith, Samuel L.; Hammerla, Nils Y.

    2018-01-01

    Experimental evidence indicates that simple models outperform complex deep networks on many unsupervised similarity tasks. We provide a simple yet rigorous explanation for this behaviour by introducing the concept of an optimal representation space, in which semantically close symbols are mapped to representations that are close under a similarity measure induced by the model's objective function. In addition, we present a straightforward procedure that, without any retraining or architectura...

  18. Unsupervised Learning of Spatiotemporal Features by Video Completion

    OpenAIRE

    Nallabolu, Adithya Reddy

    2017-01-01

    In this work, we present an unsupervised representation learning approach for learning rich spatiotemporal features from videos without the supervision from semantic labels. We propose to learn the spatiotemporal features by training a 3D convolutional neural network (CNN) using video completion as a surrogate task. Using a large collection of unlabeled videos, we train the CNN to predict the missing pixels of a spatiotemporal hole given the remaining parts of the video through minimizing per...

  19. Unsupervised classification of neocortical activity patterns in neonatal and pre-juvenile rodents

    Directory of Open Access Journals (Sweden)

    Nicole eCichon

    2014-05-01

    Full Text Available Flexible communication within the brain, which relies on oscillatory activity, is not confined to adult neuronal networks. Experimental evidence has documented the presence of discontinuous patterns of oscillatory activity already during early development. Their highly variable spatial and time-frequency organization has been related to region specificity. However, it might be equally due to the absence of unitary criteria for classifying the early activity patterns, since they have been mainly characterized by visual inspection. Therefore, robust and unbiased methods for categorizing these discontinuous oscillations are needed for increasingly complex data sets from different labs. Here, we introduce an unsupervised detection and classification algorithm for the discontinuous activity patterns of rodents during early development. For this, firstly time windows with discontinuous oscillations vs. epochs of network silence were identified. In a second step, the major features of detected events were identified and processed by principal component analysis for deciding on their contribution to the classification of different oscillatory patterns. Finally, these patterns were categorized using an unsupervised cluster algorithm. The results were validated on manually characterized neonatal spindle bursts, which ubiquitously entrain neocortical areas of rats and mice, and prelimbic nested gamma spindle bursts. Moreover, the algorithm led to satisfactory results for oscillatory events that, due to increased similarity of their features, were more difficult to classify, e.g. during the pre-juvenile developmental period. Based on a linear classification, the optimal number of features to consider increased with the difficulty of detection. This algorithm allows the comparison of neonatal and pre-juvenile oscillatory patterns in their spatial and temporal organization. It might represent a first step for the unbiased elucidation of activity patterns

  20. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    Science.gov (United States)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in data mining for unlabeled data, because no adequate prior knowledge is available. One of the important tasks is defining the number of clusters without user involvement, which is known as automatic clustering. This study aims to determine the number of clusters automatically using forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the results are compared with other state-of-the-art automatic clustering methods. The experimental results show that AC-FSDE is better than or competitive with other existing automatic clustering algorithms.

  1. Towards a new classification of stable phase schizophrenia into major and simple neuro-cognitive psychosis: Results of unsupervised machine learning analysis.

    Science.gov (United States)

    Kanchanatawan, Buranee; Sriswasdi, Sira; Thika, Supaksorn; Stoyanov, Drozdstoy; Sirivichayakul, Sunee; Carvalho, André F; Geffard, Michel; Maes, Michael

    2018-05-23

    Deficit schizophrenia, as defined by the Schedule for Deficit Syndrome, may represent a distinct diagnostic class defined by neurocognitive impairments coupled with changes in IgA/IgM responses to tryptophan catabolites (TRYCATs). Adequate classifications should be based on supervised and unsupervised learning rather than on consensus criteria. This study used machine learning as means to provide a more accurate classification of patients with stable phase schizophrenia. We found that using negative symptoms as discriminatory variables, schizophrenia patients may be divided into two distinct classes modelled by (A) impairments in IgA/IgM responses to noxious and generally more protective tryptophan catabolites, (B) impairments in episodic and semantic memory, paired associative learning and false memory creation, and (C) psychotic, excitation, hostility, mannerism, negative, and affective symptoms. The first cluster shows increased negative, psychotic, excitation, hostility, mannerism, depression and anxiety symptoms, and more neuroimmune and cognitive disorders and is therefore called "major neurocognitive psychosis" (MNP). The second cluster, called "simple neurocognitive psychosis" (SNP) is discriminated from normal controls by the same features although the impairments are less well developed than in MNP. The latter is additionally externally validated by lowered quality of life, body mass (reflecting a leptosome body type), and education (reflecting lower cognitive reserve). Previous distinctions including "type 1" (positive)/"type 2" (negative) and DSM-IV-TR (eg, paranoid) schizophrenia could not be validated using machine learning techniques. Previous names of the illness, including schizophrenia, are not very adequate because they do not describe the features of the illness, namely, interrelated neuroimmune, cognitive, and clinical features. 
    Stable-phase schizophrenia consists of 2 relevant qualitatively distinct categories or nosological entities with SNP

  2. Clustering and visualizing similarity networks of membrane proteins.

    Science.gov (United States)

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
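
    The idea of sparsifying a distance matrix at a threshold distance can be illustrated with a short sketch. It is not the MSC algorithm and not the authors' sparsification procedure: it assumes, purely for illustration, that edges longer than the threshold are dropped and that clusters are read off as connected components. The function name `threshold_clusters` and the toy matrix are hypothetical.

```python
def threshold_clusters(dist, threshold):
    """Drop edges longer than `threshold`; return connected components."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:  # edge survives sparsification
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy symmetric distance matrix for four chains (hypothetical values).
toy = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
clusters = threshold_clusters(toy, threshold=0.5)
```

    With this toy matrix, any threshold between the short and long distances yields the same two groups, which is the kind of gap a characteristic threshold distance would exploit.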

  3. Approximate fuzzy C-means (AFCM) cluster analysis of medical magnetic resonance image (MRI) data

    International Nuclear Information System (INIS)

    DelaPaz, R.L.; Chang, P.J.; Bernstein, R.; Dave, J.V.

    1987-01-01

    The authors describe the application of an approximate fuzzy C-means (AFCM) clustering algorithm as a data dimension reduction approach to medical magnetic resonance images (MRI). Image data consisted of one T1-weighted, two T2-weighted, and one T2*-weighted (magnetic susceptibility) image for each cranial study and a matrix of 10 images generated from 10 combinations of TE and TR for each body lymphoma study. All images were obtained with a 1.5 Tesla imaging system (GE Signa). Analyses were performed on over 100 MR image sets with a variety of pathologies. The cluster analysis was operated in an unsupervised mode and computational overhead was minimized by utilizing a table look-up approach without adversely affecting accuracy. Image data were first segmented into 2 coarse clusters, each of which was then subdivided into 16 fine clusters. The final tissue classifications were presented as color-coded anatomically-mapped images and as two and three dimensional displays of cluster center data in selected feature space (minimum spanning tree). Fuzzy cluster analysis appears to be a clinically useful dimension reduction technique which results in improved diagnostic specificity of medical magnetic resonance images
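
    For orientation, the standard fuzzy C-means iteration that AFCM approximates can be sketched as below. This is the textbook algorithm under assumed settings (fuzzifier m = 2, random initial memberships), not the table look-up approximation described in the abstract; the function name and the toy voxel features are hypothetical.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers: means weighted by membership raised to the power m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: proportional to squared distance ** (-1 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a center: give it full membership there.
                raw = [1.0 if x == 0.0 else 0.0 for x in d2]
            else:
                raw = [x ** (-1.0 / (m - 1.0)) for x in d2]
            total = sum(raw)
            new_row = [v / total for v in raw]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return u, centers

# Hypothetical two-feature voxel vectors forming two tissue groups.
voxels = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
u, centers = fuzzy_c_means(voxels, c=2)
hard = [row.index(max(row)) for row in u]  # defuzzified labels
```

    Each row of `u` holds one voxel's graded membership in every cluster, which is what distinguishes the fuzzy result from a hard partition.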

  4. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they possess. Several algorithms can be used for this analysis. This paper therefore discusses similarity measures and hierarchical clustering, which includes the single linkage, complete linkage and average linkage methods. The non-hierarchical clustering method popularly known as the K-means method is also discussed. Finally, the paper describes the advantages and disadvantages of each method.
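
    The three linkage rules named above differ only in how the distance between two clusters is defined. The following sketch, a naive agglomerative loop with hypothetical names and toy data, is meant as an illustration rather than an efficient implementation.

```python
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def linkage_distance(c1, c2, method):
    """Distance between two clusters (lists of points) under a linkage rule."""
    d = [dist(a, b) for a in c1 for b in c2]
    if method == "single":      # closest pair
        return min(d)
    if method == "complete":    # farthest pair
        return max(d)
    if method == "average":     # mean over all pairs
        return sum(d) / len(d)
    raise ValueError(f"unknown linkage: {method}")

def agglomerate(points, n_clusters, method="single"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(
                clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Toy one-dimensional data with two well-separated groups.
data = [(0.0,), (1.0,), (2.5,), (6.0,), (7.0,)]
result = {m: agglomerate(data, 2, m) for m in ("single", "complete", "average")}
```

    On well-separated data such as this, all three rules recover the same two groups; on elongated or overlapping data they can disagree, which is one source of the advantages and disadvantages the paper refers to.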

  5. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  6. Detecting Transitions in Manual Tasks from Wearables: An Unsupervised Labeling Approach

    Directory of Open Access Journals (Sweden)

    Sebastian Böttcher

    2018-03-01

    Full Text Available Authoring protocols for manual tasks such as following recipes, manufacturing processes or laboratory experiments requires significant effort. This paper presents a system that estimates individual procedure transitions from the user’s physical movement and gestures recorded with inertial motion sensors. Combined with egocentric or external video recordings, this facilitates efficient review and annotation of video databases. We investigate different clustering algorithms on wearable inertial sensor data recorded on par with video data, to automatically create transition marks between task steps. The goal is to match these marks to the transitions given in a description of the workflow, thus creating navigation cues to browse video repositories of manual work. To evaluate the performance of unsupervised algorithms, the automatically-generated marks are compared to human expert-created labels on two publicly-available datasets. Additionally, we tested the approach on a novel dataset in a manufacturing lab environment, describing an existing sequential manufacturing process. The results from selected clustering methods are also compared to some supervised methods.

  7. Semi-supervised clustering methods.

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
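
    As an illustration of how k-means can be modified when some cluster labels are known, the sketch below initialises the centers from the labeled observations and keeps their assignments fixed. It is a generic seeded variant written for this example, not necessarily one of the algorithms covered in the review; the function name, the `known` mapping, and the data are hypothetical, and every cluster is assumed to have at least one labeled observation.

```python
def seeded_kmeans(points, known, k, iters=50):
    """k-means with partial labels.

    points: list of tuples; known: dict mapping point index -> cluster label
    in range(k). Every cluster must have at least one labeled point.
    """
    def mean(rows):
        return tuple(sum(v) / len(rows) for v in zip(*rows))

    # Initial centers come from the labeled observations only.
    centers = [mean([points[i] for i, lab in known.items() if lab == c])
               for c in range(k)]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = []
        for i, p in enumerate(points):
            if i in known:
                new_labels.append(known[i])  # labeled points never move
            else:
                new_labels.append(min(
                    range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c]))))
        if new_labels == labels:
            break
        labels = new_labels
        # Labeled points guarantee that no cluster is ever empty.
        centers = [mean([p for p, lab in zip(points, labels) if lab == c])
                   for c in range(k)]
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
known = {0: 0, 3: 1}  # two observations with known cluster labels
labels, centers = seeded_kmeans(points, known, k=2)
```

    Fixing the labeled assignments also removes the label-switching ambiguity of ordinary k-means: cluster 0 in the output is the cluster that the labeled data call 0.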

  8. Little effect of transfer technique instruction and physical fitness training in reducing low back pain among nurses: a cluster randomised intervention study

    DEFF Research Database (Denmark)

    Warming, S; Ebbehøj, N E; Wiese, N

    2008-01-01

    intervention (six wards) or to control (five wards). The intervention cluster was individually randomised to TT (55 nurses) and TTPT (50 nurses), control (76 nurses). The transfer technique programme was a 4-d course of train-the-trainers to teach transfer technique to their colleagues. The physical training...... consisted of supervised physical fitness training 1 h twice per week for 8 weeks. Implementing transfer technique alone or in combination with physical fitness training among a hospital nursing staff did not, when compared to a control group, show any statistical differences according to self-reported low...... to nurses in a hospital setting needs to be thoroughly considered. Other priorities such as physical training may be taken into consideration. The current study supports the findings of other studies that introducing transfer technique alone has no effect in targeting LBP. However, physical training seems...

  9. ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets.

    Directory of Open Access Journals (Sweden)

    Halfdan Rydbeck

    Full Text Available Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.

  10. Analog memristive synapse in spiking networks implementing unsupervised learning

    Directory of Open Access Journals (Sweden)

    Erika Covi

    2016-10-01

    Full Text Available Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation and, in this respect, a challenge is opened to the researchers in the field. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e. the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity). This implies a device able to change its resistance (synaptic strength, or weight) upon proper electrical stimuli (synaptic activity) and showing several stable resistive states throughout its dynamic range (analog behavior). Moreover, it should be able to perform spike timing dependent plasticity (STDP), an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO2-based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even in case incomplete or noisy characters are displayed and it is robust to a device-to-device variability of up to ±30%.

  11. Analog Memristive Synapse in Spiking Networks Implementing Unsupervised Learning.

    Science.gov (United States)

    Covi, Erika; Brivio, Stefano; Serb, Alexander; Prodromakis, Themis; Fanciulli, Marco; Spiga, Sabina

    2016-01-01

    Emerging brain-inspired architectures call for devices that can emulate the functionality of biological synapses in order to implement new efficient computational schemes able to solve ill-posed problems. Various devices and solutions are still under investigation and, in this respect, a challenge is opened to the researchers in the field. Indeed, the optimal candidate is a device able to reproduce the complete functionality of a synapse, i.e., the typical synaptic process underlying learning in biological systems (activity-dependent synaptic plasticity). This implies a device able to change its resistance (synaptic strength, or weight) upon proper electrical stimuli (synaptic activity) and showing several stable resistive states throughout its dynamic range (analog behavior). Moreover, it should be able to perform spike timing dependent plasticity (STDP), an associative homosynaptic plasticity learning rule based on the delay time between the two firing neurons the synapse is connected to. This rule is a fundamental learning protocol in state-of-art networks, because it allows unsupervised learning. Notwithstanding this fact, STDP-based unsupervised learning has been proposed several times mainly for binary synapses rather than multilevel synapses composed of many binary memristors. This paper proposes an HfO2-based analog memristor as a synaptic element which performs STDP within a small spiking neuromorphic network operating unsupervised learning for character recognition. The trained network is able to recognize five characters even in case incomplete or noisy images are displayed and it is robust to a device-to-device variability of up to ±30%.
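
    The STDP rule mentioned above makes the weight change depend on the delay between the pre- and postsynaptic spikes. A common textbook form uses an exponential window, sketched below. The amplitudes, time constants, and weight bounds are illustrative assumptions and are not taken from the paper or from the measured device behaviour.

```python
import math

def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair; dt = t_post - t_pre in ms.

    All parameter values are assumed for illustration only.
    """
    if dt > 0:    # pre fires before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:    # post fires before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def update_weight(w, dt, w_min=0.0, w_max=1.0):
    """Apply the change and clip to an assumed bounded weight range."""
    return min(w_max, max(w_min, w + stdp_delta_w(dt)))

# Example: a causal pair strengthens the synapse, an anti-causal pair weakens it.
w_after_causal = update_weight(0.5, dt=5.0)
w_after_anticausal = update_weight(0.5, dt=-5.0)
```

    The clipping step stands in for the finite dynamic range of an analog device; in a multilevel memristor the reachable weights would additionally be limited to the stable resistive states.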

  12. Unsupervised classification of operator workload from brain signals

    Science.gov (United States)

    Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin

    2016-06-01

    Objective. In this study we aimed for the classification of operator workload as it is expected in many real-life workplace environments. We explored brain-signal based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Approach. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSPs) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects’ error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined the added value for improving classification performance. Main results. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC, cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering and they extracted physiologically plausible components. The performance of the unsupervised cSPoC is significantly increased by augmenting it with PPM features. Significance. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.

  13. A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

    DEFF Research Database (Denmark)

    Fraccaro, Marco; Kamronn, Simon Due; Paquet, Ulrich

    2017-01-01

    This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework...... for unsupervised learning of sequential data that disentangles two latent representations: an object’s representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate...

  14. Unsupervised behaviour-specific dictionary learning for abnormal event detection

    DEFF Research Database (Denmark)

    Ren, Huamin; Liu, Weifeng; Olsen, Søren Ingvor

    2015-01-01

    the training data is only a small proportion of the surveillance data. Therefore, we propose behavior-specific dictionaries (BSD) through unsupervised learning, pursuing atoms from the same type of behavior to represent one behavior dictionary. To further improve the dictionary by introducing information from...... potential infrequent normal patterns, we refine the dictionary by searching for ‘missed atoms’ that have compact coefficients. Experimental results show that our BSD algorithm outperforms state-of-the-art dictionaries in abnormal event detection on the public UCSD dataset. Moreover, BSD has fewer false alarms...

  15. Hierarchical Multiple Markov Chain Model for Unsupervised Texture Segmentation

    Czech Academy of Sciences Publication Activity Database

    Scarpa, G.; Gaetano, R.; Haindl, Michal; Zerubia, J.

    2009-01-01

    Vol. 18, No. 8 (2009), pp. 1830-1843 ISSN 1057-7149 R&D Projects: GA ČR GA102/08/0593 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords: Classification * texture analysis * segmentation * hierarchical image models * Markov process Subject RIV: BD - Theory of Information Impact factor: 2.848, year: 2009 http://library.utia.cas.cz/separaty/2009/RO/haindl-hierarchical multiple markov chain model for unsupervised texture segmentation.pdf

  16. Unsupervised detection of salt marsh platforms: a topographic method

    Science.gov (United States)

    Goodwin, Guillaume C. H.; Mudd, Simon M.; Clubb, Fiona J.

    2018-03-01

    Salt marshes filter pollutants, protect coastlines against storm surges, and sequester carbon, yet are under threat from sea level rise and anthropogenic modification. The sustained existence of the salt marsh ecosystem depends on the topographic evolution of marsh platforms. Quantifying marsh platform topography is vital for improving the management of these valuable landscapes. The determination of platform boundaries currently relies on supervised classification methods requiring near-infrared data to detect vegetation, or demands labour-intensive field surveys and digitisation. We propose a novel, unsupervised method to reproducibly isolate salt marsh scarps and platforms from a digital elevation model (DEM), referred to as Topographic Identification of Platforms (TIP). Field observations and numerical models show that salt marshes mature into subhorizontal platforms delineated by subvertical scarps. Based on this premise, we identify scarps as lines of local maxima on a slope raster, then fill landmasses from the scarps upward, thus isolating mature marsh platforms. We test the TIP method using lidar-derived DEMs from six salt marshes in England with varying tidal ranges and geometries, for which topographic platforms were manually isolated from tidal flats. Agreement between manual and unsupervised classification exceeds 94 % for DEM resolutions of 1 m, with all but one site maintaining an accuracy superior to 90 % for resolutions up to 3 m. For resolutions of 1 m, platforms detected with the TIP method are comparable in surface area to digitised platforms and have similar elevation distributions. We also find that our method allows for the accurate detection of local block failures as small as 3 times the DEM resolution. Detailed inspection reveals that although tidal creeks were digitised as part of the marsh platform, unsupervised classification categorises them as part of the tidal flat, causing an increase in false negatives and overall platform

  17. Unsupervised detection of salt marsh platforms: a topographic method

    Directory of Open Access Journals (Sweden)

    G. C. H. Goodwin

    2018-03-01

    Salt marshes filter pollutants, protect coastlines against storm surges, and sequester carbon, yet are under threat from sea level rise and anthropogenic modification. The sustained existence of the salt marsh ecosystem depends on the topographic evolution of marsh platforms. Quantifying marsh platform topography is vital for improving the management of these valuable landscapes. The determination of platform boundaries currently relies on supervised classification methods requiring near-infrared data to detect vegetation, or demands labour-intensive field surveys and digitisation. We propose a novel, unsupervised method to reproducibly isolate salt marsh scarps and platforms from a digital elevation model (DEM), referred to as Topographic Identification of Platforms (TIP). Field observations and numerical models show that salt marshes mature into subhorizontal platforms delineated by subvertical scarps. Based on this premise, we identify scarps as lines of local maxima on a slope raster, then fill landmasses from the scarps upward, thus isolating mature marsh platforms. We test the TIP method using lidar-derived DEMs from six salt marshes in England with varying tidal ranges and geometries, for which topographic platforms were manually isolated from tidal flats. Agreement between manual and unsupervised classification exceeds 94 % for DEM resolutions of 1 m, with all but one site maintaining an accuracy superior to 90 % for resolutions up to 3 m. For resolutions of 1 m, platforms detected with the TIP method are comparable in surface area to digitised platforms and have similar elevation distributions. We also find that our method allows for the accurate detection of local block failures as small as 3 times the DEM resolution. Detailed inspection reveals that although tidal creeks were digitised as part of the marsh platform, unsupervised classification categorises them as part of the tidal flat, causing an increase in false negatives

  18. Classification Of Cluster Area For Satellite Image

    Directory of Open Access Journals (Sweden)

    Thwe Zin Phyo

    2015-06-01

    This paper describes area classification for a Landsat7 satellite image. The main purpose of this system is to classify the area of each cluster contained in a satellite image. To classify the image, it is first necessary to cluster the satellite image into different land cover types. Clustering is an unsupervised learning method that aims to classify an image into homogeneous regions. This system is implemented based on color features with the unsupervised K-means clustering algorithm. This method does not need to be trained on images before clustering. The satellite image is grouped into a set of three clusters. For this work the combined band 432 from the Landsat7 satellite is used as input. A satellite image of the Mandalay area in 2001 is chosen to test the segmentation method. After clustering, a specific range for the three clustered images must be defined in order to obtain green land, water and urban balance. This system is implemented using the MATLAB programming language.
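
    As a rough illustration of the K-means step described in this abstract, the following Python sketch groups RGB pixel vectors into three clusters. It is not the authors' MATLAB implementation: the function names `sqdist` and `kmeans`, the farthest-point initialization, the fixed iteration count, and the toy pixel values are all assumptions made for this example.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd-style k-means on a list of numeric tuples."""
    # Farthest-point initialization (an assumption; the abstract does not
    # say how the initial centers are chosen).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers


# Hypothetical pixel values standing in for three land cover types.
pixels = [
    (30, 160, 40), (35, 150, 45), (25, 170, 50),        # greenish
    (20, 40, 180), (25, 45, 170), (15, 50, 190),        # bluish
    (150, 150, 150), (160, 155, 150), (145, 150, 155),  # greyish
]
labels, centers = kmeans(pixels, 3)
```

    In a real image the pixel list would be built from the selected band combination, and the per-cluster pixel counts would give the area of each land cover type.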

  19. Consensus clustering approach to group brain connectivity matrices

    Directory of Open Access Journals (Sweden)

    Javier Rasero

    2017-10-01

    A novel approach rooted in the notion of consensus clustering, a strategy developed for community detection in complex networks, is proposed to cope with the heterogeneity that characterizes connectivity matrices in health and disease. The method can be summarized as follows: (a) define, for each node, a distance matrix for the set of subjects by comparing the connectivity pattern of that node in all pairs of subjects; (b) cluster the distance matrix for each node; (c) build the consensus network from the corresponding partitions; and (d) extract groups of subjects by finding the communities of the consensus network thus obtained. Different from the previous implementations of consensus clustering, we thus propose to use the consensus strategy to combine the information arising from the connectivity patterns of each node. The proposed approach may be seen either as an exploratory technique or as an unsupervised pretraining step to help the subsequent construction of a supervised classifier. Applications on a toy model and two real datasets show the effectiveness of the proposed methodology, which represents heterogeneity of a set of subjects in terms of a weighted network, the consensus matrix.
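
    The four steps listed in this abstract can be sketched in Python as follows. This is only a toy reading of the procedure, not the authors' code: the Euclidean row distance, the two-group split around the most distant pair of subjects, and the use of thresholded connected components in place of a real community detection algorithm are all simplifying assumptions, and every function name is hypothetical.

```python
from itertools import combinations


def row_distance(u, v):
    """Euclidean distance between two connectivity rows (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def two_group_split(dist):
    """Stand-in clusterer: split subjects around the most distant pair."""
    n = len(dist)
    a, b = max(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    return [0 if dist[s][a] <= dist[s][b] else 1 for s in range(n)]


def consensus_matrix(subjects):
    """subjects[s][i] is the connectivity row of node i for subject s."""
    n_subj, n_nodes = len(subjects), len(subjects[0])
    counts = [[0] * n_subj for _ in range(n_subj)]
    for i in range(n_nodes):
        # (a) distance matrix between subjects for node i
        dist = [[row_distance(subjects[s][i], subjects[t][i])
                 for t in range(n_subj)] for s in range(n_subj)]
        # (b) cluster that distance matrix
        labels = two_group_split(dist)
        # (c) count how often each pair of subjects shares a cluster
        for s in range(n_subj):
            for t in range(n_subj):
                if labels[s] == labels[t]:
                    counts[s][t] += 1
    return [[c / n_nodes for c in row] for row in counts]


def subject_groups(consensus, threshold=0.5):
    """(d) connected components of the thresholded consensus network."""
    n = len(consensus)
    group = [-1] * n
    current = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = current
        stack = [start]
        while stack:
            s = stack.pop()
            for t in range(n):
                if group[t] == -1 and consensus[s][t] > threshold:
                    group[t] = current
                    stack.append(t)
        current += 1
    return group


# Four hypothetical subjects with 3x3 connectivity matrices.
subjects = [
    [[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]],
    [[0.0, 0.8, 0.1], [0.8, 0.0, 0.3], [0.1, 0.3, 0.0]],
    [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.9, 0.8, 0.0]],
    [[0.0, 0.2, 0.8], [0.2, 0.0, 0.9], [0.8, 0.9, 0.0]],
]
consensus = consensus_matrix(subjects)
found = subject_groups(consensus)
```

    The consensus matrix is the weighted network mentioned at the end of the abstract: each entry is the fraction of nodes for which two subjects were placed in the same cluster.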

  20. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. The reverse learning method is introduced into the clustering process of the PAM clustering algorithm to overcome the limitations of one-time clustering in unsupervised clustering learning and to increase the diversity of clusters, thereby improving the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.

  1. Semi-supervised clustering methods

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  2. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  3. Remote photoplethysmography system for unsupervised monitoring regional anesthesia effectiveness

    Science.gov (United States)

    Rubins, U.; Miscuks, A.; Marcinkevics, Z.; Lange, M.

    2017-12-01

    Determining the level of regional anesthesia (RA) is vitally important to both the anesthesiologist and the surgeon; knowing the RA level can also protect the patient and reduce the time of surgery. The level of RA is usually detected with either a simple subjective method (sensitivity test) or complicated quantitative methods (thermography, neuromyography, etc.), but there is not yet a standardized method for objective RA detection and evaluation. In this study, an advanced remote photoplethysmography imaging (rPPG) system for unsupervised monitoring of human palm RA is demonstrated. The rPPG system comprises a compact video camera with a green optical filter, a surgical lamp as a light source and a computer with custom-developed software. The algorithm, implemented in Matlab, recognizes the palm and two dermatomes (Medial and Ulnar innervation) and calculates the perfusion map and perfusion changes in real time to detect the effect of RA. Seven patients (aged 18-80 years) undergoing hand surgery received peripheral nerve brachial plexus blocks during the measurements. Clinical experiments showed that our rPPG system is able to perform unsupervised monitoring of RA.

  4. Function approximation using combined unsupervised and supervised learning.

    Science.gov (United States)

    Andras, Peter

    2014-03-01

    Function approximation is one of the core tasks that are solved using neural networks in the context of many engineering problems. However, good approximation results need good sampling of the data space, which usually requires exponentially increasing volume of data as the dimensionality of the data increases. At the same time, often the high-dimensional data is arranged around a much lower dimensional manifold. Here we propose the breaking of the function approximation task for high-dimensional data into two steps: (1) the mapping of the high-dimensional data onto a lower dimensional space corresponding to the manifold on which the data resides and (2) the approximation of the function using the mapped lower dimensional data. We use over-complete self-organizing maps (SOMs) for the mapping through unsupervised learning, and single hidden layer neural networks for the function approximation through supervised learning. We also extend the two-step procedure by considering support vector machines and Bayesian SOMs for the determination of the best parameters for the nonlinear neurons in the hidden layer of the neural networks used for the function approximation. We compare the approximation performance of the proposed neural networks using a set of functions and show that indeed the neural networks using combined unsupervised and supervised learning outperform in most cases the neural networks that learn the function approximation using the original high-dimensional data.

  5. Improved Anomaly Detection using Integrated Supervised and Unsupervised Processing

    Science.gov (United States)

    Hunt, B.; Sheppard, D. G.; Wetterer, C. J.

    There are two broad technologies of signal processing applicable to space object feature identification using nonresolved imagery: supervised processing analyzes a large set of data for common characteristics that can be then used to identify, transform, and extract information from new data taken of the same given class (e.g. support vector machine); unsupervised processing utilizes detailed physics-based models that generate comparison data that can then be used to estimate parameters presumed to be governed by the same models (e.g. estimation filters). Both processes have been used in non-resolved space object identification and yield similar results yet arrived at using vastly different processes. The goal of integrating the results of the two is to seek to achieve an even greater performance by building on the process diversity. Specifically, both supervised processing and unsupervised processing will jointly operate on the analysis of brightness (radiometric flux intensity) measurements reflected by space objects and observed by a ground station to determine whether a particular day conforms to a nominal operating mode (as determined from a training set) or exhibits anomalous behavior where a particular parameter (e.g. attitude, solar panel articulation angle) has changed in some way. It is demonstrated in a variety of different scenarios that the integrated process achieves a greater performance than each of the separate processes alone.

  6. Assessment of Heavy Metal Pollution in Macrophytes, Water and Sediment of a Tropical Wetland System Using Hierarchical Cluster Analysis Technique

    OpenAIRE

    N. Kumar J.I.; M. Das; R. Mukherji; R.N. Kumar

    2011-01-01

    Heavy metal pollution in aquatic ecosystems is becoming a global phenomenon because these metals are indestructible and most of them have toxic effects on living organisms. Most of the fresh water bodies all over the world are getting contaminated thus declining their suitability. Therefore, monitoring and assessment of such freshwater systems has become an environmental concern. This study aims to elucidate the useful role of the cluster analysis to assess the relationship and interdependenc...

  7. Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts

    NARCIS (Netherlands)

    Castaldi, Peter J; Benet, Marta; Petersen, Hans; Rafaels, Nicholas; Finigan, James; Paoletti, Matteo; Marike Boezen, H; Vonk, Judith M; Bowler, Russell; Pistolesi, Massimo; Puhan, Milo A; Anto, Josep; Wauters, Els; Lambrechts, Diether; Janssens, Wim; Bigazzi, Francesca; Camiciottoli, Gianna; Cho, Michael H; Hersh, Craig P; Barnes, Kathleen; Rennard, Stephen; Boorgula, Meher Preethi; Dy, Jennifer; Hansel, Nadia N; Crapo, James D; Tesfaigzi, Yohannes; Agusti, Alvar; Silverman, Edwin K; Garcia-Aymerich, Judith

    Background COPD is a heterogeneous disease, but there is little consensus on specific definitions for COPD subtypes. Unsupervised clustering offers the promise of 'unbiased' data-driven assessment of COPD heterogeneity. Multiple groups have identified COPD subtypes using cluster analysis, but there

  8. Semi-Supervised Clustering for High-Dimensional and Sparse Features

    Science.gov (United States)

    Yan, Su

    2010-01-01

    Clustering is one of the most common data mining tasks, used frequently for data organization and analysis in various application domains. Traditional machine learning approaches to clustering are fully automated and unsupervised where class labels are unknown a priori. In real application domains, however, some "weak" form of side…

  9. Unsupervised Categorization in a Sample of Children with Autism Spectrum Disorders

    Science.gov (United States)

    Edwards, Darren J.; Perlman, Amotz; Reed, Phil

    2012-01-01

    Studies of supervised categorization have demonstrated limited categorization performance in participants with autism spectrum disorders (ASD); however, little research has been conducted regarding unsupervised categorization in this population. This study explored unsupervised categorization using two stimulus sets that differed in their…

  10. Clustering of near clusters versus cluster compactness

    International Nuclear Information System (INIS)

    Yu Gao; Yipeng Jing

    1989-01-01

    The clustering properties of near Zwicky clusters are studied by using the two-point angular correlation function. The angular correlation functions for compact and medium compact clusters, for open clusters, and for all near Zwicky clusters are estimated. The results show much stronger clustering for compact and medium compact clusters than for open clusters, and that open clusters have nearly the same clustering strength as galaxies. A detailed study of the compactness-dependence of correlation function strength is worth investigating. (author)

  11. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, cluster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same groups are relatively homogeneous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  12. Random clustering ferns for multimodal object recognition

    OpenAIRE

    Villamizar Vergel, Michael Alejandro; Garrell Zulueta, Anais; Sanfeliu Cortés, Alberto; Moreno-Noguer, Francesc

    2017-01-01

    The final publication is available at link.springer.com. We propose an efficient and robust method for the recognition of objects exhibiting multiple intra-class modes, where each one is associated with a particular object appearance. The proposed method, called random clustering ferns, combines synergically a single and real-time classifier, based on the boosted assembling of extremely randomized trees (ferns), with an unsupervised and probabilistic approach in order to recognize efficient...

  13. Data clustering theory, algorithms, and applications

    CERN Document Server

    Gan, Guojun; Wu, Jianhong

    2007-01-01

    Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.

  14. Comparative Study of Antibacterial Properties of Polystyrene Films with TiOx and Cu Nanoparticles Fabricated using Cluster Beam Technique

    DEFF Research Database (Denmark)

    Popok, Vladimir; Jeppesen, Cesarino; Fojan, Peter

    2018-01-01

    Background: Antibacterial materials are of high importance for medicine, food production and conservation. Among these materials, polymer films with metal nanoparticles (NPs) are of considerable interest for many practical applications. Results: The paper describes a novel approach...... for the formation of bactericidal media which are represented by thin polymer films (polystyrene in the current case), produced by spin-coating, with Ti and Cu NPs deposited from cluster beams. Ti NPs are treated in three different ways in order to study different approaches for oxidation and, thus, efficiency...

  15. Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques

    Science.gov (United States)

    Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham

    2018-01-01

    Structural health monitoring consists of using sensors integrated within structures together with algorithms to perform load monitoring, damage detection, damage location, damage size and severity, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before a real implementation of this approach in engineering structures, changes in the strain field due to conditions different from damage occurrence need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, it is necessary to implement novel techniques for uncoupling such changes from those produced by damage occurrence. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with Q and nonlinear-T² damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously submitted to variations in its pitch angle. The results demonstrated the capability of the methodology for clustering data according to 13 different load conditions (pitch angles), performing the OBS and detecting six different damages induced in a cumulative way. The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% for a

  16. Spectrum Hole Identification in IEEE 802.22 WRAN using Unsupervised Learning

    Directory of Open Access Journals (Sweden)

    V. Balaji

    2016-01-01

    In this paper we present a Cooperative Spectrum Sensing (CSS) algorithm for Cognitive Radios (CR) based on the IEEE 802.22 Wireless Regional Area Network (WRAN) standard. The core objective is to improve cooperative sensing efficiency, which specifies how fast a decision can be reached in each round of cooperation (iteration) to sense an appropriate number of channels/bands (i.e. 86 channels of 7 MHz bandwidth as per IEEE 802.22) within a time constraint (channel sensing time). To meet this objective, we have developed a CSS algorithm using an unsupervised K-means clustering classification approach. The received energy level of each Secondary User (SU) is considered as the parameter for determining channel availability. The performance of the proposed algorithm is quantified in terms of detection accuracy and training and classification delay time. Further, the detection accuracy of our proposed scheme meets the requirement of IEEE 802.22 WRAN with a target probability of false alarm of 0.1. All the simulations are carried out using the Matlab tool.

  17. Hubble Space Telescope-NICMOS Observations of M31's Metal-Rich Globular Clusters and Their Surrounding Fields. I. Techniques

    Science.gov (United States)

    Stephens, Andrew W.; Frogel, Jay A.; Freedman, Wendy; Gallart, Carme; Jablonka, Pascale; Ortolani, Sergio; Renzini, Alvio; Rich, R. Michael; Davies, Roger

    2001-05-01

    Astronomers are always anxious to push their observations to the limit, basing results on objects at the detection threshold, spectral features barely stronger than the noise, or photometry in very crowded regions. In this paper we present a careful analysis of photometry in crowded regions and show how image blending affects the results and interpretation of such data. Although this analysis is specifically for our NICMOS observations in M31, the techniques we develop can be applied to any imaging data taken in crowded fields; we show how the effects of image blending will limit even the Next Generation Space Telescope. We have obtained HST-NICMOS observations of five of M31's most metal-rich globular clusters. These data allow photometry of individual stars in the clusters and their surrounding fields. However, to achieve our goals (obtaining accurate luminosity functions to compare with their Galactic counterparts, determining metallicities from the slope of the giant branch, identifying long-period variables, and estimating ages from the AGB tip luminosity), we must be able to disentangle the true properties of the population from the observational effects associated with measurements made in very crowded fields. We thus use three different techniques to analyze the effects of crowding on our data, including the insertion of artificial stars (traditional completeness tests) and the creation of completely artificial clusters. These computer simulations are used to derive threshold- and critical-blending radii for each cluster, which determine how close to the cluster center reliable photometry can be achieved. The simulations also allow us to quantify and correct for the effects of blending on the slope and width of the RGB at different surface brightness levels. We then use these results to estimate the limits blending will place on future space-based observations. Based on observations with the NASA/ESA Hubble Space Telescope obtained at the Space Telescope Science

  18. Improving Layman Readability of Clinical Narratives with Unsupervised Synonym Replacement.

    Science.gov (United States)

    Moen, Hans; Peltonen, Laura-Maria; Koivumäki, Mikko; Suhonen, Henry; Salakoski, Tapio; Ginter, Filip; Salanterä, Sanna

    2018-01-01

    We report on the development and evaluation of a prototype tool aimed to assist laymen/patients in understanding the content of clinical narratives. The tool relies largely on unsupervised machine learning applied to two large corpora of unlabeled text - a clinical corpus and a general domain corpus. A joint semantic word-space model is created for the purpose of extracting easier to understand alternatives for words considered difficult to understand by laymen. Two domain experts evaluate the tool and inter-rater agreement is calculated. When having the tool suggest ten alternatives to each difficult word, it suggests acceptable lay words for 55.51% of them. This and future manual evaluation will serve to further improve performance, where also supervised machine learning will be used.

  19. Unsupervised/supervised learning concept for 24-hour load forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Djukanovic, M [Electrical Engineering Inst. 'Nikola Tesla', Belgrade (Yugoslavia)]; Babic, B [Electrical Power Industry of Serbia, Belgrade (Yugoslavia)]; Sobajic, D J; Pao, Y-H [Case Western Reserve Univ., Cleveland, OH (United States). Dept. of Electrical Engineering and Computer Science]

    1993-07-01

    An application of artificial neural networks in short-term load forecasting is described. An algorithm using an unsupervised/supervised learning concept and historical relationship between the load and temperature for a given season, day type and hour of the day to forecast hourly electric load with a lead time of 24 hours is proposed. An additional approach using functional link net, temperature variables, average load and last one-hour load of previous day is introduced and compared with the ANN model with one hidden layer load forecast. In spite of limited available weather variables (maximum, minimum and average temperature for the day) quite acceptable results have been achieved. The 24-hour-ahead forecast errors (absolute average) ranged from 2.78% for Saturdays and 3.12% for working days to 3.54% for Sundays. (Author)
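
    To make the reported error figures concrete, the Python sketch below computes an average absolute percentage error over a set of hourly loads. It assumes that the abstract's "absolute average" error means the mean absolute percentage error, which the abstract does not state explicitly; the function name and the load values are hypothetical and are not taken from the paper.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Average of |actual - forecast| / |actual|, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly loads (arbitrary units) and their forecasts.
actual_load = [100.0, 200.0, 400.0]
forecast_load = [98.0, 206.0, 388.0]
error_percent = mean_absolute_percentage_error(actual_load, forecast_load)
```

    Here the three hourly errors are 2%, 3% and 3%, so the average is about 2.67%, the same order as the figures quoted in the abstract.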

  20. Unsupervised feature learning for autonomous rock image classification

    Science.gov (United States)

    Shu, Lei; McIsaac, Kenneth; Osinski, Gordon R.; Francis, Raymond

    2017-09-01

    Autonomous rock image classification can enhance the capability of robots for geological detection and enlarge the scientific returns, both in investigation on Earth and planetary surface exploration on Mars. Since rock textural images are usually inhomogeneous and manually hand-crafting features is not always reliable, we propose an unsupervised feature learning method to autonomously learn the feature representation for rock images. In our tests, rock image classification using the learned features shows that the learned features can outperform manually selected features. Self-taught learning is also proposed to learn the feature representation from a large database of unlabelled rock images of mixed class. The learned features can then be used repeatedly for classification of any subclass. This takes advantage of the large dataset of unlabelled rock images and learns a general feature representation for many kinds of rocks. We show experimental results supporting the feasibility of self-taught learning on rock images.

  1. Unsupervised Neural Network Quantifies the Cost of Visual Information Processing.

    Science.gov (United States)

    Orbán, Levente L; Chartier, Sylvain

    2015-01-01

    Untrained, "flower-naïve" bumblebees display behavioural preferences when presented with visual properties such as colour, symmetry, spatial frequency and others. Two unsupervised neural networks were implemented to understand the extent to which these models capture elements of bumblebees' unlearned visual preferences towards flower-like visual properties. The computational models, which are variants of Independent Component Analysis and Feature-Extracting Bidirectional Associative Memory, use images of test-patterns that are identical to ones used in behavioural studies. Each model works by decomposing images of floral patterns into meaningful underlying factors. We reconstruct the original floral image using the components and compare the quality of the reconstructed image to the original image. Independent Component Analysis matches behavioural results substantially better across several visual properties. These results are interpreted to support a hypothesis that the temporal and energetic costs of information processing by pollinators served as a selective pressure on floral displays: flowers adapted to pollinators' cognitive constraints.

  2. Unsupervised Feature Learning for Heart Sounds Classification Using Autoencoder

    Science.gov (United States)

    Hu, Wei; Lv, Jiancheng; Liu, Dongbo; Chen, Yao

    2018-04-01

    Cardiovascular disease seriously threatens the health of many people. It is usually diagnosed during cardiac auscultation, which is a fast and efficient method of cardiovascular disease diagnosis. In recent years, deep learning approaches using unsupervised learning have made significant breakthroughs in many fields. However, to our knowledge, deep learning has not yet been used for heart sound classification. In this paper, we first use the average Shannon energy to extract the envelope of the heart sounds, then find the highest point of S1 to extract the cardiac cycle. We convert the time-domain signals of the cardiac cycle into spectrograms and apply principal component analysis whitening to reduce the dimensionality of the spectrogram. Finally, we apply a two-layer autoencoder to extract the features of the spectrogram. The experimental results demonstrate that the features from the autoencoder are suitable for heart sound classification.
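
    The envelope-extraction step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition of average Shannon energy, the mean of -x^2 log(x^2) over a frame of the amplitude-normalised signal; the function name, frame length and hop size are illustrative choices, not values taken from the paper.

```python
import math


def shannon_energy_envelope(signal, frame_len=4, hop=2):
    """Average Shannon energy per frame of a non-empty signal.

    The signal is first normalised to [-1, 1]. frame_len and hop are
    hypothetical defaults chosen only to keep the example small.
    """
    peak = max(abs(s) for s in signal) or 1.0  # avoid dividing by zero
    x = [s / peak for s in signal]
    envelope = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        # -x^2 * log(x^2); samples equal to 0 contribute 0 (limit value)
        energy = sum(-(v * v) * math.log(v * v) for v in frame if v != 0.0)
        envelope.append(energy / frame_len)
    return envelope


env = shannon_energy_envelope([1.0, 0.5, 0.5, 1.0])
```

    Because the weighting -x^2 log(x^2) peaks at medium amplitudes, this kind of envelope emphasises medium-intensity samples over both low-level noise and isolated spikes.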

  3. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    Energy Technology Data Exchange (ETDEWEB)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    2017-07-14

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human and machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.

  4. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    Science.gov (United States)

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  5. Unsupervised online classifier in sleep scoring for sleep deprivation studies.

    Science.gov (United States)

    Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien

    2015-05-01

    This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). Controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD. Ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. The performance of the algorithm is evaluated by comparison with the score obtained by a human expert reader. Online detection of PS is then validated with a PSD protocol with duration of 72 hours. Our algorithm gave a high concordance with human scoring with an average κ coefficient > 70%. Notably, the specificity to detect PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only brief PS bouts necessary for the detection of PS in EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% in baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Our fully unsupervised data-driven algorithm overcomes some limitations of the other automated methods such as the selection of representative descriptors or threshold settings. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods like the tedious gentle handling or the platform method. © 2015 Associated Professional Sleep Societies, LLC.
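
    The agreement measure quoted in this abstract (the κ coefficient) can be sketched as follows. This is the standard Cohen's kappa for two raters; the state labels "W", "SWS" and "PS" and the toy epoch sequences are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Fraction of epochs on which the two scorers agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the marginal label frequencies
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


human = ["W", "W", "SWS", "SWS", "PS", "PS"]      # hypothetical expert scores
algorithm = ["W", "W", "SWS", "SWS", "PS", "SWS"]  # hypothetical classifier output
kappa = cohen_kappa(human, algorithm)
```

    In this toy case the scorers agree on 5 of 6 epochs and chance agreement is 1/3, so kappa is 0.75.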

  6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.

    Science.gov (United States)

    Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L

    2016-01-01

    An imbalanced dataset is defined as a training dataset that has imbalanced proportions of data in both interesting and uninteresting classes. Often in biomedical applications, samples from the stimulating class are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Although the target samples in the primitive dataset are small in number, the induction of a classification model over such training data leads to poor prediction performance due to insufficient training from the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling into a swarm optimisation algorithm. It adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, which include higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines two rebalancing techniques. It reasonably re-allocates the majority class in the details and dynamically optimises the two parameters of SMOTE to synthesise a reasonable scale of minority class for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms other conventional methods and attains higher credibility with even greater accuracy of the classification model.

  7. On the application of the weak-beam technique to the determination of the sizes of small point-defect clusters in ion-irradiated copper

    International Nuclear Information System (INIS)

    Jenkins, M. L.

    1998-01-01

    We have made an analysis of the conditions necessary for the successful use of the weak-beam technique for identifying and characterizing small point-defect clusters in ion-irradiated copper. The visibility of small defects was found to depend only weakly on the magnitude of the beam-convergence. In general, the image sizes of small clusters were found to be most sensitive to the magnitude of Sa with the image sizes of some individual defects changing by large amounts with changes as small as 0.025 nm⁻¹. The most reliable information on the true defect size is likely to be obtained by taking a series of 5-9 micrographs with a systematic variation of deviation parameter from 0.2 to 0.3 nm⁻¹. This procedure allows size information to be obtained down to a resolution limit of about 0.5 nm for defects situated throughout a foil thickness of 60 nm. The technique has been applied to the determination of changes in the sizes of small defects produced by a low-temperature in-situ irradiation and annealing experiment.

  8. Deep supervised, but not unsupervised, models may explain IT cortical representation.

    Directory of Open Access Journals (Sweden)

    Seyed-Mahdi Khaligh-Razavi

    2014-11-01

    Inferior temporal (IT) cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total), testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet) along with several models from computer vision (e.g. SIFT, GIST, self-similarity features), and a deep convolutional neural network. We compared the representational dissimilarity matrices (RDMs) of the model representations with the RDMs obtained from human IT (measured with fMRI) and monkey IT (measured with cell recording) for the same set of stimuli (not used in training the models). Better performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data. Overall, our results suggest that explaining...
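
    A representational dissimilarity matrix of the kind compared in this abstract can be sketched as below. The sketch assumes correlation distance (1 minus the Pearson correlation between response patterns), which is a common choice for RDMs; the abstract itself does not state the measure, and the three toy patterns are invented for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def rdm(patterns):
    """Square matrix of 1 - r between every pair of stimulus patterns."""
    return [
        [0.0 if i == j else 1.0 - pearson(p, q) for j, q in enumerate(patterns)]
        for i, p in enumerate(patterns)
    ]


# One row per stimulus, one column per measurement channel (toy values)
patterns = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
matrix = rdm(patterns)
```

    Here the first two patterns are perfectly correlated (dissimilarity near 0) and the third is anti-correlated with the first (dissimilarity near 2). Two RDMs built this way, one from a model and one from brain data, can then be compared entry by entry.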

  9. X-ray imaging and spectro-imaging techniques for investigating the intergalactic medium properties within merging clusters of galaxies

    International Nuclear Information System (INIS)

    Bourdin, Herve

    2004-01-01

    Clusters of galaxies are gravitationally bound matter over-densities which are filled with a hot and ionized gas emitting in X-rays. They form during merging phases of subgroups, so that the gas undergoes shock and mixing processes which perturb its physical properties at hydrostatic equilibrium. In order to map the spatial distributions of the gas emissivity, temperature and entropy as observed by X-ray telescopes, we compared different multi-scale imaging algorithms, and also developed and tested a new multi-scale spectro-imaging algorithm. With this algorithm, the searched parameter is first estimated from a count statistics within different spatial resolution elements, and its space-frequency variations are then coded by Haar wavelet coefficients. The optimal spatial distribution of the parameter is finally restored by thresholding the noisy wavelet transform. (author)
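
    The idea of restoring a signal by thresholding its noisy Haar wavelet transform can be shown with a one-dimensional, single-level sketch. This is a generic hard-thresholding example under simplifying assumptions (even-length input, one decomposition level, a hand-picked threshold); it is not the multi-scale spectro-imaging algorithm of the thesis.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length sequence."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_denoise(x, threshold):
    """Zero detail coefficients whose magnitude is at most `threshold`."""
    approx, detail = haar_forward(x)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


restored = haar_denoise([4.0, 4.2, 10.0, 2.0], threshold=1.0)
```

    The small difference between the first two samples is treated as noise and averaged away, while the large jump in the second pair survives the threshold, giving approximately [4.1, 4.1, 10.0, 2.0].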

  10. Spatial access method for urban geospatial database management: An efficient approach of 3D vector data clustering technique

    DEFF Research Database (Denmark)

    Azri, Suhaibah; Ujang, Uznir; Rahman, Alias Abdul

    2014-01-01

    In the last few years, 3D urban data and its information have rapidly increased due to the growth of urban areas and the urbanization phenomenon. These datasets are then maintained and managed in 3D spatial database systems. However, performance deterioration is likely to happen due to the massiveness of 3D...... datasets. As a solution, a 3D spatial index structure is used as a booster to increase the performance of data retrieval. In commercial databases, the commonly and widely used index structure for 3D spatial databases is the 3D R-Tree. This is due to its simplicity and promising method in handling spatial data. However......D geospatial data clustering to be used in the construction of 3D R-Tree and respectively could reduce the overlapping among nodes. The proposed method is tested on 3D urban dataset for the application of urban infill development. By using several cases of data updating operations such as building...

  11. Semi-Supervised Generation with Cluster-aware Generative Models

    DEFF Research Database (Denmark)

    Maaløe, Lars; Fraccaro, Marco; Winther, Ole

    2017-01-01

    Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Cluster...... a log-likelihood of −79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods.

  12. Evaluation of immunization coverage in the rural area of Pune, Maharashtra, using the 30 cluster sampling technique

    Directory of Open Access Journals (Sweden)

    Pankaj Kumar Gupta

    2013-01-01

    Background: Infectious diseases are a major cause of morbidity and mortality in children. One of the most cost-effective and easy methods for child survival is immunization. Despite all the efforts put in by governmental and nongovernmental institutes for 100% immunization coverage, there are still pockets of low-coverage areas. In India, immunization services are offered free in public health facilities, but, despite rapid increases, the immunization rate remains low in some areas. The Millennium Development Goals (MDG) indicators also give importance to immunization. Objective: To assess the immunization coverage in the rural area of Pune. Materials and Methods: A cross-sectional study was conducted in the field practice area of the Rural Health Training Center (RHTC) using the WHO's 30 cluster sampling method for evaluation of immunization coverage. Results: A total of 1913 houses were surveyed. A total of 210 children aged 12-23 months were included in the study. It was found that 86.67% of the children were fully immunized against all the six vaccine-preventable diseases. The proportion of fully immunized children was marginally higher in males (87.61%) than in females (85.57%), and the immunization card was available with 60.95% of the subjects. The most common cause for partial immunization was that the time of immunization was inconvenient (36%). Conclusion: Sustained efforts are required to achieve universal coverage of immunization in the rural area of Pune district.

  13. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
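
    The k-means cluster analysis on which MSTC is based can be sketched serially as below. This is a plain Lloyd-style iteration with a deterministic initialisation (the first k points), written only to show the assign-then-update loop; it is not the hybrid MPI, CUDA and OpenACC implementation described in the abstract, and the toy points are invented.

```python
def kmeans(points, k, iterations=20):
    """Serial Lloyd iteration; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # simple, assumed initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
```

    The assignment step is independent for every point, which is the part that parallel implementations typically distribute across nodes and accelerators; the update step then needs only per-cluster sums and counts.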

  14. Unsupervised Learning for Efficient Texture Estimation From Limited Discrete Orientation Data

    Science.gov (United States)

    Niezgoda, Stephen R.; Glover, Jared

    2013-11-01

    The estimation of orientation distribution functions (ODFs) from discrete orientation data, as produced by electron backscatter diffraction or crystal plasticity micromechanical simulations, is typically achieved via techniques such as the Williams-Imhof-Matthies-Vinel (WIMV) algorithm or generalized spherical harmonic expansions, which were originally developed for computing an ODF from pole figures measured by X-ray or neutron diffraction. These techniques rely on ad-hoc methods for choosing parameters, such as smoothing half-width and bandwidth, and for enforcing positivity constraints and appropriate normalization. In general, such approaches provide little or no information-theoretic guarantees as to their optimality in describing the given dataset. In the current study, an unsupervised learning algorithm is proposed which uses a finite mixture of Bingham distributions for the estimation of ODFs from discrete orientation data. The Bingham distribution is an antipodally-symmetric, max-entropy distribution on the unit quaternion hypersphere. The proposed algorithm also introduces a minimum message length criterion, a common tool in information theory for balancing data likelihood with model complexity, to determine the number of components in the Bingham mixture. This criterion leads to ODFs which are less likely to overfit (or underfit) the data, eliminating the need for a priori parameter choices.

  15. Sparsity enabled cluster reduced-order models for control

    Science.gov (United States)

    Kaiser, Eurika; Morzyński, Marek; Daviller, Guillaume; Kutz, J. Nathan; Brunton, Bingni W.; Brunton, Steven L.

    2018-01-01

    Characterizing and controlling nonlinear, multi-scale phenomena are central goals in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories. A key advantage of CROM is that it embeds nonlinear dynamics in a linear framework, which enables the application of standard linear techniques to the nonlinear system. CROM is typically computed on high-dimensional data; however, access to and computations on this full-state data limit the online implementation of CROM for prediction and control. Here, we address this key challenge by identifying a small subset of critical measurements to learn an efficient CROM, referred to as sparsity-enabled CROM. In particular, we leverage compressive measurements to faithfully embed the cluster geometry and preserve the probabilistic dynamics. Further, we show how to identify fewer optimized sensor locations tailored to a specific problem that outperform random measurements. Both of these sparsity-enabled sensing strategies significantly reduce the burden of data acquisition and processing for low-latency in-time estimation and control. We illustrate this unsupervised learning approach on three different high-dimensional nonlinear dynamical systems from fluids with increasing complexity, with one application in flow control. Sparsity-enabled CROM is a critical facilitator for real-time implementation on high-dimensional systems where full-state information may be inaccessible.
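
    The probabilistic model at the core of cluster-based reduced-order modeling can be illustrated with a minimal sketch: once each snapshot of a trajectory has been assigned to a cluster, transition probabilities between clusters are estimated by counting consecutive pairs. The row-stochastic convention used here (entry [i][j] is the probability of moving from cluster i to cluster j) and the toy label sequence are assumptions for illustration, not details from the paper.

```python
def transition_matrix(labels, k):
    """Estimate cluster-to-cluster transition probabilities.

    labels: cluster index of each consecutive snapshot (values in 0..k-1).
    Returns a k x k list of lists whose non-empty rows sum to 1.
    """
    counts = [[0] * k for _ in range(k)]
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A cluster with no observed outgoing transition gets a zero row
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


P = transition_matrix([0, 0, 1, 1, 0], k=2)
```

    Because the resulting matrix acts linearly on a vector of cluster probabilities, standard linear tools can be applied to it even when the underlying dynamics are nonlinear.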

  16. Smart Fog: Fog Computing Framework for Unsupervised Clustering Analytics in Wearable Internet of Things

    OpenAIRE

    Borthakur, Debanjan; Dubey, Harishchandra; Constant, Nicholas; Mahler, Leslie; Mankodiya, Kunal

    2017-01-01

    The increasing use of wearables in smart telehealth generates heterogeneous medical big data. Cloud and fog services process these data for assisting clinical procedures. IoT-based e-healthcare has greatly benefited from efficient data processing. This paper proposed and evaluated the use of low-resource machine learning on Fog devices kept close to the wearables for smart healthcare. In state of the art telecare systems, the signal processing and machine learning modules are deployed in the clou...

  17. Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

    National Research Council Canada - National Science Library

    Hansen, Eric G; Slyh, Raymond E; Anderson, Timothy R

    2006-01-01

    Starting in 2004, the annual NIST Speaker Recognition Evaluation (SRE) has added an optional unsupervised speaker adaptation track where test files are processed sequentially and one may update the target model...

  18. Algorithms of maximum likelihood data clustering with applications

    Science.gov (United States)

    Giada, Lorenzo; Marsili, Matteo

    2002-12-01

    We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.

  19. Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.

    Science.gov (United States)

    Wong, Kenneth; Duncan, Tristram; Pearson, Andrew

    2007-07-01

    Open appendicectomy is the traditional standard treatment for appendicitis. Laparoscopic appendicectomy is perceived as a procedure with greater potential for complications and longer operative times. This paper examines the hypothesis that unsupervised laparoscopic appendicectomy by surgical trainees is a safe and time-effective valid alternative. Medical records, operating theatre records and histopathology reports of all patients undergoing laparoscopic and open appendicectomy over a 15-month period in two hospitals within an area health service were retrospectively reviewed. Data were analysed to compare patient features, pathology findings, operative times, complications, readmissions and mortality between laparoscopic and open groups and between unsupervised surgical trainee operators versus consultant surgeon operators. A total of 143 laparoscopic and 222 open appendicectomies were reviewed. Unsupervised trainees performed 64% of the laparoscopic appendicectomies and 55% of the open appendicectomies. There were no significant differences in complication rates, readmissions, mortality and length of stay between laparoscopic and open appendicectomy groups or between trainee and consultant surgeon operators. Conversion rates (laparoscopic to open approach) were similar for trainees and consultants. Unsupervised senior surgical trainees did not take significantly longer to perform laparoscopic appendicectomy when compared to unsupervised trainee-performed open appendicectomy. Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.

  20. Indoor localization using unsupervised manifold alignment with geometry perturbation

    KAUST Repository

    Majeed, Khaqan

    2014-04-01

    The main limitation of deploying/updating Received Signal Strength (RSS) based indoor localization is the construction of fingerprinted radio map, which is quite a hectic and time-consuming process especially when the indoor area is enormous and/or dynamic. Different approaches have been undertaken to reduce such deployment/update efforts, but the performance degrades when the fingerprinting load is reduced below a certain level. In this paper, we propose an indoor localization scheme that requires as low as 1% fingerprinting load. This scheme employs unsupervised manifold alignment that takes crowd sourced RSS readings and localization requests as source data set and the environment's plan coordinates as destination data set. The 1% fingerprinting load is only used to perturb the local geometries in the destination data set. Our proposed algorithm was shown to achieve less than 5 m mean localization error with 1% fingerprinting load and a limited number of crowd sourced readings, when other learning based localization schemes pass the 10 m mean error with the same information.

  1. An unsupervised method for summarizing egocentric sport videos

    Science.gov (United States)

    Habibi Aghdam, Hamed; Jahani Heravi, Elnaz; Puig, Domenec

    2015-12-01

    People are increasingly interested in recording their sport activities using head-worn or hand-held cameras. Videos of this type, called egocentric sport videos, have different motion and appearance patterns compared with life-logging videos. While a life-logging video can be defined in terms of well-defined human-object interactions, it is not trivial to describe egocentric sport videos using well-defined activities. For this reason, summarizing egocentric sport videos based on human-object interaction might fail to produce meaningful results. In this paper, we propose an unsupervised method for summarizing egocentric videos by identifying the key-frames of the video. Our method utilizes both appearance and motion information and it automatically finds the number of the key-frames. Our blind user study on the new dataset collected from YouTube shows that in 93.5% of cases, the users choose the proposed method as their first video summary choice. In addition, our method is within the top 2 choices of the users in 99% of studies.

  2. Spike timing analysis in neural networks with unsupervised synaptic plasticity

    Science.gov (United States)

    Mizusaki, B. E. P.; Agnes, E. J.; Brunnet, L. G.; Erichsen, R., Jr.

    2013-01-01

    The synaptic plasticity rules that sculpt a neural network architecture are key elements to understand cortical processing, as they may explain the emergence of stable, functional activity, while avoiding runaway excitation. For an associative memory framework, they should be built in a way as to enable the network to reproduce a robust spatio-temporal trajectory in response to an external stimulus. Still, how these rules may be implemented in recurrent networks and the way they relate to their capacity of pattern recognition remains unclear. We studied the effects of three phenomenological unsupervised rules in sparsely connected recurrent networks for associative memory: spike-timing-dependent plasticity, short-term plasticity and homeostatic scaling. The system stability is monitored during the learning process of the network, as the mean firing rate converges to a value determined by the homeostatic scaling. Afterwards, it is possible to measure the recovery efficiency of the activity following each initial stimulus. This is evaluated by a measure of the correlation between spike fire timings, and we analysed the full memory separation capacity and limitations of this system.

  3. Unsupervised Ensemble Anomaly Detection Using Time-Periodic Packet Sampling

    Science.gov (United States)

    Uchida, Masato; Nawata, Shuichi; Gu, Yu; Tsuru, Masato; Oie, Yuji

    We propose an anomaly detection method for finding patterns in network traffic that do not conform to legitimate (i.e., normal) behavior. The proposed method trains a baseline model describing the normal behavior of network traffic without using manually labeled traffic data. The trained baseline model is used as the basis for comparison with the audit network traffic. This anomaly detection works in an unsupervised manner through the use of time-periodic packet sampling, which is used in a manner that differs from its intended purpose — the lossy nature of packet sampling is used to extract normal packets from the unlabeled original traffic data. Evaluation using actual traffic traces showed that the proposed method has false positive and false negative rates in the detection of anomalies regarding TCP SYN packets comparable to those of a conventional method that uses manually labeled traffic data to train the baseline model. Performance variation due to the probabilistic nature of sampled traffic data is mitigated by using ensemble anomaly detection that collectively exploits multiple baseline models in parallel. Alarm sensitivity is adjusted for the intended use by using maximum- and minimum-based anomaly detection that effectively take advantage of the performance variations among the multiple baseline models. Testing using actual traffic traces showed that the proposed anomaly detection method performs as well as one using manually labeled traffic data and better than one using randomly sampled (unlabeled) traffic data.
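The ensemble step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: each baseline model is reduced to a mean and standard deviation of SYN packet counts, the anomaly score is a simple z-score, and "maximum-based" and "minimum-based" detection are read as thresholding the largest and smallest score across the baseline models. The function names (`fit_baseline`, `anomaly_score`, `ensemble_alarm`) are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def fit_baseline(syn_counts):
    """Summarize one sampled (unlabeled) trace as (mean, std) of SYN counts."""
    mu = mean(syn_counts)
    sigma = pstdev(syn_counts)
    # Guard against a zero spread so the score below stays finite.
    return mu, max(sigma, 1e-9)


def anomaly_score(baseline, observed):
    """Distance of an observed count from the baseline, in standard deviations."""
    mu, sigma = baseline
    return abs(observed - mu) / sigma


def ensemble_alarm(baselines, observed, rule="max", threshold=3.0):
    """Combine several baseline models into one alarm decision.

    rule="max": alarm if the most suspicious model exceeds the threshold
                (more sensitive, more false positives).
    rule="min": alarm only if even the least suspicious model exceeds it
                (more conservative, more false negatives).
    """
    scores = [anomaly_score(b, observed) for b in baselines]
    if rule == "max":
        return max(scores) > threshold
    if rule == "min":
        return min(scores) > threshold
    raise ValueError("rule must be 'max' or 'min'")


# Two baseline models trained on differently sampled traffic (made-up numbers).
baselines = [
    fit_baseline([10, 12, 11, 9, 13]),
    fit_baseline([10, 20, 30, 40, 50]),
]
```

With these made-up baselines, an observed count of 20 raises an alarm under the "max" rule but not under the "min" rule, which is the kind of sensitivity adjustment the abstract refers to.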

  4. Unsupervised Neural Network Quantifies the Cost of Visual Information Processing.

    Directory of Open Access Journals (Sweden)

    Levente L Orbán

    Untrained, "flower-naïve" bumblebees display behavioural preferences when presented with visual properties such as colour, symmetry, spatial frequency and others. Two unsupervised neural networks were implemented to understand the extent to which these models capture elements of bumblebees' unlearned visual preferences towards flower-like visual properties. The computational models, which are variants of Independent Component Analysis and Feature-Extracting Bidirectional Associative Memory, use images of test-patterns that are identical to ones used in behavioural studies. Each model works by decomposing images of floral patterns into meaningful underlying factors. We reconstruct the original floral image using the components and compare the quality of the reconstructed image to the original image. Independent Component Analysis matches behavioural results substantially better across several visual properties. These results are interpreted to support a hypothesis that the temporal and energetic costs of information processing by pollinators served as a selective pressure on floral displays: flowers adapted to pollinators' cognitive constraints.

  5. Using DEDICOM for completely unsupervised part-of-speech tagging.

    Energy Technology Data Exchange (ETDEWEB)

    Chew, Peter A.; Bader, Brett William; Rozovskaya, Alla (University of Illinois, Urbana, IL)

    2009-02-01

    A standard and widespread approach to part-of-speech tagging is based on Hidden Markov Models (HMMs). An alternative approach, pioneered by Schuetze (1993), induces parts of speech from scratch using singular value decomposition (SVD). We introduce DEDICOM as an alternative to SVD for part-of-speech induction. DEDICOM retains the advantages of SVD in that it is completely unsupervised: no prior knowledge is required to induce either the tagset or the associations of terms with tags. However, unlike SVD, it is also fully compatible with the HMM framework, in that it can be used to estimate emission- and transition-probability matrices which can then be used as the input for an HMM. We apply the DEDICOM method to the CONLL corpus (CONLL 2000) and compare the output of DEDICOM to the part-of-speech tags given in the corpus, and find that the correlation (almost 0.5) is quite high. Using DEDICOM, we also estimate part-of-speech ambiguity for each term, and find that these estimates correlate highly with part-of-speech ambiguity as measured in the original corpus (around 0.88). Finally, we show how the output of DEDICOM can be evaluated and compared against the more familiar output of supervised HMM-based tagging.

  6. Indoor localization using unsupervised manifold alignment with geometry perturbation

    KAUST Repository

    Majeed, Khaqan; Sorour, Sameh; Al-Naffouri, Tareq Y.; Valaee, Shahrokh

    2014-01-01

    The main limitation of deploying/updating Received Signal Strength (RSS) based indoor localization is the construction of fingerprinted radio map, which is quite a hectic and time-consuming process especially when the indoor area is enormous and/or dynamic. Different approaches have been undertaken to reduce such deployment/update efforts, but the performance degrades when the fingerprinting load is reduced below a certain level. In this paper, we propose an indoor localization scheme that requires as low as 1% fingerprinting load. This scheme employs unsupervised manifold alignment that takes crowd sourced RSS readings and localization requests as source data set and the environment's plan coordinates as destination data set. The 1% fingerprinting load is only used to perturb the local geometries in the destination data set. Our proposed algorithm was shown to achieve less than 5 m mean localization error with 1% fingerprinting load and a limited number of crowd sourced readings, when other learning based localization schemes pass the 10 m mean error with the same information.

  7. Unsupervised seismic facies analysis with spatial constraints using regularized fuzzy c-means

    Science.gov (United States)

    Song, Chengyun; Liu, Zhining; Cai, Hanpeng; Wang, Yaojun; Li, Xingming; Hu, Guangmin

    2017-12-01

    Seismic facies analysis techniques combine classification algorithms and seismic attributes to generate a map that describes main reservoir heterogeneities. However, most of the current classification algorithms only view the seismic attributes as isolated data regardless of their spatial locations, and the resulting map is generally sensitive to noise. In this paper, a regularized fuzzy c-means (RegFCM) algorithm is used for unsupervised seismic facies analysis. Due to the regularized term of the RegFCM algorithm, the data whose adjacent locations belong to the same class will play a more important role in the iterative process than other data. Therefore, this method can reduce the effect of seismic data noise present in discontinuous regions. Synthetic data with different signal/noise values are used to demonstrate the noise tolerance of the RegFCM algorithm. Meanwhile, the fuzzy factor, the neighbour window size and the regularized weight are tested using various values, to provide a reference for how to set these parameters. The new approach is also applied to a real seismic data set from the F3 block of the Netherlands. The results show improved spatial continuity, with clear facies boundaries and channel morphology, which reveals that the method is an effective seismic facies analysis tool.
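For orientation, the sketch below shows one iteration of plain (unregularized) fuzzy c-means on one-dimensional data. It is only the textbook baseline that RegFCM builds on: the spatial regularization term from the paper is not reproduced, and the function names and the fuzzy factor `m=2.0` are assumptions chosen for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[j][i], the membership of point j in cluster i (plain FCM)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: assign crisp membership.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        # Standard update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)).
        row = [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of the points weighted by u^m."""
    n_clusters = len(memberships[0])
    centers = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        weighted_sum = sum(w * x for w, x in zip(weights, points))
        centers.append(weighted_sum / sum(weights))
    return centers


points = [0.0, 1.0, 2.0]
u = fcm_memberships(points, centers=[0.0, 2.0])
new_centers = fcm_centers(points, u)
```

A spatially regularized variant would add a neighbourhood-dependent term to the membership update, so that a sample surrounded by samples of one class is pulled towards that class; the exact form used in the paper is not given in the abstract.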

  8. Unsupervised quantification of abdominal fat from CT images using Greedy Snakes

    Science.gov (United States)

    Agarwal, Chirag; Dallal, Ahmed H.; Arbabshirani, Mohammad R.; Patel, Aalpen; Moore, Gregory

    2017-02-01

    Adipose tissue has been associated with adverse consequences of obesity. Total adipose tissue (TAT) is divided into subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT). Intra-abdominal fat (VAT), located inside the abdominal cavity, is a major factor for the classic obesity related pathologies. Since direct measurement of visceral and subcutaneous fat is not trivial, substitute metrics like waist circumference (WC) and body mass index (BMI) are used in clinical settings to quantify obesity. Abdominal fat can be assessed effectively using CT or MRI, but manual fat segmentation is rather subjective and time-consuming. Hence, an automatic and accurate quantification tool for abdominal fat is needed. The goal of this study is to extract TAT, VAT and SAT fat from abdominal CT in a fully automated unsupervised fashion using energy minimization techniques. We applied a four step framework consisting of 1) initial body contour estimation, 2) approximation of the body contour, 3) estimation of inner abdominal contour using Greedy Snakes algorithm, and 4) voting, to segment the subcutaneous and visceral fat. We validated our algorithm on 952 clinical abdominal CT images (from 476 patients with a very wide BMI range) collected from various radiology departments of Geisinger Health System. To our knowledge, this is the first study of its kind on such a large and diverse clinical dataset. Our algorithm obtained a 3.4% error for VAT segmentation compared to manual segmentation. These personalized and accurate measurements of fat can complement traditional population health driven obesity metrics such as BMI and WC.

  9. Improved Performance of Unsupervised Method by Renovated K-Means

    OpenAIRE

    Ashok, P.; Nawaz, G. M Kadhar; Elayaraja, E.; Vadivel, V.

    2013-01-01

    Clustering is a separation of data into groups of similar objects. Every group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering methods. The proposed K-Means algorithm is compared with K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) algorithm by using Davis...
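A minimal sketch of K-Means with a pluggable distance function follows. The abstract does not name the three distance functions, so Euclidean, Manhattan and Chebyshev are assumed here purely as examples; the function names and the fixed initial centroids are also hypothetical. Note that updating centroids by the arithmetic mean is only strictly optimal for the (squared) Euclidean distance.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def nearest(point, centroids, distance):
    """Index of the centroid closest to the point under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def kmeans(points, initial_centroids, distance, max_iters=50):
    """Lloyd-style K-Means; returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids, distance)].append(p)
        # Mean update; an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids, distance) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
results = {
    fn.__name__: kmeans(points, [(0, 0), (10, 10)], fn)
    for fn in (euclidean, manhattan, chebyshev)
}
```

On well-separated toy data like this, all three distances give the same partition; differences between distance functions only show up on less clearly separated data, which is where a validity index becomes useful for comparing them.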

  10. A novel model-free data analysis technique based on clustering in a mutual information space: application to resting-state fMRI

    Directory of Open Access Journals (Sweden)

    Simon Benjaminsson

    2010-08-01

    Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of Independent Component Analysis (ICA) have been the methods mostly used on fMRI data, e.g. in finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis on a voxel level. It has good scalability properties and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to put the voxels in a lower-dimensional space reflecting the dependency relations based on the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turn out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This opens up for rapid analysis and visualization of the data on different spatial levels, as well as automatically finding a suitable number of decomposition components.
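The first step of the pipeline above, turning mutual information into a distance matrix, can be sketched as follows. Two assumptions are made that the abstract does not confirm: the voxel time series are already discretized into a small number of levels, and the distance is the variation of information, d(X, Y) = H(X, Y) - I(X; Y). The function names are hypothetical, and the later multidimensional scaling and clustering steps are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mi_distance(xs, ys):
    """Variation of information: zero for identical signals, larger when independent."""
    return entropy(list(zip(xs, ys))) - mutual_information(xs, ys)


def distance_matrix(series):
    """Pairwise distance matrix over a list of discretized time series."""
    return [[mi_distance(a, b) for b in series] for a in series]


# Three toy "voxels": the first two are identical, the third is independent of them.
voxels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
D = distance_matrix(voxels)
```

The resulting matrix `D` is symmetric with a zero diagonal, which is the form that a multidimensional scaling routine expects as input.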

  11. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    A cluster analysis task has to identify the grouping trends of data, to decide on the sound clusters as well as to validate the resulting structure. The identification of the grouping tendency existing in a data collection assumes the selection of a framework stated in terms of a mathematical model that expresses the similarity degree between pairs of particular objects, and quasi-metrics expressing the similarity between an object and a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by a continuous repartitions

  12. On the Multi-Modal Object Tracking and Image Fusion Using Unsupervised Deep Learning Methodologies

    Science.gov (United States)

    LaHaye, N.; Ott, J.; Garay, M. J.; El-Askary, H. M.; Linstead, E.

    2017-12-01

    The number of different modalities of remote-sensors has been on the rise, resulting in large datasets with different complexity levels. Such complex datasets can provide valuable information separately, yet there is a bigger value in having a comprehensive view of them combined. As such, hidden information can be deduced through applying data mining techniques on the fused data. The curse of dimensionality of such fused data, due to the potentially vast dimension space, hinders our ability to have a deep understanding of them. This is because each dataset requires a user to have instrument-specific and dataset-specific knowledge for optimum and meaningful usage. Once a user decides to use multiple datasets together, a deeper understanding of translating and combining these datasets in a correct and effective manner is needed. Although data-centric techniques exist, generic automated methodologies that can potentially solve this problem completely do not. Here we are developing a system that aims to gain a detailed understanding of different data modalities. Such a system will provide an analysis environment that gives the user useful feedback and can aid in research tasks. In our current work, we show the initial outputs of our system implementation that leverages unsupervised deep learning techniques so as not to burden the user with the task of labeling input data, while still allowing for a detailed machine understanding of the data. Our goal is to be able to track objects, like cloud systems or aerosols, across different image-like data-modalities. The proposed system is flexible, scalable and robust to understand complex likenesses within multi-modal data in a similar spatio-temporal range, and also to be able to co-register and fuse these images when needed.

  13. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  14. Cluster forcing

    DEFF Research Database (Denmark)

    Christensen, Thomas Budde

    The cluster theory attributed to Michael Porter has significantly influenced industrial policies in countries across Europe and North America since the beginning of the 1990s. Institutions such as the EU, OECD and the World Bank and governments in countries such as the UK, France, The Netherlands...... or management. Both the Accelerate Wales and the Accelerate Cluster programmes target this issue by trying to establish networks between companies that can be used to supply knowledge from research institutions to manufacturing companies. The paper concludes that public sector interventions can make...... businesses. The universities were not considered by the participating companies to be important parts of the local business environment and inputs from universities did not appear to be an important source to access knowledge about new product development or new techniques in production, distribution...

  15. Continuous Online Sequence Learning with an Unsupervised Neural Network Model.

    Science.gov (United States)

    Cui, Yuwei; Ahmad, Subutar; Hawkins, Jeff

    2016-09-14

    The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory recently has been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show the model is able to continuously learn a large number of variable-order temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms, including statistical methods (autoregressive integrated moving average), feedforward neural networks (time delay neural network and online sequential extreme learning machine), and recurrent neural networks (long short-term memory and echo-state networks), on sequence prediction problems with both artificial and real-world data. The HTM model achieves comparable accuracy to other state-of-the-art algorithms. The model also exhibits properties that are critical for sequence learning, including continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems from continuous data streams.

  16. Electrocardiogram signal quality measures for unsupervised telehealth environments

    International Nuclear Information System (INIS)

    Redmond, S J; Xie, Y; Chang, D; Lovell, N H; Basilakis, J

    2012-01-01

    The use of telehealth paradigms for the remote management of patients suffering from chronic conditions has become more commonplace with the advancement of Internet connectivity and enterprise software systems. To facilitate clinicians in managing large numbers of telehealth patients, and in digesting the vast array of data returned from the remote monitoring environment, decision support systems in various guises are often utilized. The success of decision support systems in interpreting patient conditions from physiological data is dependent largely on the quality of these recorded data. This paper outlines an algorithm to determine the quality of single-lead electrocardiogram (ECG) recordings obtained from telehealth patients. Three hundred short ECG recordings were manually annotated to identify movement artifact, QRS locations and signal quality (discrete quality levels) by a panel of three experts, who then reconciled the annotation as a group to resolve any discrepancies. After applying a published algorithm to remove gross movement artifact, the proposed method was then applied to estimate the remaining ECG signal quality, using a Parzen window supervised statistical classifier model. The three-class classifier model, using a number of time-domain features and evaluated using cross validation, gave an accuracy in classifying signal quality of 78.7% (κ = 0.67) when using fully automated preprocessing algorithms to remove gross motion artifact and detect QRS locations. This is a similar level of accuracy to the reported human inter-scorer agreement when generating the gold standard annotation (accuracy = 70–89.3%, κ = 0.54–0.84). These results indicate that the assessment of the quality of single-lead ECG recordings, acquired in unsupervised telehealth environments, is entirely feasible and may help to promote the acceptance and utility of future decision support systems for remotely managing chronic disease conditions. (paper)
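The agreement statistic κ reported above is Cohen's kappa, which corrects raw accuracy for the agreement expected by chance. The sketch below computes both from a confusion matrix; the matrix values are made up for illustration and are not the study's data, and the function names are hypothetical.

```python
def accuracy(confusion):
    """Observed agreement: fraction of samples on the diagonal."""
    total = sum(sum(row) for row in confusion)
    agree = sum(confusion[i][i] for i in range(len(confusion)))
    return agree / total


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: rater A, columns: rater B)."""
    total = sum(sum(row) for row in confusion)
    p_observed = accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Illustrative two-class example (not data from the paper).
example = [
    [20, 5],
    [10, 15],
]
example_accuracy = accuracy(example)
example_kappa = cohen_kappa(example)
```

In this toy example the accuracy is 0.7 while kappa is 0.4, which shows why the abstract reports both: a classifier can look accurate simply because chance agreement is already high.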

  17. Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

    Science.gov (United States)

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2014-12-30

    Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in the biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative and thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC thus demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden and thus their combination can be applied to real-time spike analysis.

  18. Random matrix improved subspace clustering

    KAUST Repository

    Couillet, Romain; Kammoun, Abla

    2017-01-01

    This article introduces a spectral method for statistical subspace clustering. The method is built upon standard kernel spectral clustering techniques, however carefully tuned by theoretical understanding arising from random matrix findings. We show

  19. Cyclist–motorist crash patterns in Denmark: A latent class clustering approach

    DEFF Research Database (Denmark)

    Kaplan, Sigal; Prato, Carlo Giacomo

    2013-01-01

    to prioritize safety issues and to devise efficient preventive measures. Method: The current study focused on cyclist–motorist crashes that occurred in Denmark during the period between 2007 and 2011. To uncover crash patterns, the current analysis applied latent class clustering, an unsupervised probabilistic...

  20. Unsupervised motion-based object segmentation refined by color

    Science.gov (United States)

    Piek, Matthijs C.; Braspenning, Ralph; Varekamp, Chris

    2003-06-01

    chance of the wrong position producing a good match. Consequently, a number of methods exist which combine motion and colour segmentation. These methods use colour segmentation as a base for the motion segmentation and estimation or perform an independent colour segmentation in parallel which is in some way combined with the motion segmentation. The presented method uses both techniques to complement each other by first segmenting on motion cues and then refining the segmentation with colour. To our knowledge few methods exist which adopt this approach. One example is [meshrefine]. This method uses an irregular mesh, which hinders its efficient implementation in consumer electronics devices. Furthermore, the method produces a foreground/background segmentation, while our applications call for the segmentation of multiple objects. NEW METHOD: As mentioned above we start with motion segmentation and refine the edges of this segmentation with a pixel resolution colour segmentation method afterwards. There are several reasons for this approach: (1) Motion segmentation does not produce the oversegmentation which colour segmentation methods normally produce, because objects are more likely to have colour discontinuities than motion discontinuities. In this way, the colour segmentation only has to be done at the edges of segments, confining the colour segmentation to a smaller part of the image. In such a part, it is more likely that the colour of an object is homogeneous. (2) This approach restricts the computationally expensive pixel resolution colour segmentation to a subset of the image. Together with the very efficient 3DRS motion estimation algorithm, this helps to reduce the computational complexity. (3) The motion cue alone is often enough to reliably distinguish objects from one another and the background. To obtain the motion vector fields, a variant of the 3DRS block-based motion estimator which analyses three frames of input was used. The 3DRS motion estimator is known

  1. Unsupervised Bayesian linear unmixing of gene expression microarrays.

    Science.gov (United States)

    Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O

    2013-03-19

    This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores

  2. Unsupervised topic modelling on South African parliament audio data

    CSIR Research Space (South Africa)

    Kleynhans, N

    2014-11-01

    Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many...

  3. Unsupervised Analysis of Array Comparative Genomic Hybridization Data from Early-Onset Colorectal Cancer Reveals Equivalence with Molecular Classification and Phenotypes

    Directory of Open Access Journals (Sweden)

    María Arriba

    2017-01-01

    AIM: To investigate whether chromosomal instability (CIN) is associated with tumor phenotypes and/or with global genomic status based on MSI (microsatellite instability) and CIMP (CpG island methylator phenotype) in early-onset colorectal cancer (EOCRC). METHODS: Taking as a starting point our previous work in which tumors from 60 EOCRC cases (≤45 years at the time of diagnosis) were analyzed by array comparative genomic hybridization (aCGH), in the present study we performed an unsupervised hierarchical clustering analysis of those aCGH data in order to unveil possible associations between the CIN profile and the clinical features of the tumors. In addition, we evaluated the MSI and the CIMP statuses of the samples with the aim of investigating a possible relationship between copy number alterations (CNAs) and the MSI/CIMP condition in EOCRC. RESULTS: Based on the similarity of the CNAs detected, the unsupervised analysis stratified samples into two main clusters (A, B) and four secondary clusters (A1, A2, B3, B4). The different subgroups showed a certain correspondence with the molecular classification of colorectal cancer (CRC), which enabled us to outline an algorithm to categorize tumors according to their CIMP status. Interestingly, each subcluster showed some distinctive clinicopathological features. But more interestingly, the CIN of each subcluster mainly affected particular chromosomes, allowing us to define chromosomal regions more specifically affected depending on the CIMP/MSI status of the samples. CONCLUSIONS: Our findings may provide a basis for a new form of classifying EOCRC according to the genomic status of the tumors.

  4. Correlates of Unsupervised Bathing of Infants: A Cross-Sectional Study

    Directory of Open Access Journals (Sweden)

    Tinneke M. J. Beirens

    2013-03-01

    Drowning represents the third leading cause of fatal unintentional injury in infants (0–1 years). The aim of this study is to investigate correlates of unsupervised bathing. This cross-sectional study included 1,410 parents with an infant. Parents completed a questionnaire regarding supervision during bathing, socio-demographic factors, and Protection Motivation Theory-constructs. To determine correlates of parents who leave their infant unsupervised, logistic regression analyses were performed. Of the parents, 6.2% left their child unsupervised in the bathtub. Parents with older children (OR 1.24; 95%CI 1.00–1.54) were more likely to leave their child unsupervised in the bathtub. First-time parents (OR 0.59; 95%CI 0.36–0.97) and non-Western migrant fathers (OR 0.18; 95%CI 0.05–0.63) were less likely to leave their child unsupervised in the bathtub. Furthermore, parents who perceived higher self-efficacy (OR 0.57; 95%CI 0.47–0.69), higher response efficacy (OR 0.34; 95%CI 0.24–0.48), and higher severity (OR 0.74; 95%CI 0.58–0.93) were less likely to leave their child unsupervised. Since young children are at great risk of drowning if supervision is absent, effective strategies for drowning prevention should be developed and evaluated. In the meantime, health care professionals should inform parents with regard to the importance of supervision during bathing.

  5. Cluster headache

    Science.gov (United States)

    Histamine headache; Headache - histamine; Migrainous neuralgia; Headache - cluster; Horton's headache; Vascular headache - cluster ... Doctors do not know exactly what causes cluster headaches. They ... (chemical in the body released during an allergic response) or ...

  6. Projection-based curve clustering

    International Nuclear Information System (INIS)

    Auder, Benjamin; Fischer, Aurelie

    2012-01-01

    This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat a l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the 'true' ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem. (authors)
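    The projection-then-cluster idea described in this record can be illustrated with a minimal sketch. It is not the authors' implementation: the discrete cosine basis, the synthetic rising and falling curves, and the helper names `cosine_basis`, `project` and `kmeans` are all assumptions chosen for illustration, and no basis-selection step is included.

```python
import math
import random


def cosine_basis(n_points, n_basis):
    """Return the first n_basis orthonormal DCT-II vectors of length n_points."""
    basis = []
    for k in range(n_basis):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_points)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n_points)
                      for i in range(n_points)])
    return basis


def project(curve, basis):
    """Coefficients of a sampled curve on each basis vector (inner products)."""
    return [sum(c * b for c, b in zip(curve, vec)) for vec in basis]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; initial centres are k distinct data points."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: squared_distance(p, centres[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical stand-in for simulator output: 10 rising and 10 falling curves.
rng = random.Random(1)
n_points = 20
grid = [i / (n_points - 1) for i in range(n_points)]
curves = [[t + rng.gauss(0, 0.01) for t in grid] for _ in range(10)]
curves += [[1 - t + rng.gauss(0, 0.01) for t in grid] for _ in range(10)]

# Cluster the 3 projection coefficients instead of the 20 raw samples.
basis = cosine_basis(n_points, n_basis=3)
coefficients = [project(c, basis) for c in curves]
labels, centres = kmeans(coefficients, k=2)
```

    Clustering the low-dimensional coefficients rather than the raw samples is the design choice this sketch is meant to show; how many basis vectors to keep, and which basis to use, is left open here.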

  7. An unsupervised MVA method to compare specific regions in human breast tumor tissue samples using ToF-SIMS.

    Science.gov (United States)

    Bluestein, Blake M; Morrish, Fionnuala; Graham, Daniel J; Guenthoer, Jamie; Hockenbery, David; Porter, Peggy L; Gamble, Lara J

    2016-03-21

    Imaging time-of-flight secondary ion mass spectrometry (ToF-SIMS) and principal component analysis (PCA) were used to investigate two sets of pre- and post-chemotherapy human breast tumor tissue sections to characterize lipids associated with tumor metabolic flexibility and response to treatment. The micron spatial resolution imaging capability of ToF-SIMS provides a powerful approach to attain spatially-resolved molecular and cellular data from cancerous tissues not available with conventional imaging techniques. Three ca. 1 mm² areas per tissue section were analyzed by stitching together 200 μm × 200 μm raster area scans. A method to isolate and analyze specific tissue regions of interest by utilizing PCA of ToF-SIMS images is presented, which allowed separation of cellularized areas from stromal areas. These PCA-generated regions of interest were then used as masks to reconstruct representative spectra from specifically stromal or cellular regions. The advantage of this unsupervised selection method is a reduction in scatter in the spectral PCA results when compared to analyzing all tissue areas or analyzing areas highlighted by a pathologist. Utilizing this method, stromal and cellular regions of breast tissue biopsies taken pre- versus post-chemotherapy demonstrate chemical separation using negatively-charged ion species. In this sample set, the cellular regions were predominantly all cancer cells. Fatty acids (i.e. palmitic, oleic, and stearic), monoacylglycerols, diacylglycerols and vitamin E profiles were distinctively different between the pre- and post-therapy tissues. These results validate a new unsupervised method to isolate and interpret biochemically distinct regions in cancer tissues using imaging ToF-SIMS data. In addition, the method developed here can provide a framework to compare a variety of tissue samples using imaging ToF-SIMS, especially where there is section-to-section variability that makes it difficult to use a serial hematoxylin

  8. Development of a Semi-Automatic Technique for Flow Estimation using Optical Flow Registration and k-means Clustering on Two Dimensional Cardiovascular Magnetic Resonance Flow Images

    DEFF Research Database (Denmark)

    Brix, Lau; Christoffersen, Christian P. V.; Kristiansen, Martin Søndergaard

    was then categorized into groups by the k-means clustering method. Finally, the cluster containing the vessel under investigation was selected manually by a single mouse click. All calculations were performed on a Nvidia 8800 GTX graphics card using the Compute Unified Device Architecture (CUDA) extension to the C...

  9. Unsupervised image segmentation for passive THz broadband images for concealed weapon detection

    Science.gov (United States)

    Ramírez, Mabel D.; Dietlein, Charles R.; Grossman, Erich; Popović, Zoya

    2007-04-01

    This work presents the application of a basic unsupervised classification algorithm for the segmentation of indoor passive Terahertz images. The 30,000 pixel broadband images of a person with concealed weapons under clothing are taken at a range of 0.8-2m over a frequency range of 0.1-1.2THz using single-pixel row-based raster scanning. The spiral-antenna coupled 36x1x0.02μm Nb bridge cryogenic micro-bolometers are developed at NIST-Optoelectronics Division. The antenna is evaporated on a 250μm thick Si substrate with a 4mm diameter hyper-hemispherical Si lens. The NETD of the microbolometer is 125mK at an integration time of 30 ms. The background temperature calibration is performed with a known 25 pixel source above 330 K, and a measured background fluctuation of 200-500mK. Several weapons were concealed under different fabrics: cotton, polyester, windblocker jacket and thermal sweater. Measured temperature contrasts ranged from 0.5-1K for wrinkles in clothing to 5K for a zipper and 8K for the concealed weapon. In order to automate feature detection in the images, some image processing and pattern recognition techniques have been applied and the results are presented here. We show that even simple algorithms, that can potentially be performed in real time, are capable of differentiating between a metal and a dielectric object concealed under clothing. Additionally, we show that pre-processing can reveal low temperature contrast features, such as folds in clothing.

  10. Clustering high dimensional data

    DEFF Research Database (Denmark)

    Assent, Ira

    2012-01-01

    High-dimensional data, i.e., data described by a large number of attributes, pose specific challenges to clustering. The so-called ‘curse of dimensionality’, coined originally to describe the general increase in complexity of various computational problems as dimensionality increases, is known...... to render traditional clustering algorithms ineffective. The curse of dimensionality, among other effects, means that with increasing number of dimensions, a loss of meaningful differentiation between similar and dissimilar objects is observed. As high-dimensional objects appear almost alike, new approaches...... for clustering are required. Consequently, recent research has focused on developing techniques and clustering algorithms specifically for high-dimensional data. Still, open research issues remain. Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Each cluster...
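    The loss of differentiation between near and far objects mentioned in this record can be observed numerically. The sketch below is an illustration under assumed conditions (points drawn uniformly from the unit hypercube, Euclidean distance); the function name `relative_contrast` and the sample sizes are hypothetical choices, not taken from the record.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# The contrast shrinks as the number of dimensions grows.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(200, n_dims), 3))
```

    A small relative contrast means the nearest and farthest neighbours are almost equally distant, which is why distance-based clustering loses discriminating power in such settings.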

  11. Characterization of heavy-metal-contaminated sediment by using unsupervised multivariate techniques and health risk assessment.

    Science.gov (United States)

    Wang, Yeuh-Bin; Liu, Chen-Wuing; Wang, Sheng-Wei

    2015-03-01

    This study characterized the sediment quality of the severely contaminated Erjen River in Taiwan by using multivariate analysis methods, including factor analysis (FA), self-organizing maps (SOMs), and positive matrix factorization (PMF), together with health risk assessment. The SOMs classified the dataset with similar heavy-metal-contaminated sediment into five groups. FA extracted three major factors (a traditional electroplating and metal-surface processing factor, a nontraditional heavy-metal-industry factor, and a natural geological factor), which accounted for 80.8% of the variance. The SOMs and FA revealed the heavy-metal-contaminated-sediment hotspots in the middle and upper reaches of the major tributary in the dry season. The hazardous index value for health risk via ingestion was 0.302. PMF further qualified the source apportionment, indicating that traditional electroplating and metal-surface-processing industries comprised 47% of the health risk posed by heavy-metal-contaminated sediment. Contaminants discharged from traditional electroplating and metal-surface-processing industries in the middle and upper reaches of the major tributary must be eliminated first to improve the sediment quality in Erjen River. The proposed assessment framework for heavy-metal-contaminated sediment can be applied to contaminated-sediment river sites in other regions. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Detection of Erroneous Payments Utilizing Supervised And Unsupervised Data Mining Techniques

    National Research Council Canada - National Science Library

    Yanik, Todd

    2004-01-01

    ... (C&RT)) modeling algorithms. S-Plus software was used to construct a supervised model of vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow Test, for testing the predictive ability of the model...

  13. Thermodynamic free-energy minimization for unsupervised fusion of dual-color infrared breast images

    Science.gov (United States)

    Szu, Harold; Miao, Lidan; Qi, Hairong

    2006-04-01

    function [A] may vary from the point tumor to its neighborhood, we could not rely on neighborhood statistics as did in a popular unsupervised independent component analysis (ICA) mathematical statistical method, we instead impose the physics equilibrium condition of the minimum of Helmholtz free-energy, H = E - T₀S. In case of the point breast cancer, we can assume the constant ground state energy E₀ to be normalized by those benign neighborhood tissue, and then the excited state can be computed by means of Taylor series expansion in terms of the pixel I/O data. We can augment the X-ray mammogram technique with passive IR imaging to reduce the unwanted X-rays during the chemotherapy recovery. When the sequence is animated into a movie, and the recovery dynamics is played backward in time, the movie simulates the cameras' potential for early detection without suffering the PD=0.1 search uncertainty. In summary, we applied two satellite-grade dual-color IR imaging cameras and advanced military (automatic target recognition) ATR spectrum fusion algorithm at the middle wavelength IR (3-5 μm) and long wavelength IR (8-12 μm), which are capable to screen malignant tumors proved by the time-reverse fashion of the animated movie experiments. On the contrary, the traditional thermal breast scanning/imaging, known as thermograms over decades, was IR spectrum-blind, and limited to a single night-vision camera and the necessary waiting for the cool down period for taking a second look for change detection suffers too many environmental and personnel variabilities.

  14. Unsupervised classification of lidar-based vegetation structure metrics at Jean Lafitte National Historical Park and Preserve

    Science.gov (United States)

    Kranenburg, Christine J.; Palaseanu-Lovejoy, Monica; Nayegandhi, Amar; Brock, John; Woodman, Robert

    2012-01-01

    Traditional vegetation maps capture the horizontal distribution of various vegetation properties, for example, type, species and age/senescence, across a landscape. Ecologists have long known, however, that many important forest properties, for example, interior microclimate, carbon capacity, biomass and habitat suitability, are also dependent on the vertical arrangement of branches and leaves within tree canopies. The objective of this study was to use a digital elevation model (DEM) along with tree canopy-structure metrics derived from a lidar survey conducted using the Experimental Advanced Airborne Research Lidar (EAARL) to capture a three-dimensional view of vegetation communities in the Barataria Preserve unit of Jean Lafitte National Historical Park and Preserve, Louisiana. The EAARL instrument is a raster-scanning, full waveform-resolving, small-footprint, green-wavelength (532-nanometer) lidar system designed to map coastal bathymetry, topography and vegetation structure simultaneously. An unsupervised clustering procedure was then applied to the 3-dimensional-based metrics and DEM to produce a vegetation map based on the vertical structure of the park's vegetation, which includes a flotant marsh, scrub-shrub wetland, bottomland hardwood forest, and baldcypress-tupelo swamp forest. This study was completed in collaboration with the National Park Service Inventory and Monitoring Program's Gulf Coast Network. The methods presented herein are intended to be used as part of a cost-effective monitoring tool to capture change in park resources.

  15. Unsupervised sub-categorization for object detection: finding cars from a driving vehicle

    NARCIS (Netherlands)

    Wijnhoven, R.G.J.; With, de P.H.N.

    2011-01-01

    We present a novel algorithm for unsupervised subcategorization of an object class, in the context of object detection. Dividing the detection problem into smaller subproblems simplifies the object vs. background classification. The algorithm uses an iterative split-and-merge procedure and uses both

  16. Evaluating unsupervised thesaurus-based labeling of audiovisual content in an archive production environment

    NARCIS (Netherlands)

    de Boer, V.; Ordelman, Roeland J.; Schuurman, Josefien

    2016-01-01

    In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external

  17. Best friends' interactions and substance use: The role of friend pressure and unsupervised co-deviancy.

    Science.gov (United States)

    Tsakpinoglou, Florence; Poulin, François

    2017-10-01

    Best friends exert a substantial influence on rising alcohol and marijuana use during adolescence. Two mechanisms occurring within friendship - friend pressure and unsupervised co-deviancy - may partially capture the way friends influence one another. The current study aims to: (1) examine the psychometric properties of a new instrument designed to assess pressure from a youth's best friend and unsupervised co-deviancy; (2) investigate the relative contribution of these processes to alcohol and marijuana use; and (3) determine whether gender moderates these associations. Data were collected through self-report questionnaires completed by 294 Canadian youths (62% female) across two time points (ages 15-16). Principal component analysis yielded a two-factor solution corresponding to friend pressure and unsupervised co-deviancy. Logistic regressions subsequently showed that unsupervised co-deviancy was predictive of an increase in marijuana use one year later. Neither process predicted an increase in alcohol use. Results did not differ as a function of gender. Copyright © 2017 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  18. Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment

    NARCIS (Netherlands)

    de Boer, Victor; Ordelman, Roeland J.F.; Schuurman, Josefien

    In this paper we report on a two-stage evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results for given requirements with respect to archival quality, authority and service levels to external

  19. Practice-Oriented Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production Environment

    NARCIS (Netherlands)

    de Boer, Victor; Ordelman, Roeland J.F.; Schuurman, Josefien

    In this paper we report on an evaluation of unsupervised labeling of audiovisual content using collateral text data sources to investigate how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users. We

  20. A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation

    Science.gov (United States)

    Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio

    2015-01-01

    Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building-blocks that better represent the information in it. Digitized histopathology images represent a very good testbed for representation learning since they involve large amounts of highly complex, visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on which type of learning architecture (deep or shallow) and which type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances and that topographic constraints provide useful invariant features in scale and rotations for efficient tumor differentiation.

  1. Hanging out with Which Friends? Friendship-Level Predictors of Unstructured and Unsupervised Socializing in Adolescence

    Science.gov (United States)

    Siennick, Sonja E.; Osgood, D. Wayne

    2012-01-01

    Companions are central to explanations of the risky nature of unstructured and unsupervised socializing, yet we know little about whom adolescents are with when hanging out. We examine predictors of how often friendship dyads hang out via multilevel analyses of longitudinal friendship-level data on over 5,000 middle schoolers. Adolescents hang out…

  2. A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images.

    Science.gov (United States)

    Gao, Han; Tang, Yunwei; Jing, Linhai; Li, Hui; Ding, Haifeng

    2017-10-24

    The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. This study proposes a novel unsupervised evaluation method to quantitatively measure the quality of segmentation results to overcome these problems. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. The indicators designed for spatial stratified heterogeneity and spatial autocorrelation are included to estimate the properties of the segments in this integrated feature set. These two indicators are then combined into a global assessment metric as the final quality score. The trade-offs of the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be exhibited geometrically. The method is tested on two segmentation algorithms and three testing images. The proposed method is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verified the effectiveness of the proposed method and demonstrated the reliability and improvements of this method with respect to other methods.

  3. A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images

    Directory of Open Access Journals (Sweden)

    Han Gao

    2017-10-01

    The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. This study proposes a novel unsupervised evaluation method to quantitatively measure the quality of segmentation results to overcome these problems. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. The indicators designed for spatial stratified heterogeneity and spatial autocorrelation are included to estimate the properties of the segments in this integrated feature set. These two indicators are then combined into a global assessment metric as the final quality score. The trade-offs of the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be exhibited geometrically. The method is tested on two segmentation algorithms and three testing images. The proposed method is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verified the effectiveness of the proposed method and demonstrated the reliability and improvements of this method with respect to other methods.

  4. Supervised and unsupervised condition monitoring of non-stationary acoustic emission signals

    DEFF Research Database (Denmark)

    Sigurdsson, Sigurdur; Pontoppidan, Niels Henrik; Larsen, Jan

    2005-01-01

    condition changes across load changes. In this paper we approach this load interpolation problem with supervised and unsupervised learning, i.e. model with normal and fault examples and normal examples only, respectively. We apply non-linear methods for the learning of engine condition changes. Both...

  5. PosQ: Unsupervised Fingerprinting and Visualization of GPS Positioning Quality

    DEFF Research Database (Denmark)

    Kjærgaard, Mikkel Baun; Weckemann, Kay

    . This paper proposes PosQ, a system for unsupervised fingerprinting and visualization of GPS positioning quality. PosQ provides quality maps to position-based applications and visual overlays to users and managers to reveal the positioning quality in a local environment. The system reveals the quality both...

  6. A method for unsupervised change detection and automatic radiometric normalization in multispectral data

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Canty, Morton John

    2011-01-01

    Based on canonical correlation analysis the iteratively re-weighted multivariate alteration detection (MAD) method is used to successfully perform unsupervised change detection in bi-temporal Landsat ETM+ images covering an area with villages, woods, agricultural fields and open pit mines in North...... to carry out the analyses is available from the authors' websites....

  7. An Introduction to Topic Modeling as an Unsupervised Machine Learning Way to Organize Text Information

    Science.gov (United States)

    Snyder, Robin M.

    2015-01-01

    The field of topic modeling has become increasingly important over the past few years. Topic modeling is an unsupervised machine learning way to organize text (or image or DNA, etc.) information such that related pieces of text can be identified. This paper/session will present/discuss the current state of topic modeling, why it is important, and…

  8. Model–Free Visualization of Suspicious Lesions in Breast MRI Based on Supervised and Unsupervised Learning

    NARCIS (Netherlands)

    Twellmann, T.; Meyer-Bäse, A.; Lange, O.; Foo, S.; Nattkemper, T.W.

    2008-01-01

    Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has become an important tool in breast cancer diagnosis, but evaluation of multitemporal 3D image data holds new challenges for human observers. To aid the image analysis process, we apply supervised and unsupervised pattern recognition

  9. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...
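    How instance weights can change a clustering is easy to see on a toy case. The sketch below is an assumption-laden illustration, not the analysis from this record: it uses a weighted k-means cost on three 1-D points and finds the best two-cluster partition by exhaustive search over contiguous splits. The names `weighted_cost` and `best_two_split`, the points and the weights are hypothetical.

```python
def weighted_cost(xs, ws):
    """Weighted sum of squared deviations from the weighted mean."""
    mean = sum(w * x for x, w in zip(xs, ws)) / sum(ws)
    return sum(w * (x - mean) ** 2 for x, w in zip(xs, ws))


def best_two_split(xs, ws):
    """Cheapest split of sorted 1-D data into two contiguous clusters."""
    best_cost, best_cut = None, None
    for cut in range(1, len(xs)):
        cost = (weighted_cost(xs[:cut], ws[:cut])
                + weighted_cost(xs[cut:], ws[cut:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_cut = cost, cut
    return xs[:best_cut], xs[best_cut:]


points = [0.0, 1.0, 3.0]
uniform = best_two_split(points, [1.0, 1.0, 1.0])    # equal weights
skewed = best_two_split(points, [100.0, 100.0, 1.0])  # two heavy points
print(uniform)
print(skewed)
```

    With equal weights the two nearby points share a cluster; once both carry a large weight, keeping them together becomes expensive and the partition changes. This shows one algorithm whose output reacts to weights, nothing more general.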

  10. Clustering Game Behavior Data

    DEFF Research Database (Denmark)

    Bauckhage, C.; Drachen, Anders; Sifa, Rafet

    2015-01-01

    of the causes, the proliferation of behavioral data poses the problem of how to derive insights therefrom. Behavioral data sets can be large, time-dependent and high-dimensional. Clustering offers a way to explore such data and to discover patterns that can reduce the overall complexity of the data. Clustering...... and other techniques for player profiling and play style analysis have, therefore, become popular in the nascent field of game analytics. However, the proper use of clustering techniques requires expertise and an understanding of games is essential to evaluate results. With this paper, we address game data...... scientists and present a review and tutorial focusing on the application of clustering techniques to mine behavioral game data. Several algorithms are reviewed and examples of their application shown. Key topics such as feature normalization are discussed and open problems in the context of game analytics...

  11. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    This work performs a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series). Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  12. Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees

    Science.gov (United States)

    Verhoeven, Thibault; Schmid, Konstantin; Müller, Klaus-Robert; Tangermann, Michael; Kindermans, Pieter-Jan

    2017-01-01

    Objective: Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g., by subject-to-subject transfer of a pre-trained classifier or unsupervised adaptive classification methods which learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve a reliable unsupervised calibrationless decoding with a guarantee to recover the true class means. Method: We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised, and easy-to-implement classification approach for ERP-based BCIs. The LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Results: Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. Significance: The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on complementary principles compared to existing unsupervised methods, opening the door for their further enhancement when combined with LLP. PMID:28407016
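    The statement that class means can be estimated from known label proportions can be made concrete with a small sketch. It assumes exactly two groups whose target proportions p1 and p2 are known and different, so that each group mean is a known mixture of the target and non-target means and the two means follow from a 2×2 linear system. The function name `recover_class_means`, the proportions and the feature vectors are hypothetical and are not taken from the study.

```python
def recover_class_means(mean_1, mean_2, p1, p2):
    """Solve  mean_g = p_g * target + (1 - p_g) * non_target  for g = 1, 2."""
    if p1 == p2:
        raise ValueError("the two groups need different target proportions")
    denom = p1 - p2
    target = [((1 - p2) * a - (1 - p1) * b) / denom
              for a, b in zip(mean_1, mean_2)]
    non_target = [(p1 * b - p2 * a) / denom
                  for a, b in zip(mean_1, mean_2)]
    return target, non_target


# Hypothetical ground truth, used only to build the two group means.
true_target = [1.0, 2.0]
true_non_target = [0.0, -1.0]
p1, p2 = 0.375, 0.125  # assumed target proportions in group 1 and group 2

group_mean_1 = [p1 * t + (1 - p1) * n
                for t, n in zip(true_target, true_non_target)]
group_mean_2 = [p2 * t + (1 - p2) * n
                for t, n in zip(true_target, true_non_target)]

# No individual labels are used below, only group means and proportions.
est_target, est_non_target = recover_class_means(
    group_mean_1, group_mean_2, p1, p2)
print(est_target, est_non_target)
```

    In practice the group means would be averages of recorded responses, so the recovered class means are estimates whose noise grows as p1 and p2 get closer to each other.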

  13. Theoretical developments for interpreting kernel spectral clustering from alternative viewpoints

    Directory of Open Access Journals (Sweden)

    Diego Peluffo-Ordóñez

    2017-08-01

    To perform an exploration process over complex structured data within unsupervised settings, the so-called kernel spectral clustering (KSC) is one of the most recommended and appealing approaches, given its versatility and elegant formulation. In this work, we explore the relationship between KSC and other well-known approaches, namely normalized cut clustering and kernel k-means. To do so, we first deduce a generic KSC model from a primal-dual formulation based on least-squares support-vector machines (LS-SVM). For experiments, KSC as well as other considered methods are assessed on image segmentation tasks to prove their usability.

  14. Kernel method for clustering based on optimal target vector

    International Nuclear Information System (INIS)

    Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

    2006-01-01

    We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and anti-ferromagnetic (ii) depending on the whole data-set and not only on pairs of samples. Couplings are determined exploiting the notion of optimal target vector, here introduced, a link between kernel supervised and unsupervised learning. The effectiveness of the method is shown in the case of the well-known iris data-set and in benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering

  15. Data clustering algorithms and applications

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea

  16. Cluster management.

    Science.gov (United States)

    Katz, R

    1992-11-01

    Cluster management is a management model that fosters decentralization of management, develops leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader. The cluster groups function for communication and problem-solving. The second part of the cluster management model is the creation of task forces. These task forces are designed to work on short-term goals, usually in response to solving one of the unit's goals. Sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader. A cluster is made up of individuals who work the same shift. For example, people with job titles who work days would be in a cluster. There would be registered nurses, licensed practical nurses, nursing assistants, and unit clerks in the cluster. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, using cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.

  17. The effects of a life goal-setting technique in a preventive care program for frail community-dwelling older people: a cluster nonrandomized controlled trial.

    Science.gov (United States)

    Yuri, Yoshimi; Takabatake, Shinichi; Nishikawa, Tomoko; Oka, Mari; Fujiwara, Taro

    2016-05-12

    Frailty among older people is associated with an increased risk of needing care. There have been many reports on preventive care programs for frail older people, but few have shown positive effects on disability prevention. Physical exercise programs for frail older people affect elements such as physical fitness and balance, but are less effective for disability outcomes and are not followed up in the longer term. We developed a life goal-setting technique (LGST). Our objective was to determine the effect of a LGST plus standard preventive care program for community-dwelling frail older people. We used a cluster nonrandomized controlled trial with seven intervention and nine matched control groups, with baseline assessment and follow-up at 3, 6, and 9 months. Participants were 176 frail older people, aged 65 years or over, living in the community in Izumi, Osaka, Japan. All participants attended regular 120 min preventive care exercise classes each week, over 3 months. They also received oral care and nutrition education. The intervention groups alone received life goal-setting support. We assessed outcomes longitudinally, comparing pre-intervention with follow-up. The primary outcome measure was health improvement according to the Japanese Ministry of Health, Labour and Welfare's "Kihon Checklist" for assessment of frailty and quality of life (QOL), analyzed with a two-way ANOVA and post-test comparison. Secondary outcomes included physical functions and assessment of life goals. The improvement on the Kihon Checklist for the intervention group was approximately 60 % from baseline to 9-months follow-up; the control group improved by approximately 40 %. The difference between groups was significant at 3-month (p = 0.043) and 6-month (p = 0.015) follow-ups but not at 9-month (p = 0.098) follow-up. Analysis of QOL yielded a significant time × group interaction effect (p = 0.022). The effect was significant at 3 months in the intervention

  18. Unsupervised Multi-Scale Change Detection from SAR Imagery for Monitoring Natural and Anthropogenic Disasters

    Science.gov (United States)

    Ajadi, Olaniyi A.

    Radar remote sensing can play a critical role in operational monitoring of natural and anthropogenic disasters. Despite its all-weather capabilities, and its high performance in mapping, and monitoring of change, the application of radar remote sensing in operational monitoring activities has been limited. This has largely been due to: (1) the historically high costs associated with obtaining radar data; (2) slow data processing, and delivery procedures; and (3) the limited temporal sampling that was provided by spaceborne radar-based satellites. Recent advances in the capabilities of spaceborne Synthetic Aperture Radar (SAR) sensors have developed an environment that now allows for SAR to make significant contributions to disaster monitoring. New SAR processing strategies that can take full advantage of these new sensor capabilities are currently being developed. Hence, with this PhD dissertation, I aim to: (i) investigate unsupervised change detection techniques that can reliably extract signatures from time series of SAR images, and provide the necessary flexibility for application to a variety of natural, and anthropogenic hazard situations; (ii) investigate effective methods to reduce the effects of speckle and other noise on change detection performance; (iii) automate change detection algorithms using probabilistic Bayesian inferencing; and (iv) ensure that the developed technology is applicable to current, and future SAR sensors to maximize temporal sampling of a hazardous event. This is achieved by developing new algorithms that rely on image amplitude information only, the sole image parameter that is available for every single SAR acquisition. The motivation and implementation of the change detection concept are described in detail in Chapter 3. In the same chapter, I demonstrated the technique's performance using synthetic data as well as a real-data application to map wildfire progression. I applied Radiometric Terrain Correction (RTC) to the data to

  19. Advanced defect detection algorithm using clustering in ultrasonic NDE

    Science.gov (United States)

    Gongzhang, Rui; Gachagan, Anthony

    2016-02-01

    A range of materials used in industry exhibit scattering properties that limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as `legitimate reflector' or `artefacts' based on this observation and the expected probability of detection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP and the subsequent application of the proposed clustering algorithm has provided an additional reduction to PFA while maintaining PoD for both samples compared with SSP results alone.

  20. Maximum Margin Clustering of Hyperspectral Data

    Science.gov (United States)

    Niazmardi, S.; Safari, A.; Homayouni, S.

    2013-09-01

    In recent decades, large margin methods such as Support Vector Machines (SVMs) have been regarded as the state of the art among supervised learning methods for classification of hyperspectral data. However, the results of these algorithms mainly depend on the quality and quantity of available training data. To tackle the problems associated with the training data, researchers have put effort into extending the capability of large margin algorithms to unsupervised learning. One of the recently proposed algorithms is Maximum Margin Clustering (MMC). MMC is an unsupervised SVM algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most of the existing MMC methods rely on reformulating and relaxing the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and can only handle small data sets. Moreover, most of these algorithms perform two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solves the original non-convex problem using an Alternative Optimization method. This algorithm is also extended for multi-class classification and its performance is evaluated. The results show that the proposed algorithm gives acceptable results for hyperspectral data clustering.

  1. Normalization based K means Clustering Algorithm

    OpenAIRE

    Virmani, Deepali; Taneja, Shweta; Malhotra, Geetika

    2015-01-01

    K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, Normalization based K-means clustering algorithm(N-K means) is proposed. Proposed N-K means clustering algorithm applies normalization prior to clustering on the available data as well as the proposed approach calculates initial centroids based on weights. Experimental results prove the betterment of proposed N-K means clustering algorithm over existing...
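
    The effect of normalizing before K-means can be illustrated with a small sketch. It assumes min-max scaling, which is only one possible normalization, and it takes the initial centroids as an explicit argument because the abstract does not spell out the weight-based initialization; the function names and the sample data are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def k_means(rows, initial_centroids, iterations=20):
    """Plain Lloyd iterations with squared Euclidean distance."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every row.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centroids[j])))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical data: a large-scale first feature and a small-scale second one.
raw = [[1000, 1.0], [1100, 1.2], [1050, 9.0], [990, 9.5]]
scaled = min_max_normalize(raw)
labels, centroids = k_means(scaled, [scaled[0], scaled[3]])
```

    After scaling, the grouping follows the second feature as well as the first; on the raw values the first feature alone would drive the distances.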

  2. Isotopic clusters

    International Nuclear Information System (INIS)

    Geraedts, J.M.P.

    1983-01-01

    Spectra of isotopically mixed clusters (dimers of SF6) are calculated as well as transition frequencies. The result leads to speculations about the suitability of the laser-cluster fragmentation process for isotope separation. (Auth.)

  3. Cluster Headache

    Science.gov (United States)

    ... a role. Unlike migraine and tension headache, cluster headache generally isn't associated with triggers, such as foods, hormonal changes or stress. Once a cluster period begins, however, drinking alcohol ...

  4. Unsupervised Ontology Generation from Unstructured Text. CRESST Report 827

    Science.gov (United States)

    Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

    2013-01-01

    Ontologies are a vital component of most knowledge acquisition systems, and recently there has been a huge demand for generating ontologies automatically since manual or supervised techniques are not scalable. In this paper, we introduce "OntoMiner", a rule-based, iterative method to extract and populate ontologies from unstructured or…

  5. Neutrosophic Hierarchical Clustering Algoritms

    Directory of Open Access Journals (Sweden)

    Rıdvan Şahin

    2014-03-01

    Full Text Available Interval neutrosophic set (INS) is a generalization of interval valued intuitionistic fuzzy set (IVIFS), in which the membership and non-membership values of elements consist of fuzzy ranges, while single valued neutrosophic set (SVNS) is regarded as an extension of intuitionistic fuzzy set (IFS). In this paper, we extend the hierarchical clustering techniques proposed for IFSs and IVIFSs to SVNSs and INSs respectively. Based on the traditional hierarchical clustering procedure, the single valued neutrosophic aggregation operator, and the basic distance measures between SVNSs, we define a single valued neutrosophic hierarchical clustering algorithm for clustering SVNSs. Then we extend the algorithm to classify interval neutrosophic data. Finally, we present some numerical examples in order to show the effectiveness and availability of the developed clustering algorithms.
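
    One step of an agglomerative procedure over SVNSs can be sketched as follows. The sketch assumes that each SVNS is a list of (truth, indeterminacy, falsity) triples over a shared universe and uses the normalized Hamming distance as the distance measure; the abstract does not say which measure the authors use, and the aggregation operator that would merge the selected pair is deliberately left out. All names are hypothetical.

```python
def svns_hamming_distance(a, b):
    """Normalized Hamming distance between two single valued neutrosophic sets.

    Each set is a list of (truth, indeterminacy, falsity) triples in [0, 1],
    one triple per element of the shared universe.
    """
    if len(a) != len(b):
        raise ValueError("both sets must cover the same universe")
    total = sum(
        abs(ta - tb) + abs(ia - ib) + abs(fa - fb)
        for (ta, ia, fa), (tb, ib, fb) in zip(a, b)
    )
    return total / (3 * len(a))


def closest_pair(sets):
    """Return (distance, i, j) for the two sets that would be merged next."""
    return min(
        (svns_hamming_distance(sets[i], sets[j]), i, j)
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
    )


# Hypothetical one-element SVNSs.
items = [[(0.5, 0.2, 0.3)], [(0.7, 0.2, 0.1)], [(0.1, 0.9, 0.8)]]
distance, first, second = closest_pair(items)
```

    A full hierarchical run would repeat this step, replacing the selected pair by its aggregate until the desired number of clusters remains.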

  6. Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

    Science.gov (United States)

    de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

    2015-01-01

    Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today, heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that `trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt for-profit's research, marketing and targeting strategies. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict `low' or `high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the `non-institutionalist charities supporters', the `resource allocation critics', the `information-seeking financial sceptics', the `non-questioning charity supporters', the `non-trusting sceptics', the `charity management believers' and the `institutionalist charity believers'. Each cluster exhibits their own characteristics as well as different drivers of `involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better. If charities and not
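
    The general idea behind combining a minimum spanning tree (MST) with a k-nearest-neighbour (kNN) graph can be sketched as below. This is a simplified, single-pass illustration and not the authors' MST-kNN implementation: it assumes Euclidean distances, keeps only MST edges whose endpoints are also kNN neighbours in at least one direction, and returns the connected components as clusters. Any recursive refinement or automatic choice of k is omitted, and all names are hypothetical.

```python
import math


def mst_knn_clusters(points, k):
    """Cluster by intersecting MST edges with kNN edges (simplified sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Prim's algorithm: best[v] = (cheapest distance to the tree, tree vertex).
    best = {v: (dist[0][v], 0) for v in range(1, n)}
    mst_edges = set()
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, parent = best.pop(v)
        mst_edges.add(frozenset((parent, v)))
        for u in best:
            if dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)

    # kNN graph: an edge exists if either endpoint lists the other as a neighbour.
    knn_edges = set()
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        knn_edges.update(frozenset((i, j)) for j in nearest)

    # Connected components of the intersection, via union-find.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for edge in mst_edges & knn_edges:
        a, b = tuple(edge)
        root[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data: two well-separated triples of points.
clusters = mst_knn_clusters([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
```

    The long MST edge that bridges the two groups is dropped because neither endpoint counts the other among its nearest neighbours, so the number of clusters does not have to be fixed in advance.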

  7. Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

    Directory of Open Access Journals (Sweden)

    Natalie Jane de Vries

    Full Text Available Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today, heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that `trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt for-profit's research, marketing and targeting strategies. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict `low' or `high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the `non-institutionalist charities supporters', the `resource allocation critics', the `information-seeking financial sceptics', the `non-questioning charity supporters', the `non-trusting sceptics', the `charity management believers' and the `institutionalist charity believers'. Each cluster exhibits their own characteristics as well as different drivers of `involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better. If charities and not

  8. Cluster Headache

    OpenAIRE

    Pearce, Iris

    1985-01-01

    Cluster headache is the most severe primary headache with recurrent pain attacks described as worse than giving birth. The aim of this paper was to make an overview of current knowledge on cluster headache with a focus on pathophysiology and treatment. This paper presents hypotheses of cluster headache pathophysiology, current treatment options and possible future therapy approaches. For years, the hypothalamus was regarded as the key structure in cluster headache, but is now thought to be pa...

  9. Categorias Cluster

    OpenAIRE

    Queiroz, Dayane Andrade

    2015-01-01

    In this work we present cluster categories, which were introduced by Aslak Bakke Buan, Robert Marsh, Markus Reineke, Idun Reiten and Gordana Todorov with the aim of categorifying the cluster algebras created in 2002 by Sergey Fomin and Andrei Zelevinsky. In [4], the above authors showed that there is a close relationship between cluster algebras and cluster categories for quivers whose underlying graph is a Dynkin diagram. To this end they developed a tilting theory in the triang...

  10. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
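
    The conversion step that GSP relies on, turning a DNA string into numerical signals that standard signal-processing tools can handle, might look like the sketch below. It assumes the Voss indicator mapping (one binary sequence per nucleotide) and a plain discrete Fourier transform; the abstract does not name the mapping used in the paper, so this choice and all function names are hypothetical. The resulting spectrum could then serve as a feature vector for K-means.

```python
import cmath


def voss_indicators(sequence):
    """Map a DNA string to four binary indicator signals, one per nucleotide."""
    seq = sequence.upper()
    return {base: [1.0 if s == base else 0.0 for s in seq] for base in "ACGT"}


def dft(signal):
    """Direct O(n^2) discrete Fourier transform, adequate for short sequences."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def power_spectrum(sequence):
    """Sum of squared DFT magnitudes over the four indicator signals."""
    spectra = [dft(sig) for sig in voss_indicators(sequence).values()]
    return [sum(abs(s[k]) ** 2 for s in spectra) for k in range(len(sequence))]


# A sequence with period 3 concentrates its power at frequency index n / 3.
spectrum = power_spectrum("ATGATGATGATG")
```

    Sequences of different lengths would need to be brought to a common feature length (for example by resampling the spectrum) before clustering; that step is not shown here.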

  11. Meaningful Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.

    2004-05-26

    We present an approach to the disambiguation of cluster labels that capitalizes on the notion of semantic similarity to assign WordNet senses to cluster labels. The approach provides interesting insights on how document clustering can provide the basis for developing a novel approach to word sense disambiguation.

  12. Horticultural cluster

    OpenAIRE

    SHERSTIUK S.V.; POSYLAYEVA K.I.

    2013-01-01

    The article presents theoretical and methodological approaches to the nature and existence of the cluster. It describes how the cluster differs from other kinds of cooperative and integration associations. Scientific and practical recommendations for forming a competitive horticultural cluster are developed.

  13. Unsupervised learning in neural networks with short range synapses

    Science.gov (United States)

    Brunnet, L. G.; Agnes, E. J.; Mizusaki, B. E. P.; Erichsen, R., Jr.

    2013-01-01

    Different areas of the brain are involved in specific aspects of the information being processed both in learning and in memory formation. For example, the hippocampus is important in the consolidation of information from short-term memory to long-term memory, while emotional memory seems to be dealt with by the amygdala. On the microscopic scale the underlying structures in these areas differ in the kind of neurons involved, in their connectivity, or in their clustering degree but, at this level, learning and memory are attributed to neuronal synapses mediated by long-term potentiation and long-term depression. In this work we explore the properties of a short range synaptic connection network, a nearest neighbor lattice composed mostly of excitatory neurons and a fraction of inhibitory ones. The mechanism of synaptic modification responsible for the emergence of memory is Spike-Timing-Dependent Plasticity (STDP), a Hebbian-like rule, where potentiation/depression is acquired when causal/non-causal spikes happen in a synapse involving two neurons. The system is intended to store and recognize memories associated with spatial external inputs presented as simple geometrical forms. The synaptic modifications are continuously applied to excitatory connections, including a homeostasis rule and STDP. In this work we explore the different scenarios under which a network with short range connections can accomplish the task of storing and recognizing simple connected patterns.
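
    The STDP rule mentioned above is often written as a pair-based update with exponential time windows. The sketch below assumes that common textbook form; the amplitudes, time constants and weight bounds are hypothetical placeholders and are not the values used in the paper, and the homeostasis rule is not modelled.

```python
import math


def stdp_weight_change(dt_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP with exponential windows (all parameters assumed).

    dt_ms = t_post - t_pre. A causal pair (pre before post, dt_ms > 0)
    potentiates the synapse; a non-causal pair (dt_ms < 0) depresses it.
    """
    if dt_ms > 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


def apply_stdp(weight, dt_ms, w_min=0.0, w_max=1.0):
    """Update one excitatory weight and keep it inside [w_min, w_max]."""
    return min(w_max, max(w_min, weight + stdp_weight_change(dt_ms)))


potentiated = apply_stdp(0.5, dt_ms=10.0)   # causal pair strengthens the synapse
depressed = apply_stdp(0.5, dt_ms=-10.0)    # non-causal pair weakens it
```

    The closer the two spikes are in time, the larger the change, which is what lets repeatedly co-activated neighbours in the lattice reinforce their connections.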

  14. Unsupervised Change Detection for Geological and Ecological Monitoring via Remote Sensing: Application on a Volcanic Area

    Science.gov (United States)

    Falco, N.; Pedersen, G. B. M.; Vilmunandardóttir, O. K.; Belart, J. M. M. C.; Sigurmundsson, F. S.; Benediktsson, J. A.

    2016-12-01

    The project "Environmental Mapping and Monitoring of Iceland by Remote Sensing (EMMIRS)" aims at providing fast and reliable mapping and monitoring techniques on a large spatial scale with a high temporal resolution of the Icelandic landscape. Such mapping and monitoring will be crucial to both mitigate and understand the scale of processes and their often complex interlinked feedback mechanisms. In the EMMIRS project, the Hekla volcano area is one of the main sites under study, where volcanic eruptions, extreme weather and human activities have had an extensive impact on landscape degradation. The development of innovative remote sensing approaches to compute earth observation variables as automatically as possible is one of the main tasks of the EMMIRS project. Furthermore, a temporal remote sensing archive is created and composed of images acquired by different sensors (Landsat, RapidEye, ASTER and SPOT5). Moreover, historical aerial stereo photos allowed decadal reconstruction of the landscape by reconstruction of digital elevation models. Here, we propose a novel architecture for automatic unsupervised change detection analysis able to ingest multi-source data in order to detect landscape changes in the Hekla area. The change detection analysis is based on multi-scale analysis, which allows the identification of changes at different levels of abstraction, from pixel-level to region-level. For this purpose, operators defined in the mathematical morphology framework are implemented to model the contextual information, represented by the neighbour system of a pixel, allowing the identification of changes related to both geometrical and spectral domains. An automatic radiometric normalization strategy is also implemented as a pre-processing step, aiming at minimizing the effect of different acquisition conditions. The proposed architecture is tested on multi-temporal data sets acquired over different time periods coinciding with the last three eruptions (1980-1981, 1991

  15. Globular clusters and galaxy halos

    International Nuclear Information System (INIS)

    Van Den Bergh, S.

    1984-01-01

    Using semipartial correlation coefficients and bootstrap techniques, a study is made of the important features of globular clusters with respect to the total number of galaxy clusters and dependence of specific galaxy cluster on parent galaxy type, cluster radii, luminosity functions and cluster ellipticity. It is shown that the ellipticity of LMC clusters correlates significantly with cluster luminosity functions, but not with cluster age. The cluster luminosity value above which globulars are noticeably flattened may differ by a factor of about 100 from galaxy to galaxy. Both in the Galaxy and in M31 globulars with small core radii have a Gaussian distribution over luminosity, whereas clusters with large core radii do not. In the cluster systems surrounding the Galaxy, M31 and NGC 5128 the mean radii of globular clusters were found to increase with the distance from the nucleus. Central galaxies in rich clusters have much higher values for specific globular cluster frequency than do other cluster ellipticals, suggesting that such central galaxies must already have been different from normal ellipticals at the time they were formed.

  16. Automatic segmentation of dynamic neuroreceptor single-photon emission tomography images using fuzzy clustering

    International Nuclear Information System (INIS)

    Acton, P.D.; Pilowsky, L.S.; Kung, H.F.; Ell, P.J.

    1999-01-01

    The segmentation of medical images is one of the most important steps in the analysis and quantification of imaging data. However, partial volume artefacts make accurate tissue boundary definition difficult, particularly for images with lower resolution commonly used in nuclear medicine. In single-photon emission tomography (SPET) neuroreceptor studies, areas of specific binding are usually delineated by manually drawing regions of interest (ROIs), a time-consuming and subjective process. This paper applies the technique of fuzzy c-means clustering (FCM) to automatically segment dynamic neuroreceptor SPET images. Fuzzy clustering was tested using a realistic, computer-generated, dynamic SPET phantom derived from segmenting an MR image of an anthropomorphic brain phantom. Also, the utility of applying FCM to real clinical data was assessed by comparison against conventional ROI analysis of iodine-123 iodobenzamide (IBZM) binding to dopamine D2/D3 receptors in the brains of humans. In addition, a further test of the methodology was assessed by applying FCM segmentation to [123I]IDAM images (5-iodo-2-[[2-2-[(dimethylamino)methyl]phenyl]thio] benzyl alcohol) of serotonin transporters in non-human primates. In the simulated dynamic SPET phantom, over a wide range of counts and ratios of specific binding to background, FCM correlated very strongly with the true counts (correlation coefficient r² > 0.99, P ... [123I]IBZM data comparable with manual ROI analysis, with the binding ratios derived from both methods significantly correlated (r² = 0.83, P<0.0001). Fuzzy clustering is a powerful tool for the automatic, unsupervised segmentation of dynamic neuroreceptor SPET images. Where other automated techniques fail completely, and manual ROI definition would be highly subjective, FCM is capable of segmenting noisy images in a robust and repeatable manner. (orig.)
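
    Fuzzy c-means differs from hard clustering in that every sample receives a graded membership in each cluster. A minimal sketch of the standard FCM iteration follows; the fuzzifier m = 2, the fixed iteration count, the explicit initial centres and the toy one-dimensional data are assumptions for illustration and do not reproduce the SPET processing in the paper.

```python
import math


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Standard fuzzy c-means: alternate membership and centre updates."""
    centers = [list(c) for c in initial_centers]
    memberships = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership of each point in each cluster (every row sums to 1).
        memberships = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((d[i] / d[j]) ** exponent for j in range(len(centers)))
                for i in range(len(centers))
            ])
        # Centres become membership-weighted means of all points.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ]
    return memberships, centers


# Hypothetical 1-D intensities: a low-uptake and a high-uptake group.
samples = [(0.0,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,)]
memberships, centers = fuzzy_c_means(samples, [(1.0,), (4.0,)])
```

    For dynamic images, each point would be a voxel's time-activity curve instead of a single intensity, and a hard segmentation can be obtained by assigning each voxel to its highest-membership cluster.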

  17. Software usage in unsupervised digital doorway computing environments in disadvantaged South African communities: Focusing on youthful users

    CSIR Research Space (South Africa)

    Gush, K

    2011-01-01

    Full Text Available Digital Doorways provide computing infrastructure in low-income communities in South Africa. The unsupervised DD terminals offer various software applications, from entertainment through educational resources to research material, encouraging...

  18. Cluster Matters

    DEFF Research Database (Denmark)

    Gulati, Mukesh; Lund-Thomsen, Peter; Suresh, Sangeetha

    2018-01-01

    sell their products successfully in international markets, but there is also an increasingly large consumer base within India. Indeed, Indian industrial clusters have contributed to a substantial part of this growth process, and there are several hundred registered clusters within the country...... of this handbook, which focuses on the role of CSR in MSMEs. Hence we contribute to the literature on CSR in industrial clusters and specifically CSR in Indian industrial clusters by investigating the drivers of CSR in India’s industrial clusters....

  19. Towards Statistical Unsupervised Online Learning for Music Listening with Hearing Devices

    DEFF Research Database (Denmark)

    Purwins, Hendrik; Marchini, Marco; Marxer, Richard

    of sounds into phonetic/instrument categories and learning of instrument event sequences is performed jointly using a Hierarchical Dirichlet Process Hidden Markov Model. Whereas machines often learn by processing a large data base and subsequently updating parameters of the algorithm, humans learn...... and their respective transition counts. We propose to use online learning for the co-evolution of both CI user and machine in (re-)learning musical language. [1] Marco Marchini and Hendrik Purwins. Unsupervised analysis and generation of audio percussion sequences. In International Symposium on Computer Music Modeling...... categories) as well as the temporal context horizon (e.g. storing up to 2-note sequences or up to 10-note sequences) is adaptable. The framework in [1] is based on two cognitively plausible principles: unsupervised learning and statistical learning. Opposed to supervised learning in primary school children...

  20. Large-Scale Unsupervised Hashing with Shared Structure Learning.

    Science.gov (United States)

    Liu, Xianglong; Mu, Yadong; Zhang, Danchen; Lang, Bo; Li, Xuelong

    2015-09-01

    Hashing methods are effective in generating compact binary signatures for images and videos. This paper addresses an important open issue in the literature, i.e., how to learn compact hash codes by enhancing the complementarity among different hash functions. Most prior studies solve this problem either by adopting time-consuming sequential learning algorithms or by generating hash functions that are subject to some deliberately-designed constraints (e.g., enforcing hash functions to be orthogonal to one another). We analyze the drawbacks of past works and propose a new solution to this problem. Our idea is to decompose the feature space into a subspace shared by all hash functions and its complementary subspace. On one hand, the shared subspace, corresponding to the common structure across different hash functions, conveys the most relevant information for the hashing task. Similar to data de-noising, irrelevant information is explicitly suppressed during hash function generation. On the other hand, in case the complementary subspace also contains useful information for specific hash functions, the final form of our proposed hashing scheme is a compromise between these two kinds of subspaces. To make hash functions not only preserve the local neighborhood structure but also capture the global cluster distribution of the whole data, an objective function incorporating spectral embedding loss, binary quantization loss, and shared subspace contribution is introduced to guide the hash function learning. We propose an efficient alternating optimization method to simultaneously learn both the shared structure and the hash functions. Experimental results on three well-known benchmarks CIFAR-10, NUS-WIDE, and a-TRECVID demonstrate that our approach significantly outperforms state-of-the-art hashing methods.

  1. Electronic and chemical properties of indium clusters

    International Nuclear Information System (INIS)

    Rayane, D.; Khardi, S.; Tribollet, B.; Broyer, M.; Melinon, P.; Cabaud, B.; Hoareau, A.

    1989-01-01

    Indium clusters are produced by the inert gas condensation technique. The ionization potentials are found to be higher for small clusters than for the indium atom. This is explained by the p character of the bonding, as in aluminium. Doubly charged clusters are also observed and fragmentation processes are discussed. Finally, small indium clusters (3 < n < 9) are found to be very reactive with hydrocarbons. (orig.)

  2. Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

    OpenAIRE

    Huang, Furong; Anandkumar, Animashree

    2016-01-01

    Unsupervised text embeddings extraction is crucial for text understanding in machine learning. Word2Vec and its variants have received substantial success in mapping words with similar syntactic or semantic meaning to vectors close to each other. However, extracting context-aware word-sequence embedding remains a challenging task. Training over large corpus is difficult as labels are difficult to get. More importantly, it is challenging for pre-trained models to obtain word-...

  3. Modeling Language and Cognition with Deep Unsupervised Learning: A Tutorial Overview

    OpenAIRE

    Marco Zorzi; Alberto Testolin; Ivilin Peev Stoianov

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cog...

  4. Modeling language and cognition with deep unsupervised learning: a tutorial overview

    OpenAIRE

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P.

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cog...

  5. Structure and bonding in clusters

    International Nuclear Information System (INIS)

    Kumar, V.

    1991-10-01

    We review here the recent progress made in the understanding of the electronic and atomic structure of small clusters of s-p bonded materials using the density functional molecular dynamics technique within the local density approximation. Starting with a brief description of the method, results are presented for alkali metal clusters, clusters of divalent metals such as Mg and Be which show a transition from van der Waals or weak chemical bonding to metallic behaviour as the cluster size grows and clusters of Al, Sn and Sb. In the case of semiconductors, we discuss results for Si, Ge and GaAs clusters. Clusters of other materials such as P, C, S, and Se are also briefly discussed. From these and other available results we suggest the possibility of unique structures for the magic clusters. (author). 69 refs, 7 figs, 1 tab

  6. Validation of a free software for unsupervised assessment of abdominal fat in MRI.

    Science.gov (United States)

    Maddalo, Michele; Zorza, Ivan; Zubani, Stefano; Nocivelli, Giorgio; Calandra, Giulio; Soldini, Pierantonio; Mascaro, Lorella; Maroldi, Roberto

    2017-05-01

    To demonstrate the accuracy of an unsupervised (fully automated) software for fat segmentation in magnetic resonance imaging. The proposed software is a freeware solution developed in ImageJ that enables the quantification of metabolically different adipose tissues in large cohort studies. The lumbar part of the abdomen (19 cm in craniocaudal direction, centered in L3) of eleven healthy volunteers (age range: 21-46 years, BMI range: 21.7-31.6 kg/m²) was examined in a breath hold on expiration with a GE T1 Dixon sequence. Single-slice and volumetric data were considered for each subject. The results of the visceral and subcutaneous adipose tissue assessments obtained by the unsupervised software were compared to supervised segmentations of reference. The associated statistical analysis included Pearson correlations, Bland-Altman plots and volumetric differences (VD%). Values calculated by the unsupervised software significantly correlated with corresponding supervised segmentations of reference for both subcutaneous adipose tissue - SAT (R=0.9996, p [...] software is capable of segmenting the metabolically different adipose tissues with a high degree of accuracy. This free add-on software for ImageJ can easily achieve widespread use and enable large-scale population studies regarding the adipose tissue and its related diseases. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
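    The agreement statistics named in the abstract above (Pearson correlation and Bland-Altman analysis) can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the sample volumes are hypothetical, and the 1.96 multiplier assumes the conventional 95% limits of agreement.

```python
from statistics import mean, stdev


def pearson_r(a, b):
    """Pearson correlation coefficient of two paired sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # mean difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical fat volumes in arbitrary units; NOT data from the study.
automated = [10.0, 12.0, 14.0, 16.0]
reference = [10.5, 11.5, 14.5, 15.5]

r = pearson_r(automated, reference)
bias, lower, upper = bland_altman(automated, reference)
print(f"r={r:.4f} bias={bias:.3f} limits=({lower:.3f}, {upper:.3f})")
```

    A high correlation alone does not show agreement, which is why the bias and limits of agreement are usually reported alongside it.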

  7. Unsupervised categorization with individuals diagnosed as having moderate traumatic brain injury: Over-selective responding.

    Science.gov (United States)

    Edwards, Darren J; Wood, Rodger

    2016-01-01

    This study explored over-selectivity (executive dysfunction) using a standard unsupervised categorization task. Over-selectivity has been demonstrated using supervised categorization procedures (where training is given); however, little has been done in the way of unsupervised categorization (without training). A standard unsupervised categorization task was used to assess levels of over-selectivity in a traumatic brain injury (TBI) population. Individuals with TBI were selected from the Tertiary Traumatic Brain Injury Clinic at Swansea University and were asked to categorize two-dimensional items (pictures on cards), into groups that they felt were most intuitive, and without any learning (feedback from experimenter). This was compared against categories made by a control group for the same task. The findings of this study demonstrate that individuals with TBI had deficits for both easy and difficult categorization sets, as indicated by a larger amount of one-dimensional sorting compared to control participants. Deficits were significantly greater for the easy condition. The implications of these findings are discussed in the context of over-selectivity, and the processes that underlie this deficit. Also, the implications for using this procedure as a screening measure for over-selectivity in TBI are discussed.

  8. A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

    Science.gov (United States)

    Huang, Qi; Yang, Dapeng; Jiang, Li; Zhang, Huajie; Liu, Hong; Kotani, Kiyoshi

    2017-01-01

    Performance degradation will be caused by a variety of interfering factors for pattern recognition-based myoelectric control methods in the long term. This paper proposes an adaptive learning method with low computational cost to mitigate the effect in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), by constructing a particle adaptive learning strategy and universal incremental least square support vector classifier (LS-SVC). We compared PAC performance with incremental support vector classifier (ISVC) and non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results of realistic long-term EMG data showed that the PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle). PMID:28608824

  9. A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

    Directory of Open Access Journals (Sweden)

    Qi Huang

    2017-06-01

    Performance degradation will be caused by a variety of interfering factors for pattern recognition-based myoelectric control methods in the long term. This paper proposes an adaptive learning method with low computational cost to mitigate the effect in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), by constructing a particle adaptive learning strategy and universal incremental least square support vector classifier (LS-SVC). We compared PAC performance with incremental support vector classifier (ISVC) and non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results of realistic long-term EMG data showed that the PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle).

  10. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data

    Science.gov (United States)

    Goldstein, Markus; Uchida, Seiichi

    2016-01-01

    Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior are outlined. As a conclusion, we give advice on algorithm selection for typical real-world tasks. PMID:27093601

  11. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    Science.gov (United States)

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  12. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  13. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One of the possibilities of classifying track structures is application of conventional partition techniques of analysis of multidimensional data to the track structure. Using these cluster algorithms this paper attempts to find characteristics of radiation reflecting the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations giving the mean number of clusters produced by radiation per unit of deposited energy can serve as this characteristic. General computation techniques used as well as methods of calculations of distributions of clusters for different radiations are discussed. 8 refs.; 5 figs

  14. Unsupervised Data Mining in nanoscale X-ray Spectro-Microscopic Study of NdFeB Magnet.

    Science.gov (United States)

    Duan, Xiaoyue; Yang, Feifei; Antono, Erin; Yang, Wenge; Pianetta, Piero; Ermon, Stefano; Mehta, Apurva; Liu, Yijin

    2016-09-29

    Novel developments in X-ray based spectro-microscopic characterization techniques have increased the rate of acquisition of spatially resolved spectroscopic data by several orders of magnitude over what was possible a few years ago. This accelerated data acquisition, with high spatial resolution at nanoscale and sensitivity to subtle differences in chemistry and atomic structure, provides a unique opportunity to investigate hierarchically complex and structurally heterogeneous systems found in functional devices and materials systems. However, handling and analyzing the large volume of data generated poses significant challenges. Here we apply an unsupervised data-mining algorithm known as DBSCAN to study a rare-earth element based permanent magnet material, Nd2Fe14B. We are able to reduce a large spectro-microscopic dataset of over 300,000 spectra to 3, preserving much of the underlying information. Scientists can easily and quickly analyze in detail three characteristic spectra. Our approach can rapidly provide a concise representation of a large and complex dataset to materials scientists and chemists. For example, it shows that the surface of a common Nd2Fe14B magnet is chemically and structurally very different from the bulk, suggesting a surface alteration effect, possibly due to corrosion, which could affect the material's overall properties.
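    To illustrate how a density-based algorithm such as DBSCAN can condense many spectra into a few representatives, here is a minimal pure-Python sketch. It is not the authors' pipeline: the toy two-channel "spectra", the `eps` and `min_pts` values, and the idea of summarising each cluster by its mean are assumptions made for illustration only.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force neighbourhood query; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


def cluster_means(points, labels):
    """Average the members of each cluster into one representative."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(c) for c in zip(*members))
            for lab, members in groups.items()}


# Hypothetical two-channel "spectra"; NOT data from the study.
spectra = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (9.0, 0.0)]
labels = dbscan(spectra, eps=0.3, min_pts=3)
means = cluster_means(spectra, labels)
print(labels, means)
```

    The brute-force neighbour search is quadratic in the number of points; a dataset of several hundred thousand spectra would need a spatial index or a library implementation.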

  15. Unsupervised Data Mining in nanoscale X-ray Spectro-Microscopic Study of NdFeB Magnet

    Science.gov (United States)

    Duan, Xiaoyue; Yang, Feifei; Antono, Erin; Yang, Wenge; Pianetta, Piero; Ermon, Stefano; Mehta, Apurva; Liu, Yijin

    2016-09-01

    Novel developments in X-ray based spectro-microscopic characterization techniques have increased the rate of acquisition of spatially resolved spectroscopic data by several orders of magnitude over what was possible a few years ago. This accelerated data acquisition, with high spatial resolution at nanoscale and sensitivity to subtle differences in chemistry and atomic structure, provides a unique opportunity to investigate hierarchically complex and structurally heterogeneous systems found in functional devices and materials systems. However, handling and analyzing the large volume of data generated poses significant challenges. Here we apply an unsupervised data-mining algorithm known as DBSCAN to study a rare-earth element based permanent magnet material, Nd2Fe14B. We are able to reduce a large spectro-microscopic dataset of over 300,000 spectra to 3, preserving much of the underlying information. Scientists can easily and quickly analyze in detail three characteristic spectra. Our approach can rapidly provide a concise representation of a large and complex dataset to materials scientists and chemists. For example, it shows that the surface of a common Nd2Fe14B magnet is chemically and structurally very different from the bulk, suggesting a surface alteration effect, possibly due to corrosion, which could affect the material’s overall properties.

  16. Automated lesion detection on MRI scans using combined unsupervised and supervised methods

    International Nuclear Information System (INIS)

    Guo, Dazhou; Fridriksson, Julius; Fillmore, Paul; Rorden, Christopher; Yu, Hongkai; Zheng, Kang; Wang, Song

    2015-01-01

    Accurate and precise detection of brain lesions on MR images (MRI) is paramount for accurately relating lesion location to impaired behavior. In this paper, we present a novel method to automatically detect brain lesions from a T1-weighted 3D MRI. The proposed method combines the advantages of both unsupervised and supervised methods. First, unsupervised methods perform a unified segmentation normalization to warp images from the native space into a standard space and to generate probability maps for different tissue types, e.g., gray matter, white matter and fluid. This allows us to construct an initial lesion probability map by comparing the normalized MRI to healthy control subjects. Then, we perform non-rigid and reversible atlas-based registration to refine the probability maps of gray matter, white matter, external CSF, ventricle, and lesions. These probability maps are combined with the normalized MRI to construct three types of features, with which we use supervised methods to train three support vector machine (SVM) classifiers for a combined classifier. Finally, the combined classifier is used to accomplish lesion detection. We tested this method using T1-weighted MRIs from 60 in-house stroke patients. Using leave-one-out cross validation, the proposed method can achieve an average Dice coefficient of 73.1 % when compared to lesion maps hand-delineated by trained neurologists. Furthermore, we tested the proposed method on the T1-weighted MRIs in the MICCAI BRATS 2012 dataset. The proposed method can achieve an average Dice coefficient of 66.5 % in comparison to the expert annotated tumor maps provided in MICCAI BRATS 2012 dataset. In addition, on these two test datasets, the proposed method shows competitive performance to three state-of-the-art methods, including Stamatakis et al., Seghier et al., and Sanjuan et al. 
In this paper, we introduced a novel automated procedure for lesion detection from T1-weighted MRIs by combining both an unsupervised and a

  17. Land cover classification using reformed fuzzy C-means

    Indian Academy of Sciences (India)

    This paper uses segmentation based on unsupervised clustering techniques for classification of land cover. ∗ ... and unsupervised classification can be solved by FCM. ..... They also act as input to the development and monitoring of a range of ...

  18. Cluster evolution

    International Nuclear Information System (INIS)

    Schaeffer, R.

    1987-01-01

    The galaxy and cluster luminosity functions are constructed from a model of the mass distribution based on hierarchical clustering at an epoch where the matter distribution is non-linear. These luminosity functions are seen to reproduce the present distribution of objects as can be inferred from the observations. They can be used to deduce the redshift dependence of the cluster distribution and to extrapolate the observations towards the past. The predicted evolution of the cluster distribution is quite strong, although somewhat less rapid than predicted by the linear theory

  19. Enhanced thermal lens effect in gold nanoparticle-doped Lyotropic liquid crystal by nanoparticle clustering probed by Z-scan technique

    International Nuclear Information System (INIS)

    Gomez, S.L.; Lenart, V.M.

    2015-01-01

    This work presents an experimental study of the thermal lens effect in Au nanoparticles-doped lyotropic liquid crystals under cw 532 nm optical excitation. Spherical Au nanoparticles of about 12 nm were prepared by Turkevich’s method, and the lyotropic liquid crystal was a ternary mixture of SDS, 1-DeOH, and water that exhibits an isotropic phase at room temperature. The lyotropic matrix induces aggregation of the nanoparticles, leading to a broad and red-shifted surface plasmon resonance. The thermal nonlinear optical refraction coefficient n₂ increases as a power of the number density of nanoparticles, a behavior that can be attributed to nanoparticle clustering. (author)

  20. Enhanced thermal lens effect in gold nanoparticle-doped Lyotropic liquid crystal by nanoparticle clustering probed by Z-scan technique

    Energy Technology Data Exchange (ETDEWEB)

    Gomez, S.L.; Lenart, V.M., E-mail: sgomez@uepg.br [Universidade Estadual de Ponta Grossa (UEPG), PR (Brazil). Dept. de Fisica; Turchiello, R.T. [Universidade Federal Tecnologica do Parana (UFTPR), Ponta Grossa, PR (Brazil). Dept. de Fisica; Goya, G.F. [Department of Condensed Matter Physics, Aragon Institute of Nanoscience, Zaragoza (Spain)

    2015-10-01

    This work presents an experimental study of the thermal lens effect in Au nanoparticles-doped lyotropic liquid crystals under cw 532 nm optical excitation. Spherical Au nanoparticles of about 12 nm were prepared by Turkevich’s method, and the lyotropic liquid crystal was a ternary mixture of SDS, 1-DeOH, and water that exhibits an isotropic phase at room temperature. The lyotropic matrix induces aggregation of the nanoparticles, leading to a broad and red-shifted surface plasmon resonance. The thermal nonlinear optical refraction coefficient n₂ increases as a power of the number density of nanoparticles, a behavior that can be attributed to nanoparticle clustering. (author)

  1. Toward unsupervised outbreak detection through visual perception of new patterns

    Directory of Open Access Journals (Sweden)

    Lévy Pierre P

    2009-06-01

    Background: Statistical algorithms are routinely used to detect outbreaks of well-defined syndromes, such as influenza-like illness. These methods cannot be applied to the detection of emerging diseases for which no preexisting information is available. This paper presents a method aimed at facilitating the detection of outbreaks when there is no a priori knowledge of the clinical presentation of cases. Methods: The method uses a visual representation of the symptoms and diseases coded during a patient consultation according to the International Classification of Primary Care 2nd version (ICPC-2). The surveillance data are transformed into color-coded cells, ranging from white to red, reflecting the increasing frequency of observed signs. They are placed in a graphic reference frame mimicking body anatomy. Simple visual observation of color-change patterns over time, concerning a single code or a combination of codes, enables detection in the setting of interest. Results: The method is demonstrated through retrospective analyses of two data sets: description of the patients referred to the hospital by their general practitioners (GPs) participating in the French Sentinel Network and description of patients directly consulting at a hospital emergency department (HED). Informative image color-change alert patterns emerged in both cases: the health consequences of the August 2003 heat wave were visualized with GPs' data (but passed unnoticed with conventional surveillance systems), and the flu epidemics, which are routinely detected by standard statistical techniques, were recognized visually with HED data. Conclusion: Using human visual pattern-recognition capacities to detect the onset of unexpected health events implies a convenient image representation of epidemiological surveillance and well-trained "epidemiology watchers". 
Once these two conditions are met, one could imagine that the epidemiology watchers could signal epidemiological alerts

  2. Chemical modeling of groundwater in the Banat Plain, southwestern Romania, with elevated As content and co-occurring species by combining diagrams and unsupervised multivariate statistical approaches.

    Science.gov (United States)

    Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu

    2017-04-01

    The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate), ion concentrations (Na⁺, K⁺, Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, F⁻, SO₄²⁻, PO₄³⁻, NO₃⁻), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of naturally occurring of As and F⁻ species and the anthropogenic one for NO₃⁻, SO₄²⁻, PO₄³⁻ and K⁺ and (ii) classification of groundwater based on content of arsenic species. The HCO₃⁻ type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and occurrence of F⁻ but by different paths. The PO₄³⁻-AsO₄³⁻ ion exchange, water-rock interaction (silicates hydrolysis and desorption from clay) were associated to arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na⁺-F⁻-pH cluster as marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater supported by mineralogical analysis of rocks was established. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Random matrix improved subspace clustering

    KAUST Repository

    Couillet, Romain

    2017-03-06

    This article introduces a spectral method for statistical subspace clustering. The method is built upon standard kernel spectral clustering techniques, however carefully tuned by theoretical understanding arising from random matrix findings. We show in particular that our method provides high clustering performance while standard kernel choices provably fail. An application to user grouping based on vector channel observations in the context of massive MIMO wireless communication networks is provided.

  4. Quantum annealing for combinatorial clustering

    Science.gov (United States)

    Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph

    2018-02-01

    Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
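    The exhaustive approach described above, which examines every assignment of points to clusters and keeps the one minimising the sum of within-cluster distances, can be sketched as follows. This is a classical illustration only, not the quantum annealing formulation of the paper: the function names and the four sample points are hypothetical, and requiring every cluster to be non-empty is an assumption of this sketch.

```python
from itertools import combinations, product
from math import dist


def within_cluster_cost(points, assignment):
    """Sum of pairwise distances between points that share a cluster."""
    return sum(dist(points[i], points[j])
               for i, j in combinations(range(len(points)), 2)
               if assignment[i] == assignment[j])


def brute_force_clustering(points, k):
    """Try all k**N assignments and return (best cost, best assignment)."""
    best_cost, best_assignment = float("inf"), None
    for assignment in product(range(k), repeat=len(points)):
        if len(set(assignment)) < k:   # assumption: no empty clusters
            continue
        cost = within_cluster_cost(points, assignment)
        if cost < best_cost:           # strict "<" keeps the first optimum
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


# Hypothetical data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
cost, assignment = brute_force_clustering(points, k=2)
print(cost, assignment)
```

    The loop visits k**N assignments (16 here, but more than a million for only 20 points and 2 clusters), which is the scaling problem that motivates heuristic and annealing-based solvers.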

  5. Comparing Generative Adversarial Network Techniques for Image Creation and Modification

    NARCIS (Netherlands)

    Pieters, Mathijs; Wiering, Marco

    2018-01-01

    Generative adversarial networks (GANs) have proven successful at generating realistic real-world images. In this paper we compare various GAN techniques, both supervised and unsupervised. The effects on training stability of different objective functions are compared. We add an encoder

  6. Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm

    OpenAIRE

    Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

    2017-01-01

    This paper describes a method for clustering data that are spread out over large regions and whose dimensions are on different scales of measurement. Such an algorithm was developed to implement a robotics application consisting in sorting and storing objects in an unsupervised way. The toy dataset used to validate such application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions together with the use of RGB color features, respectively involve data...

  7. Cluster Validity Classification Approaches Based on Geometric Probability and Application in the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    LI Jian-Wei

    2014-08-01

    On the basis of the cluster validity function based on geometric probability in the literature [1, 2], we propose a cluster analysis method based on geometric probability to process large amounts of data in a rectangular area. The basic idea is top-down stepwise refinement: first categories, then subcategories. At every clustering level, the cluster validity function based on geometric probability is applied first to determine the clusters and the gathering direction; the cluster centers and cluster borders are then determined. Using TM remote sensing image classification examples, the method is compared with the supervised and unsupervised classification in ERDAS and with the cluster analysis method based on geometric probability in a two-dimensional square proposed in [2]. The results show that the proposed method can significantly improve the classification accuracy.

  8. Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

    Science.gov (United States)

    Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

    Unsupervised object discovery and localization aims to discover some dominant object classes and localize all object instances in a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, in those methods no prior knowledge for the given image collection is exploited to facilitate object discovery. On the other hand, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in terms of the so-called must-links is exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy phenomenon of visual words, the must-link is re-defined so that one must-link only constrains one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; thus the must-links in our approach are semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.

  9. Rain gauge network design for flood forecasting using multi-criteria decision analysis and clustering techniques in lower Mahanadi river basin, India

    Directory of Open Access Journals (Sweden)

    Anil Kumar Kar

    2015-09-01

    New hydrological insights for the region: This study establishes different possible key RG networks using Hall’s method, analytical hierarchical process (AHP), self organization map (SOM) and hierarchical clustering (HC), using the characteristics of the Thiessen polygon area occupied by each rain gauge. Efficiency of the key networks is tested by artificial neural network (ANN), Fuzzy and NAM rainfall-runoff models. Furthermore, flood forecasting has been carried out using the three most effective RG networks, which use only 7 RGs instead of the 14 gauges established in the Kantamal sub-catchment, Mahanadi basin. The Fuzzy logic applied on the key RG network derived using AHP has shown the best result for flood forecasting, with an efficiency of 82.74% for a 1-day lead period. This study demonstrates the design procedure of a key RG network for effective flood forecasting, particularly when there is difficulty in gathering the information from all RGs.

  10. Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

    Science.gov (United States)

    Kamali, Tahereh; Stashuk, Daniel

    2016-10-01

    Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters to be known in advance, which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC) is proposed, which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and the other employed clustering algorithms was evaluated using the dice ratio as an external evaluation criterion and the density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity.
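
    The dice ratio used above as the external criterion can be illustrated with a small sketch. The function name `dice_ratio` and the fiber IDs are hypothetical and chosen only for illustration; this is the standard set-overlap formula 2|A∩B| / (|A| + |B|), not code from the study.

```python
def dice_ratio(predicted, reference):
    """Dice ratio 2|A & B| / (|A| + |B|) between two collections of element IDs."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # convention assumed here: two empty sets agree perfectly
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))

# Hypothetical fiber IDs: one bundle as found by a clustering,
# and the same bundle in a reference (gold standard) labelling.
cluster_fibers = {1, 2, 3, 4, 5}
reference_fibers = {2, 3, 4, 5, 6, 7}
score = dice_ratio(cluster_fibers, reference_fibers)
print(round(score, 4))
```

    A value of 1 means the two sets coincide and 0 means they share no element; an average over bundles would give a single figure comparable in kind to the reported 0.94.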

  11. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.

    2017-10-06

    Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom's predictive performance. Aims: This paper investigates whether clustering methods can be used to help find good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation. Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to automatically set the number of CC subsets while still maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom's CC subsets.

  12. Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

    Science.gov (United States)

    Liu, Wuying; Wang, Lin

    2018-03-01

    The large-scale bilingual parallel resource is significant to statistical learning and deep learning in natural language processing. This paper addresses the automatic construction issue of the Korean-Chinese domain dictionary, and presents a novel unsupervised construction method based on the natural annotation in the raw corpus. We firstly extract all Korean-Chinese word pairs from Korean texts according to natural annotations, secondly transform the traditional Chinese characters into the simplified ones, and finally distill out a bilingual domain dictionary after retrieving the simplified Chinese words in an extra Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.

  13. Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients.

    Science.gov (United States)

    Chen, Jinying; Yu, Hong

    2017-04-01

    Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<0.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. FIT can automatically identify EHR terms important to patients. It may help develop future interventions

  14. Unsupervised Learning Through Randomized Algorithms for High-Volume High-Velocity Data (ULTRA-HV).

    Energy Technology Data Exchange (ETDEWEB)

    Pinar, Ali [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kolda, Tamara G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Carlberg, Kevin Thomas [Wake Forest Univ., Winston-Salem, MA (United States); Ballard, Grey [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mahoney, Michael [Univ. of California, Berkeley, CA (United States)

    2018-01-01

    Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as science-leading experimentation. In both cases, DOE is generating more data than it can analyze and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts, however, are hindered by the massive volume of the data and its high velocity. Here, the challenge is developing unsupervised learning methods to discover hidden structure in high-volume, high-velocity data.

  15. Unsupervised DInSAR processing chain for multi-scale displacement analysis

    Science.gov (United States)

    Casu, Francesco; Manunta, Michele

    2016-04-01

    Earth Observation techniques can be very helpful for the estimation of several sources of ground deformation due to their characteristics of large spatial coverage, high resolution and cost effectiveness. In this scenario, Differential Synthetic Aperture Radar Interferometry (DInSAR) is one of the most effective methodologies for its capability to generate spatially dense deformation maps at both global and local spatial scale, with centimeter to millimeter accuracy. DInSAR exploits the phase difference (interferogram) between SAR image pairs relevant to acquisitions gathered at different times, but with the same illumination geometry and from sufficiently close flight tracks, whose separation is typically referred to as baseline. Among several, the SBAS algorithm is one of the most used DInSAR approaches and it is aimed at generating displacement time series at a multi-scale level by exploiting a set of small baseline interferograms. SBAS, and DInSAR in general, has benefited from the large availability of spaceborne SAR data collected over the years by several satellite systems, with particular regard to the European ERS and ENVISAT sensors, which have acquired SAR images worldwide during approximately 20 years. Moreover, since 2014 the new generation of Copernicus Sentinel satellites has started to acquire data with a short revisit time (12 days) and a global coverage policy, thus flooding the scientific EO community with an unprecedented amount of data. To efficiently manage such an amount of data, proper processing facilities (such as those coming from the emerging Cloud Computing technologies) have to be used, and novel algorithms aimed at their efficient exploitation have to be developed. In this work we present a set of results achieved by exploiting a recently proposed implementation of the SBAS algorithm, namely Parallel-SBAS (P-SBAS), which allows us to effectively process, in an unsupervised way and in a limited time frame, a huge number of SAR images

  16. Cluster editing

    DEFF Research Database (Denmark)

    Böcker, S.; Baumbach, Jan

    2013-01-01

    The Cluster Editing problem asks to transform a graph into a disjoint union of cliques using a minimum number of edge modifications. Although the problem has been proven NP-complete several times, it has nevertheless attracted much research both from the theoretical and the applied side. The problem has been the inspiration for numerous algorithms in bioinformatics, aiming at clustering entities such as genes, proteins, phenotypes, or patients. In this paper, we review exact and heuristic methods that have been proposed for the Cluster Editing problem, and also applications...
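
    To make the objective concrete, the sketch below counts the edge modifications implied by one candidate partition of a graph. It only evaluates a given solution and does not search for the minimum, which is the NP-complete part; the function name `cluster_editing_cost` and the toy graph are hypothetical.

```python
from itertools import combinations

def cluster_editing_cost(vertices, edges, clusters):
    """Count the edge insertions and deletions needed to turn the graph
    into a disjoint union of cliques, one clique per given cluster."""
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    insertions = deletions = 0
    for u, v in combinations(vertices, 2):
        same_cluster = label[u] == label[v]
        has_edge = frozenset((u, v)) in edge_set
        if same_cluster and not has_edge:
            insertions += 1   # missing edge inside a cluster must be added
        elif has_edge and not same_cluster:
            deletions += 1    # edge between two clusters must be removed
    return insertions, deletions

# Toy path graph a-b-c-d-e, evaluated against the partition {a,b,c}, {d,e}.
vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
cost = cluster_editing_cost(vertices, edges, [{"a", "b", "c"}, {"d", "e"}])
print(cost)
```

    An exact or heuristic solver would search over partitions for the one minimizing the sum of the two counts.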

  17. An unsupervised meta-graph clustering based prototype-specific feature quantification for human re-identification in video surveillance

    Directory of Open Access Journals (Sweden)

    Aparajita Nanda

    2017-06-01

    Human re-identification is an emerging research area in the field of visual surveillance. It refers to the task of associating the images of the persons captured by one camera (probe set) with the images captured by another camera (gallery set) at different locations and at different time instances. The performance of these systems is often challenged by several factors: variation in articulated human pose and clothing, frequent occlusion by various objects, changes in illumination, and cluttered backgrounds, to name a few. Besides, the ambiguity in recognition increases between individuals with similar appearance. In this paper, we present a novel framework for human re-identification that finds the corresponding image pair across non-overlapping camera views in the presence of the above challenging scenarios. The proposed framework handles the visual ambiguity between individuals of similar appearance by first segmenting the gallery instances into disjoint prototypes (groups), where each prototype represents the images with high commonality. Then, a weighing scheme is formulated that quantifies the selective and distinct information about the features concerning the level of contribution against each prototype. Finally, the prototype-specific weights are utilized in the similarity measure and fused with the existing generic weighing to improve re-identification. Exhaustive simulations on three benchmark datasets, together with CMC (Cumulative Matching Characteristics) plots, demonstrate the efficacy of our proposed framework over its counterparts.

  18. Clustering analysis of malware behavior using Self Organizing Map

    DEFF Research Database (Denmark)

    Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    For the time being, malware behavioral classification is performed by means of Anti-Virus (AV) generated labels. The paper investigates the inconsistencies associated with current practices by evaluating the identified differences between current vendors. In this paper we rely on Self Organizing Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behavior. A data set of approximately 270,000 samples was used to generate the behavioral profile of malicious types in order to compare the outcome of the proposed clustering approach with the labels collected from 57 Antivirus vendors using VirusTotal. Upon evaluating the results, the paper concludes on shortcomings of relying on AV vendors for labeling malware samples. In order to solve the problem, a cluster-based classification is proposed, which should provide more...

  19. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    Science.gov (United States)

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
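
    As a rough illustration of symmetric NMF for clustering, the sketch below approximates a nonnegative symmetric matrix A by H Hᵀ using a widely used damped multiplicative update, H ← H ∘ (1 − β + β (AH) / (H Hᵀ H)) with β = 0.5. This is an assumption made for illustration and is not claimed to be the exact update, or the α-SNMF and β-SNMF variants, of the paper; all function names and the toy matrix are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def reconstruction_error(A, H):
    """Frobenius norm of A - H H^T."""
    approx = matmul(H, transpose(H))
    return sum((a - b) ** 2 for ra, rb in zip(A, approx) for a, b in zip(ra, rb)) ** 0.5

def symmetric_nmf(A, H, iterations=200, beta=0.5, eps=1e-12):
    """Damped multiplicative update; entries of H stay nonnegative."""
    for _ in range(iterations):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [[h * (1.0 - beta + beta * ah / (hhth + eps))
              for h, ah, hhth in zip(row_h, row_ah, row_hhth)]
             for row_h, row_ah, row_hhth in zip(H, AH, HHtH)]
    return H

# Toy similarity matrix with two obvious groups: items {0, 1} and {2, 3}.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
H0 = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.3, 0.9]]  # arbitrary positive start
H = symmetric_nmf(A, H0)
labels = [row.index(max(row)) for row in H]  # cluster = column with largest weight
print(labels)
```

    Each row of H can be read as soft cluster weights for one item, which is what makes the factorization usable for probabilistic clustering.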

  20. Occupational Clusters.

    Science.gov (United States)

    Pottawattamie County School System, Council Bluffs, IA.

    The 15 occupational clusters (transportation, fine arts and humanities, communications and media, personal service occupations, construction, hospitality and recreation, health occupations, marine science occupations, consumer and homemaking-related occupations, agribusiness and natural resources, environment, public service, business and office…

  1. Fuzzy Clustering

    DEFF Research Database (Denmark)

    Berks, G.; Keyserlingk, Diedrich Graf von; Jantzen, Jan

    2000-01-01

    A symptom is a condition indicating the presence of a disease, especially when regarded as an aid in diagnosis. Symptoms are the smallest units indicating the existence of a disease. A syndrome, on the other hand, is an aggregate, set or cluster of concurrent symptoms which together indicate... and clustering are the basic concerns in medicine. Classification depends on definitions of the classes and their required degree of participant of the elements in the cases' symptoms. In medicine imprecise conditions are the rule and therefore fuzzy methods are much more suitable than crisp ones. Fuzzy c-mean clustering is an easy and well improved tool, which has been applied in many medical fields. We used c-mean fuzzy clustering after feature extraction from an aphasia database. Factor analysis was applied on a correlation matrix of 26 symptoms of language disorders and led to five factors. The factors...
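
    The idea of graded cluster membership can be sketched with the standard fuzzy c-means updates on one-dimensional toy data. The data, the starting centers and the fuzzifier m = 2 are assumptions for illustration; the study itself worked on factor scores from an aphasia database, which are not reproduced here.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50, eps=1e-9):
    """Alternate the standard fuzzy c-means membership and center updates (1-D data)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership of each point in each cluster, from distances to the current centers.
        memberships = []
        for x in points:
            dists = [abs(x - c) + eps for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            )
        # Each center becomes the mean of the points weighted by membership ** m.
        centers = [
            sum((u[j] ** m) * x for u, x in zip(memberships, points))
            / sum(u[j] ** m for u in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships

# Two loose groups around 1 and 5; starting centers are deliberately off.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fuzzy_c_means(points, centers=[0.0, 6.0])
print([round(c, 2) for c in centers])
```

    Unlike a crisp assignment, every point keeps a degree of membership in each cluster, and the degrees for one point sum to 1.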

  2. Cluster generator

    Science.gov (United States)

    Donchev, Todor I [Urbana, IL; Petrov, Ivan G [Champaign, IL

    2011-05-31

    Described herein is an apparatus and a method for producing atom clusters based on a gas discharge within a hollow cathode. The hollow cathode includes one or more walls. The one or more walls define a sputtering chamber within the hollow cathode and include a material to be sputtered. A hollow anode is positioned at an end of the sputtering chamber, and atom clusters are formed when a gas discharge is generated between the hollow anode and the hollow cathode.

  3. Cluster Bulleticity

    OpenAIRE

    Massey, Richard; Kitching, Thomas; Nagai, Daisuke

    2010-01-01

    The unique properties of dark matter are revealed during collisions between clusters of galaxies, such as the bullet cluster (1E 0657−56) and baby bullet (MACS J0025−12). These systems provide evidence for an additional, invisible mass in the separation between the distributions of their total mass, measured via gravitational lensing, and their ordinary ‘baryonic’ matter, measured via its X-ray emission. Unfortunately, the information available from these systems is limited by their rarity. C...

  4. Cluster headache

    OpenAIRE

    Leroux, Elizabeth; Ducros, Anne

    2008-01-01

    Cluster headache (CH) is a primary headache disease characterized by recurrent short-lasting attacks (15 to 180 minutes) of excruciating unilateral periorbital pain accompanied by ipsilateral autonomic signs (lacrimation, nasal congestion, ptosis, miosis, lid edema, redness of the eye). It affects young adults, predominantly males. Prevalence is estimated at 0.5–1.0/1,000. CH has a circannual and circadian periodicity, attacks being clustered (hence the name) in bouts that can occur ...

  5. Six weeks of unsupervised Nintendo Wii Fit gaming is effective at improving balance in independent older adults.

    Science.gov (United States)

    Nicholson, Vaughan Patrick; McKean, Mark; Lowe, John; Fawcett, Christine; Burkett, Brendan

    2015-01-01

    To determine the effectiveness of unsupervised Nintendo Wii Fit balance training in older adults. Forty-one older adults were recruited from local retirement villages and educational settings to participate in a six-week two-group repeated measures study. The Wii group (n = 19, 75 ± 6 years) undertook 30 min of unsupervised Wii balance gaming three times per week in their retirement village while the comparison group (n = 22, 74 ± 5 years) continued with their usual exercise program. Participants' balance abilities were assessed pre- and postintervention. The Wii Fit group demonstrated significant improvements (P balance, lateral reach (left and right), and gait speed compared with the comparison group. Reported levels of enjoyment following game play increased during the study. Six weeks of unsupervised Wii balance training is an effective modality for improving balance in independent older adults.

  6. Unsupervised progressive elastic band exercises for frail geriatric inpatients objectively monitored by new exercise-integrated technology

    DEFF Research Database (Denmark)

    Rathleff, Camilla Rams; Bandholm, T.; Spaich, Erika Geraldina

    2017-01-01

    ...the amount of supervised training, and unsupervised training could possibly supplement supervised training thereby increasing the total exercise dose during admission. A new valid and reliable technology, the BandCizer, objectively measures the exact training dosage performed. The purpose was to investigate feasibility and acceptability of an unsupervised progressive strength training intervention monitored by BandCizer for frail geriatric inpatients. Methods: This feasibility trial included 15 frail inpatients at a geriatric ward. At hospitalization, the patients were prescribed two elastic band exercises... of 2-min pauses and a time-under-tension of 8 s. The feasibility criterion for the unsupervised progressive exercises was that 33% of the recommended number of sets would be performed by at least 30% of patients. In addition, patients and staff were interviewed about their experiences...

  7. Combination of the clustered regularly interspaced short palindromic repeats (CRISPR)-associated 9 technique with the piggybac transposon system for mouse in utero electroporation to study cortical development.

    Science.gov (United States)

    Cheng, Man; Jin, Xubin; Mu, Lili; Wang, Fangyu; Li, Wei; Zhong, Xiaoling; Liu, Xuan; Shen, Wenchen; Liu, Ying; Zhou, Yan

    2016-09-01

    In utero electroporation (IUE) is commonly used to study cortical development of cerebrum by downregulating or overexpressing genes of interest in neural progenitor cells (NPCs) of small mammals. However, exogenous plasmids are lost or diluted over time. Furthermore, gene knockdown based on short-hairpin RNAs may exert nonspecific effects that lead to aberrant neuronal migration. Genomic engineering by the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated 9 (Cas9) system has great research and therapeutic potentials. Here we integrate the CRISPR/Cas9 components into the piggyBac (PB) transposon system (the CRISPR/Cas9-PB toolkit) for cortical IUEs. The mouse Sry-related HMG box-2 (Sox2) gene was selected as the target for its application. Most transduced cortical NPCs were depleted of SOX2 protein as early as 3 days post-IUE, whereas expressions of SOX1 and PAX6 remained intact. Furthermore, both the WT Cas9 and the D10A nickase mutant Cas9n showed comparable knockout efficiency. Transduced cortical cells were purified with fluorescence-activated cell sorting, and effective gene editing at the Sox2 loci was confirmed. Thus, application of the CRISPR/Cas9-PB toolkit in IUE is a promising strategy to study gene functions in cortical NPCs and their progeny. © 2016 Wiley Periodicals, Inc.

  8. Dense Fe cluster-assembled films by energetic cluster deposition

    International Nuclear Information System (INIS)

    Peng, D.L.; Yamada, H.; Hihara, T.; Uchida, T.; Sumiyama, K.

    2004-01-01

    High-density Fe cluster-assembled films were produced at room temperature by energetic cluster deposition. Though cluster-assemblies are usually sooty and porous, the present Fe cluster-assembled films are lustrous and dense, revealing a soft magnetic behavior. Size-monodispersed Fe clusters with the mean cluster size d=9 nm were synthesized using a plasma-gas-condensation technique. Ionized clusters are accelerated electrically and deposited onto the substrate together with neutral clusters from the same cluster source. Packing fraction and saturation magnetic flux density increase rapidly and magnetic coercivity decreases remarkably with increasing acceleration voltage. The Fe cluster-assembled film obtained at the acceleration voltage of -20 kV has a packing fraction of 0.86±0.03, saturation magnetic flux density of 1.78±0.05 Wb/m², and coercivity value smaller than 80 A/m. The resistivity at room temperature is ten times larger than that of bulk Fe metal.

  9. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suzuki Motoyuki

    2009-01-01

    We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate the bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).

  10. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Akinori Ito

    2009-01-01

    We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the “query relevance.” Combining these two ideas, we can alleviate the bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).

  11. Constrained Versions of DEDICOM for Use in Unsupervised Part-Of-Speech Tagging

    Energy Technology Data Exchange (ETDEWEB)

    Dunlavy, Daniel; Peter A. Chew

    2016-05-01

    This report describes extensions of DEDICOM (DEcomposition into DIrectional COMponents) data models [3] that incorporate bound and linear constraints. The main purpose of these extensions is to investigate the use of improved data models for unsupervised part-of-speech tagging, as described by Chew et al. [2]. In that work, a single domain, two-way DEDICOM model was computed on a matrix of bigram frequencies of tokens in a corpus and used to identify parts-of-speech as an unsupervised approach to that problem. An open problem identified in that work was the computation of a DEDICOM model that more closely resembled the matrices used in a Hidden Markov Model (HMM), specifically through post-processing of the DEDICOM factor matrices. The work reported here consists of the description of several models that aim to provide a direct solution to that problem and a way to fit those models. The approach taken here is to incorporate the model requirements as bound and linear constraints into the DEDICOM model directly and solve the data fitting problem as a constrained optimization problem. This is in contrast to the typical approaches in the literature, where the DEDICOM model is fit using unconstrained optimization approaches, and model requirements are satisfied as a post-processing step.
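
    For orientation, a two-way DEDICOM fit with bound and linear constraints can be written in the generic form below. The notation is assumed here and is not taken from the report: X is the bigram frequency matrix, A and R are the factor matrices, θ stacks their entries, ℓ and u are bounds, and C, d encode linear constraints (for example, rows summing to one, as in HMM probability matrices).

```latex
X \approx A R A^{\mathsf{T}}, \qquad
\min_{A,\,R}\ \bigl\lVert X - A R A^{\mathsf{T}} \bigr\rVert_F^{2}
\quad \text{subject to} \quad
\ell \le \theta \le u, \qquad C\,\theta = d,
\qquad \theta = \bigl(\operatorname{vec}(A),\ \operatorname{vec}(R)\bigr).
```

    In the unconstrained approach described as typical, only the least-squares term would be minimized and the requirements would be imposed on A and R afterwards.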

  12. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Directory of Open Access Journals (Sweden)

    Yoonsik Shim

    2016-10-01

    Full Text Available We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  13. Intelligent Fault Diagnosis of Rotary Machinery Based on Unsupervised Multiscale Representation Learning

    Science.gov (United States)

    Jiang, Guo-Qian; Xie, Ping; Wang, Xiao; Chen, Meng; He, Qun

    2017-11-01

    The performance of traditional vibration-based fault diagnosis methods depends greatly on handcrafted features extracted using signal processing algorithms, which require significant amounts of domain knowledge and human labor and do not generalize well to new diagnosis domains. Recently, unsupervised representation learning has provided a promising alternative to feature extraction in traditional fault diagnosis due to its superior ability to learn from unlabeled data. Given that vibration signals usually contain multiple temporal structures, this paper proposes a multiscale representation learning (MSRL) framework to learn useful features directly from raw vibration signals, with the aim of capturing rich and complementary fault pattern information at different scales. In our proposed approach, a coarse-grained procedure is first employed to obtain multiple scale signals from an original vibration signal. Then, sparse filtering, a newly developed unsupervised learning algorithm, is applied to automatically learn useful features from each scale signal, and the learned features at each scale are concatenated one by one to obtain multiscale representations. Finally, the multiscale representations are fed into a supervised classifier to achieve diagnosis results. Our proposed approach is evaluated using two different case studies: motor bearing and wind turbine gearbox fault diagnosis. Experimental results show that the proposed MSRL approach can take full advantage of the availability of unlabeled data to learn discriminative features and achieves better performance, with higher accuracy and stability, compared to the traditional approaches.
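
    The coarse-graining and concatenation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes coarse-graining means averaging non-overlapping windows, and the hypothetical `toy_features` function (mean and RMS) stands in for the sparse filtering feature learner, which is not reproduced here.

```python
from math import sqrt


def coarse_grain(signal, scale):
    """Average consecutive non-overlapping windows of length `scale` (assumed scheme)."""
    n_windows = len(signal) // scale
    return [sum(signal[i * scale:(i + 1) * scale]) / scale for i in range(n_windows)]


def toy_features(signal):
    """Hypothetical stand-in for a learned feature extractor: mean and RMS."""
    mean = sum(signal) / len(signal)
    rms = sqrt(sum(x * x for x in signal) / len(signal))
    return [mean, rms]


def multiscale_representation(signal, scales=(1, 2, 4)):
    """Extract features at each scale and concatenate them in scale order."""
    features = []
    for scale in scales:
        features.extend(toy_features(coarse_grain(signal, scale)))
    return features
```

    The concatenated vector would then be passed to whatever supervised classifier is used downstream; trailing samples that do not fill a whole window are dropped in this sketch.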

  14. Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data.

    Science.gov (United States)

    Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert

    2018-04-01

    Using online consumer reviews as electronic word of mouth to assist purchase-decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. A subtask to be performed by such a framework would be to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method presented is an unsupervised method that applies association rule mining on co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an F1-score of 67%. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.
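
    To make the idea of rules mined from co-occurrence frequencies more concrete, here is a small sketch. It is an illustration under assumptions, not the method of the paper: sentences are split on whitespace, co-occurrence is counted once per sentence, and the hypothetical `rule_confidence` function scores a rule "word implies category word" by its confidence.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, per sentence, each distinct word and each unordered word pair."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(pair) for pair in combinations(sorted(words), 2))
    return word_counts, pair_counts


def rule_confidence(word, category_word, word_counts, pair_counts):
    """Confidence of the rule word -> category_word: P(category_word | word)."""
    if word_counts[word] == 0:
        return 0.0
    return pair_counts[frozenset((word, category_word))] / word_counts[word]
```

    A sentence containing a word whose rule confidence for some category exceeds a chosen threshold could then be assigned to that category; the threshold itself is a free parameter in this sketch.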

  15. Modelling unsupervised online-learning of artificial grammars: linking implicit and statistical learning.

    Science.gov (United States)

    Rohrmeier, Martin A; Cross, Ian

    2014-07-01

    Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.
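
    The evaluation scheme in this abstract, which alternates a forced binary judgment with learning from the same stimulus, can be sketched with a very small bigram model. This is a simplified illustration, not the models used in the study: the hypothetical `BigramLearner` judges a sequence grammatical when at least a threshold fraction of its bigrams has been seen before, and the threshold value is an assumption.

```python
from collections import Counter


def bigrams(sequence):
    """Return the list of adjacent symbol pairs in a sequence."""
    return list(zip(sequence, sequence[1:]))


class BigramLearner:
    """Toy online learner: judge first, then learn from the same stimulus."""

    def __init__(self, threshold=0.5):
        self.counts = Counter()
        self.threshold = threshold  # assumed decision threshold

    def familiarity(self, sequence):
        """Fraction of the sequence's bigrams that were seen earlier."""
        grams = bigrams(sequence)
        if not grams:
            return 0.0
        return sum(1 for g in grams if self.counts[g] > 0) / len(grams)

    def judge_then_learn(self, sequence):
        """Make a binary grammaticality judgment, then update the counts."""
        judgment = self.familiarity(sequence) >= self.threshold
        self.counts.update(bigrams(sequence))
        return judgment
```

    Because the learner updates on every stimulus, including ungrammatical ones, a sequence that shares many chunks with earlier items becomes more likely to be accepted as testing proceeds.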

  16. An Improved EMD-Based Dissimilarity Metric for Unsupervised Linear Subspace Learning

    Directory of Open Access Journals (Sweden)

    Xiangchun Yu

    2018-01-01

    Full Text Available We investigate a novel way of robust face image feature extraction by adopting methods based on Unsupervised Linear Subspace Learning to extract a small number of good features. Firstly, the face image is divided into blocks of a specified size, and we propose and extract a pooled Histogram of Oriented Gradient (pHOG) over each block. Secondly, an improved Earth Mover’s Distance (EMD) metric is adopted to measure the dissimilarity between blocks of one face image and the corresponding blocks from the rest of the face images. Thirdly, considering the limitations of the original Locality Preserving Projections (LPP), we propose the Block Structure LPP (BSLPP), which effectively preserves the structural information of face images. Finally, an adjacency graph is constructed and a small number of good features of a face image are obtained by methods based on Unsupervised Linear Subspace Learning. A series of experiments have been conducted on several well-known face databases to evaluate the effectiveness of the proposed algorithm. In addition, we construct noise, geometric distortion, slight translation, and slight rotation variants of the AR and Extended Yale B face databases, and we verify the robustness of the proposed algorithm when faced with a certain degree of these disturbances.

  17. Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP.

    Science.gov (United States)

    Shim, Yoonsik; Philippides, Andrew; Staras, Kevin; Husbands, Phil

    2016-10-01

    We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.

  18. Rational Variety Mapping for Contrast-Enhanced Nonlinear Unsupervised Segmentation of Multispectral Images of Unstained Specimen

    Science.gov (United States)

    Kopriva, Ivica; Hadžija, Mirko; Popović Hadžija, Marijana; Korolija, Marina; Cichocki, Andrzej

    2011-01-01

    A methodology is proposed for nonlinear contrast-enhanced unsupervised segmentation of multispectral (color) microscopy images of principally unstained specimens. The methodology exploits spectral diversity and spatial sparseness to find anatomical differences between materials (cells, nuclei, and background) present in the image. It consists of rth-order rational variety mapping (RVM) followed by matrix/tensor factorization. The sparseness constraint implies duality between nonlinear unsupervised segmentation and multiclass pattern assignment problems. Classes not linearly separable in the original input space become separable with high probability in the higher-dimensional mapped space. Hence, RVM mapping has two advantages: it takes implicitly into account nonlinearities present in the image (i.e., they are not required to be known) and it increases spectral diversity (i.e., contrast) between materials, due to increased dimensionality of the mapped space. This is expected to improve performance of systems for automated classification and analysis of microscopic histopathological images. The methodology was validated using RVM of the second and third orders of the experimental multispectral microscopy images of unstained sciatic nerve fibers (nervus ischiadicus) and of unstained white pulp in the spleen tissue, compared with a manually defined ground truth labeled by two trained pathophysiologists. The methodology can also be useful for additional contrast enhancement of images of stained specimens. PMID:21708116

  19. Unsupervised Symbolization of Signal Time Series for Extraction of the Embedded Information

    Directory of Open Access Journals (Sweden)

    Yue Li

    2017-03-01

    Full Text Available This paper formulates an unsupervised algorithm for symbolization of signal time series to capture the embedded dynamic behavior. The key idea is to convert time series of the digital signal into a string of (spatially discrete) symbols from which the embedded dynamic information can be extracted in an unsupervised manner (i.e., no requirement for labeling of time series). The main challenges here are: (1) definition of the symbol assignment for the time series; (2) identification of the partitioning segment locations in the signal space of time series; and (3) construction of probabilistic finite-state automata (PFSA) from the symbol strings that contain temporal patterns. The reported work addresses these challenges by maximizing the mutual information measures between symbol strings and PFSA states. The proposed symbolization method has been validated by numerical simulation as well as by experimentation in a laboratory environment. Performance of the proposed algorithm has been compared to that of two commonly used algorithms of time series partitioning.
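
    The three challenges listed in this abstract (symbol assignment, partition locations, and automaton construction) can be illustrated with a simple baseline. This sketch does not implement the mutual-information criterion of the paper; it assumes equal-frequency partitioning, one of the commonly used alternatives, and a depth-one automaton whose states are the symbols themselves. All function names are hypothetical.

```python
from bisect import bisect_right


def equal_frequency_boundaries(series, n_symbols):
    """Place partition boundaries so each cell holds roughly the same number of samples."""
    ordered = sorted(series)
    n = len(ordered)
    return [ordered[(k * n) // n_symbols] for k in range(1, n_symbols)]


def symbolize(series, boundaries):
    """Map each sample to the index of the partition cell it falls into."""
    return [bisect_right(boundaries, x) for x in series]


def transition_probabilities(symbols, n_symbols):
    """Estimate the state transition matrix of a depth-one PFSA from symbol counts."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for current, following in zip(symbols, symbols[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

    For an alternating signal such as `[0.1, 0.9, 0.2, 0.8, 0.3, 0.7]` with two symbols, the estimated matrix shows each symbol always followed by the other, which is the temporal pattern the automaton is meant to capture.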

  20. Wavelet-based unsupervised learning method for electrocardiogram suppression in surface electromyograms.

    Science.gov (United States)

    Niegowski, Maciej; Zivanovic, Miroslav

    2016-03-01

    We present a novel approach aimed at removing electrocardiogram (ECG) perturbation from single-channel surface electromyogram (EMG) recordings by means of unsupervised learning of wavelet-based intensity images. The general idea is to combine the suitability of certain wavelet decomposition bases which provide sparse electrocardiogram time-frequency representations, with the capacity of non-negative matrix factorization (NMF) for extracting patterns from images. In order to overcome convergence problems which often arise in NMF-related applications, we design a novel robust initialization strategy which ensures proper signal decomposition in a wide range of ECG contamination levels. Moreover, the method can be readily used because no a priori knowledge or parameter adjustment is needed. The proposed method was evaluated on real surface EMG signals against two state-of-the-art unsupervised learning algorithms and a singular spectrum analysis based method. The results, expressed in terms of high-to-low energy ratio, normalized median frequency, spectral power difference and normalized average rectified value, suggest that the proposed method enables better ECG-EMG separation quality than the reference methods. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.

  1. Unsupervised Object Modeling and Segmentation with Symmetry Detection for Human Activity Recognition

    Directory of Open Access Journals (Sweden)

    Jui-Yuan Su

    2015-04-01

    Full Text Available In this paper we present a novel unsupervised approach to detecting and segmenting objects as well as their constituent symmetric parts in an image. Traditional unsupervised image segmentation is limited by two obvious deficiencies: the object detection accuracy degrades with the misaligned boundaries between the segmented regions and the target, and pre-learned models are required to group regions into meaningful objects. To tackle these difficulties, the proposed approach aims at incorporating the pair-wise detection of symmetric patches to achieve the goal of segmenting images into symmetric parts. The skeletons of these symmetric parts then provide estimates of the bounding boxes to locate the target objects. Finally, for each detected object, the graphcut-based segmentation algorithm is applied to find its contour. The proposed approach has significant advantages: no a priori object models are used, and multiple objects are detected. To verify the effectiveness of the approach based on the cues that a face part contains an oval shape and skin colors, human objects are extracted from among the detected objects. The detected human objects and their parts are finally tracked across video frames to capture the object part movements for learning the human activity models from video clips. Experimental results show that the proposed method gives good performance on publicly available datasets.

  2. Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

    Science.gov (United States)

    Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

    2018-05-03

    Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, for providing support in the identification of influential users. MUOD is scalable, and can hence be used in large OSNs. Moreover, it labels the outliers as of shape, magnitude, or amplitude, depending on their features. This allows classifying the outlier users in multiple different classes, which are likely to include different types of influential users. Applied to a subset of roughly 400 million Google+ users, MUOD automatically identified and discriminated sets of outlier users that present features associated with different definitions of influential users, like capacity to attract engagement, capacity to attract a large number of followers, or high infection capacity.

  3. A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection.

    Science.gov (United States)

    Ceccarelli, Michele; d'Acierno, Antonio; Facchiano, Angelo

    2009-10-15

    Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With data of this dimensionality and sample size, the risk of over-fitting and selection bias is pervasive. Therefore the development of bioinformatics methods based on unsupervised feature extraction can lead to general tools which can be applied to several fields of predictive proteomics. We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 sample spectra divided into 115 cancer and 91 control samples. The overall accuracy averaged over a large cross validation study is 98.18. The area under the ROC curve of the best selected model is 0.9962. We improved on previously known results for the same problem on the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm.

  4. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    Directory of Open Access Journals (Sweden)

    Ashlock Daniel

    2009-08-01

    Full Text Available Background: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.

  5. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering.

    Science.gov (United States)

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-08-22

    Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors.
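
    The core idea of amalgamating k-means runs over a range of cluster numbers can be sketched as a co-membership matrix. This is only an illustration of that idea, not the MULTI-K algorithm or its C++ code: the k-means here uses a deterministic initialisation chosen for reproducibility, and the entropy-plot step is omitted. The function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means with centers initialised at evenly spaced sorted points."""
    ordered = sorted(points)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def co_membership(points, ks):
    """Fraction of runs (one per k in `ks`) in which each pair shares a cluster."""
    n = len(points)
    together = [[0] * n for _ in range(n)]
    for k in ks:
        labels = kmeans(points, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / len(ks) for count in row] for row in together]
```

    Pairs with a co-membership close to one stay together however many clusters are requested, which is the kind of robustness the abstract refers to; a final grouping could be read off by thresholding this matrix.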

  6. The effect of long-range air mass transport pathways on PM10 and NO2 concentrations at urban and rural background sites in Ireland: Quantification using clustering techniques.

    Science.gov (United States)

    Donnelly, Aoife A; Broderick, Brian M; Misstear, Bruce D

    2015-01-01

    The specific aims of this paper are to: (i) quantify the effects of various long range transport pathways on nitrogen dioxide (NO2) and particulate matter with diameter less than 10 μm (PM10) concentrations in Ireland and identify air mass movement corridors which may lead to incidences of poor air quality for application in forecasting; (ii) compare the effects of such pathways at various sites; (iii) assess pathways associated with a period of decreased air quality in Ireland. The origin of and the regions traversed by an air mass 96 h prior to reaching a receptor are modelled and k-means clustering is applied to create air-mass groups. Significant differences in air pollution levels were found between air mass cluster types at urban and rural sites. It was found that easterly or recirculated air masses lead to higher NO2 and PM10 levels with average NO2 levels varying between 124% and 239% of the seasonal mean and average PM10 levels varying between 103% and 199% of the seasonal mean at urban and rural sites. Easterly air masses are more frequent during winter months leading to higher overall concentrations. The span in relative concentrations between air mass clusters is highest at the rural site indicating that regional factors are controlling concentration levels. The methods used in this paper could be applied to assist in modelling and forecasting air quality based on long range transport pathways and forecast meteorology without the requirement for detailed emissions data over a large regional domain or the use of computationally demanding modelling techniques.

  7. Hierarchical clustering of HPV genotype patterns in the ASCUS-LSIL triage study

    Science.gov (United States)

    Wentzensen, Nicolas; Wilson, Lauren E.; Wheeler, Cosette M.; Carreon, Joseph D.; Gravitt, Patti E.; Schiffman, Mark; Castle, Philip E.

    2010-01-01

    Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common which complicate the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level. PMID:20959485
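
    Complete-linkage clustering, the linkage rule named in this abstract, merges at each step the two clusters whose farthest members are closest. A minimal sketch follows; it assumes a precomputed symmetric distance matrix (the dissimilarity measure used in the study is not given in the abstract) and a hypothetical stopping rule based on a target number of clusters.

```python
def complete_linkage(distance, n_clusters):
    """Agglomerative clustering on a distance matrix using complete linkage."""
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(distance[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

    For four items at positions 0, 1, 5 and 6 on a line, with absolute differences as distances, asking for two clusters returns `[[0, 1], [2, 3]]`.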

  8. Distribuição de subgrupos com base nas respostas fisiológicas em jogadores profissionais de futebol pela técnica K Means Cluster [Subgroup distribution based on physiological responses in professional soccer players by K-means cluster technique]

    Directory of Open Access Journals (Sweden)

    Luiz Fernando Novack

    2013-04-01

    improved in all athletes collectively. CONCLUSION: The results lead us to conclude that group distribution by the K-means clustering technique can be performed using the physiological responses of athletes in an attempt to optimize training for professional soccer players, with a focus on their common main training needs regardless of the tactical function they play on the field.

  9. Time series clustering in large data sets

    Directory of Open Access Journals (Sweden)

    Jiří Fejfar

    2011-01-01

    Full Text Available The clustering of time series is a widely researched area. There are many methods for dealing with this task. We use the self-organizing map (SOM) with the unsupervised learning algorithm for clustering of time series. After the first experiment (Fejfar, Weinlichová, Šťastný, 2009), it seems that the whole concept of the clustering algorithm is correct but that we have to perform time series clustering on a much larger dataset to obtain more accurate results and to find the correlation between configured parameters and results more precisely. The second requirement arose from the need for a well-defined evaluation of results. It seems useful to use sound recordings as instances of time series again. There are many recordings to use in digital libraries, and many interesting features and patterns can be found in this area. In this experiment we are searching for recordings with a similar development of information density. This can be used for musical form investigation, cover song detection and many other applications. The objective of the presented paper is to compare clustering results made with different parameters of feature vectors and of the SOM itself. We describe time series in a simplistic way by evaluating standard deviations for separated parts of recordings. The resulting feature vectors are clustered with the SOM in batch training mode with different topologies varying from a few neurons to large maps. Other algorithms usable for finding similarities between time series are discussed, and finally conclusions for further research are presented. We also present an overview of the related current literature and projects.
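
    The feature description mentioned in this abstract, standard deviations over separated parts of a recording, can be sketched in a few lines. This is an assumption-laden illustration: it splits the samples into equal-length segments, drops any remainder, and uses the population standard deviation; the abstract does not state which variant the authors used, and the SOM training itself is not shown.

```python
from statistics import pstdev


def segment_std_features(samples, n_segments):
    """Return one standard deviation per equal-length segment of the recording."""
    size = len(samples) // n_segments
    return [pstdev(samples[i * size:(i + 1) * size]) for i in range(n_segments)]
```

    Each recording then becomes a fixed-length vector, so recordings of different durations can be compared and fed to a clustering method such as a SOM.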

  10. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

    Science.gov (United States)

    Nikfarjam, Azadeh; Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Gonzalez, Graciela

    2015-05-01

    Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  11. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

    Science.gov (United States)

    Yau, Christopher; Holmes, Chris

    2011-07-01

    We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.

  12. Clustering performance comparison using K-means and expectation maximization algorithms.

    Science.gov (United States)

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
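
    To make the contrast between the two algorithms concrete, here is a minimal sketch on one-dimensional toy data. The data values, starting points, and function names are assumptions for illustration; the paper's wine-quality data and its logistic regression step are not reproduced. K-means makes hard assignments to the nearest center, whereas EM for a Gaussian mixture keeps soft responsibilities and also estimates variances and mixing weights.

```python
import math


def kmeans_1d(xs, centers, iters=50):
    """Hard-assignment k-means on scalars."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def em_gmm_1d(xs, means, iters=50):
    """EM for a 1-D Gaussian mixture with per-component variance and weight."""
    means = list(means)
    k = len(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a degenerate component
            weights[j] = nj / len(xs)
    return means, variances, weights, resp


data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
km_centers = kmeans_1d(data, [0.0, 6.0])
em_means, em_vars, em_weights, em_resp = em_gmm_1d(data, [0.0, 6.0])
print(km_centers, em_means)
```

    On well-separated data such as this, the two methods find nearly the same centers; they tend to differ when clusters overlap or have unequal spread, since only EM models the variances.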

  13. Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification

    Science.gov (United States)

    Iyigun, Cem; Ben-Israel, Adi

    Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, using prior information on the data). These two modes of data analysis are combined in a parameterized model; the parameter θ ∈ [0, 1] is the weight attributed to the prior information, with θ = 0 corresponding to clustering and θ = 1 to classification. The results (cluster centers, classification rule) depend on the parameter θ; an insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is otherwise redundant. This explains why some data sets (such as the Wisconsin breast cancer data, Merz and Murphy, UCI repository of machine learning databases, University of California, Irvine, CA) give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, shown to be an entropic distance related to the Kullback-Leibler divergence.
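
    A rough sketch of the quantities named in this abstract follows. Two parts are assumptions rather than the authors' formulation: membership probabilities are taken to be inversely proportional to the distance from each cluster center (the usual principle of probabilistic distance clustering), and the weight θ is applied as a simple convex combination of clustering memberships and prior label probabilities. Only the uncertainty measure, the geometric mean of the membership probabilities, is stated in the abstract.

```python
import math


def memberships(x, centers):
    """Membership probabilities inversely proportional to distance (assumed)."""
    d = [math.dist(x, c) for c in centers]
    hits = [di == 0 for di in d]
    if any(hits):
        # The point sits on one or more centers: share probability among them.
        n = sum(hits)
        return [1.0 / n if h else 0.0 for h in hits]
    inv = [1.0 / di for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def blend(p_cluster, p_prior, theta):
    """Hypothetical mix: theta = 0 is pure clustering, theta = 1 pure prior."""
    return [(1 - theta) * pc + theta * pp for pc, pp in zip(p_cluster, p_prior)]


def uncertainty(p):
    """Geometric mean of the membership probabilities."""
    return math.prod(p) ** (1.0 / len(p))


centers = [(0.0, 0.0), (4.0, 0.0)]
p_near = memberships((1.0, 0.0), centers)  # close to the first center
p_mid = memberships((2.0, 0.0), centers)   # equidistant from both
print(p_near, uncertainty(p_near))
print(p_mid, uncertainty(p_mid))
```

    With two clusters the geometric mean peaks at 0.5 for a point equidistant from both centers and falls to 0 for a point lying on a center, which matches the intuition of an uncertainty measure.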

  14. Progressive Exponential Clustering-Based Steganography

    Directory of Open Access Journals (Sweden)

    Li Yue

    2010-01-01

    Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC), is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.

  15. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  16. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
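
    The role of the "amount of cluster overlap allowed" can be sketched with the standard fuzzy c-means update, in which a fuzzifier m > 1 controls how soft the memberships are. The abstract does not say which fuzzy algorithm or settings the authors used, so the choice of fuzzy c-means, the toy scores, and the function names below are assumptions for illustration only.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of scalar x; each row sums to 1."""
    d = [abs(x - c) for c in centers]
    hits = [di == 0 for di in d]
    if any(hits):
        # The point coincides with a center: full membership there.
        n = sum(hits)
        return [1.0 / n if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def fuzzy_c_means(xs, centers, m=2.0, iters=100):
    """Alternate membership and center updates for a fixed number of steps."""
    centers = list(centers)
    for _ in range(iters):
        u = [fcm_memberships(x, centers, m) for x in xs]
        centers = [
            sum(u[i][j] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][j] ** m for i in range(len(xs)))
            for j in range(len(centers))
        ]
    return centers, [fcm_memberships(x, centers, m) for x in xs]


def hard_assign(x, centers):
    """K-means style assignment: index of the nearest center."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


scores = [1.0, 1.5, 2.0, 3.0, 4.0, 4.5, 5.0]  # toy data; 3.0 lies between groups
centers_crisp, u_crisp = fuzzy_c_means(scores, [1.0, 5.0], m=1.2)
centers_soft, u_soft = fuzzy_c_means(scores, [1.0, 5.0], m=3.0)
print(u_crisp[2], u_soft[2], hard_assign(scores[2], centers_crisp))
```

    A small m gives memberships close to the hard K-means assignment, while a larger m spreads membership across clusters, so agreement between the two kinds of solution depends on that setting, as the abstract reports.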

  17. Extracting aerobic system dynamics during unsupervised activities of daily living using wearable sensor machine learning models.

    Science.gov (United States)

    Beltrame, Thomas; Amelard, Robert; Wong, Alexander; Hughson, Richard L

    2018-02-01

    Physical activity levels are related through algorithms to the energetic demand, with no information regarding the integrity of the multiple physiological systems involved in the energetic supply. Longitudinal analysis of the oxygen uptake (V̇o2) by wearable sensors in realistic settings might permit development of a practical tool for the study of the longitudinal aerobic system dynamics (i.e., V̇o2 kinetics). This study evaluated aerobic system dynamics based on predicted V̇o2 data obtained from wearable sensors during unsupervised activities of daily living (μADL). Thirteen healthy men performed a laboratory-controlled moderate exercise protocol and were monitored for ≈6 h/day for 4 days (μADL data). Variables derived from hip accelerometer (ACC_HIP), heart rate monitor, and respiratory bands during μADL were extracted and processed by a validated random forest regression model to predict V̇o2. The aerobic system analysis was based on the frequency-domain analysis of ACC_HIP and predicted V̇o2 data obtained during μADL. Optimal samples for frequency domain analysis (constrained to ≤0.01 Hz) were selected when ACC_HIP was higher than 0.05 g at a given frequency (i.e., participants were active). The temporal characteristics of predicted V̇o2 data during μADL correlated with the temporal characteristics of measured V̇o2 data during laboratory-controlled protocol ([Formula: see text] = 0.82, P system dynamics can be investigated during unsupervised activities of daily living by wearable sensors. Although speculative, these algorithms have the potential to be incorporated into wearable systems for early detection of changes in health status in realistic environments by detecting changes in aerobic response dynamics. NEW & NOTEWORTHY The early detection of subclinical aerobic system impairments might be indicative of impaired physiological reserves that impact the capacity for physical activity. This study is the first to use wearable

  18. Clustering with Obstacles in Spatial Databases

    OpenAIRE

    El-Zawawy, Mohamed A.; El-Sharkawi, Mohamed E.

    2009-01-01

    Clustering large spatial databases is an important problem, which tries to find the densely populated regions in a spatial area to be used in data mining, knowledge discovery, or efficient information retrieval. However most algorithms have ignored the fact that physical obstacles such as rivers, lakes, and highways exist in the real world and could thus affect the result of the clustering. In this paper, we propose CPO, an efficient clustering technique to solve the problem of clustering in ...

  19. The delicate balance between parental protection, unsupervised wandering, and adolescents' autonomy and its relation with antisocial behavior : The TRAILS study

    NARCIS (Netherlands)

    Sentse, M.; Dijkstra, J.K.; Lindenberg, S.; Ormel, J.; Veenstra, R.

    In a large sample of early adolescents (T2: N = 1023; M age = 13.51; 55.5% girls), the impact of parental protection and unsupervised wandering on adolescents' antisocial behavior 2.5 years later was tested in this TRAILS study; gender and parental knowledge were controlled for. In addition, the

  20. The Delicate Balance between Parental Protection, Unsupervised Wandering, and Adolescents' Autonomy and Its Relation with Antisocial Behavior: The TRAILS Study

    Science.gov (United States)

    Sentse, Miranda; Dijkstra, Jan Kornelis; Lindenberg, Siegwart; Ormel, Johan; Veenstra, Rene

    2010-01-01

    In a large sample of early adolescents (T2: N = 1023; M age = 13.51; 55.5% girls), the impact of parental protection and unsupervised wandering on adolescents' antisocial behavior 2.5 years later was tested in this TRAILS study; gender and parental knowledge were controlled for. In addition, the level of biological maturation and having antisocial…