WorldWideScience

Sample records for cluster analysis method

  1. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

Full Text Available The aim of the present article is to show the possibility of using cluster analysis methods in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The sorting of finished-product stocks into clusters is described with a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  2. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with Mmin = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics leads to various new and updated correlation functions, e.g. for ratios between the mainshock and the strongest aftershock, and for general aftershock activity metrics.

  3. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series. Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  4. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to identify similar entities from the characteristics they possess. To study this analysis, several algorithms can be used. This paper therefore focuses on those algorithms: similarity measures, and hierarchical clustering, which includes the single linkage, complete linkage and average linkage methods. Non-hierarchical clustering, popularly known as the K-means method, is also discussed. Finally, the advantages and disadvantages of each method are described.
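The linkage rules named in this record can be sketched in a few lines. A toy illustration (not the paper's code) of agglomerative clustering with single, complete and average linkage on invented 1-D data:

```python
# Minimal agglomerative clustering sketch illustrating three linkage rules.
# Illustrative only; real work should use a library such as SciPy.

def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (lists of point indices)."""
    pairs = [dist[i][j] for i in a for j in b]
    if method == "single":    # nearest neighbour
        return min(pairs)
    if method == "complete":  # furthest neighbour
        return max(pairs)
    return sum(pairs) / len(pairs)  # average linkage

def agglomerate(points, method, k):
    """Repeatedly merge the two closest clusters until k remain."""
    n = len(points)
    dist = [[abs(points[i] - points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: linkage_distance(clusters[ab[0]],
                                                   clusters[ab[1]], dist, method))
        clusters[a] += clusters.pop(b)
    return sorted(sorted(c) for c in clusters)

data = [1.0, 1.2, 1.1, 5.0, 5.3, 9.0]
print(agglomerate(data, "single", 2))  # → [[0, 1, 2], [3, 4, 5]]
```

With single linkage the outlier at 9.0 chains onto the nearer group; complete linkage at k = 3 keeps it isolated.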

  5. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  6. Semi-supervised clustering methods.

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
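A common family of semi-supervised variants surveyed in this review modifies k-means so that known labels constrain the result. A minimal hedged sketch of such a "seeded" k-means on 1-D data (the data and seed labels are invented for illustration):

```python
# Seeded k-means sketch: labelled observations initialise (and stay in)
# their clusters; unlabelled points are assigned as in ordinary k-means.
# Illustrative toy code, not the implementation from the review.

def seeded_kmeans(points, seeds, k, iters=20):
    """points: list of floats; seeds: {point index: known cluster label}."""
    # Initialise each centroid from the mean of its seed points.
    centroids = []
    for c in range(k):
        members = [points[i] for i, lab in seeds.items() if lab == c]
        centroids.append(sum(members) / len(members))
    labels = [None] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            if i in seeds:                      # supervision: label is fixed
                labels[i] = seeds[i]
            else:                               # otherwise: nearest centroid
                labels[i] = min(range(k), key=lambda c: abs(p - centroids[c]))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

data = [0.0, 0.5, 0.4, 4.0, 4.2, 3.9]
print(seeded_kmeans(data, {0: 0, 3: 1}, k=2))  # → [0, 0, 0, 1, 1, 1]
```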

  7. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

Full Text Available Medical image processing is a challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumours from brain MR images. MATLAB is used to design a software tool for locating brain tumours based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of the unsupervised clustering methods is presented.

  8. Semi-supervised clustering methods

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  9. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    Science.gov (United States)

    1982-02-26

UPGMA), and Ward's method. Ling's papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them... Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  10. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    Science.gov (United States)

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

Compared the efficacy of two methods of cluster analysis, the unweighted pair-group method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output and on cophenetic correlation coefficients generated by each…

  11. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    Science.gov (United States)

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

Clustering algorithms are a basis of data analysis and are widely used in analysis systems. However, given the high dimensionality of the data, a clustering algorithm may overlook the business relations between dimensions, especially in the medical field. As a result, the clustering result may not meet the business goals of the users. If the clustering process can incorporate the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize the result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, to make it reflect the user's business preference as far as possible. After this parameter optimization and adjustment, the clustering result can be closer to the user's requirements. Finally, we take an example from breast cancer data to test our method. The experiments show the good performance of our algorithm.
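The PSO-tuned weights in the paper enter the clustering through a weighted distance. A hedged sketch showing only how per-feature weights steer a k-means result (the weights here are fixed by hand rather than optimised, and the 2-D data are invented):

```python
# How per-feature weights can steer a k-means result. The paper tunes such
# weights with particle swarm optimisation; here they are simply given.
# Toy sketch with hypothetical 2-D data.

def weighted_dist2(p, q, w):
    """Squared distance with a weight per dimension."""
    return sum(wi * (pi - qi) ** 2 for wi, pi, qi in zip(w, p, q))

def weighted_kmeans(points, centroids, w, iters=10):
    k = len(centroids)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: weighted_dist2(p, centroids[c], w))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

# Feature 0 separates the groups; feature 1 is noise.
pts = [(0.0, 5.0), (0.2, 0.0), (4.0, 5.1), (4.1, 0.2)]
# Up-weighting feature 0 groups by the (hypothetically) relevant dimension.
print(weighted_kmeans(pts, [[0.0, 2.5], [4.0, 2.5]], w=(1.0, 0.0)))  # → [0, 0, 1, 1]
```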

  12. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in the subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
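The two best-performing normalisations in this record (z-score and range) can be stated compactly. An illustrative helper sketch for a single feature column, not the WASP package:

```python
# z-score and range normalisation for one feature column.
# Illustrative sketch only.

def zscore(xs):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]

def range_norm(xs):
    """Rescale linearly onto the [0, 1] interval."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

print(range_norm([2.0, 4.0, 6.0]))  # → [0.0, 0.5, 1.0]
```

In a multi-parameter setting each input column (optical size, asymmetry factor, fluorescence channels) would be normalised independently before computing linkage distances.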

  13. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

Full Text Available One of the standard methods for determining the size distribution of wood chips is the oscillating screen method (EN 15149-1:2010). Recent literature demonstrated how image analysis can return highly accurate measures of the dimensions defined for each individual particle, and can support a new method, based on geometrical shape, to determine the chip size in a more accurate way. A sample of wood chips (8 litres) was sieved through horizontally oscillating sieves, using five different screen hole diameters (3.15, 8, 16, 45 and 63 mm); the wood chips were sorted into decreasing size classes and the mass of all fractions was used to determine the size distribution of the particles. Since chip shape and size influence the sieving results, Wang's theory, which concerns geometric forms, was considered. A cluster analysis of the shape descriptors (Fourier descriptors) and size descriptors (area, perimeter, Feret diameters, eccentricity) was applied to observe the chip distribution. The UPGMA algorithm was applied to the Euclidean distances. The obtained dendrogram shows a group separation in accordance with the original three sieving fractions. A comparison was made between the traditional sieving and clustering results. This preliminary result shows that the image analysis-based method has high potential for the characterisation of wood chip size distribution and could be investigated further. Moreover, this method could be implemented in an online detection machine for chip size characterisation. An improvement of the results is expected by using supervised multivariate methods that utilise known class memberships. The main objective of future activities will be to shift the analysis from a 2-dimensional method to a 3-dimensional acquisition process.

  14. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons

  15. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

Full Text Available The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance; data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex than traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95 % cumulative contribution rate are extracted by PCA, and the number of cluster centers k is chosen. The k centers are found by an improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  16. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance; data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex than traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95 % cumulative contribution rate are extracted by PCA, and the number of cluster centers k is chosen. The k centers are found by an improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with
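The first step of the pipeline described above, the DTW distance between two trajectories, follows the classic dynamic-programming recurrence. An illustrative 1-D sketch (the authors apply it to full spatio-temporal trajectories):

```python
# Dynamic Time Warping distance between two sequences, here 1-D for brevity.
# Classic DP recurrence; illustrative sketch, not the authors' code.

def dtw(a, b):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the warping path by a match, insertion or deletion
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A time-shifted copy of a sequence stays close under DTW...
print(dtw([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # → 0.0
# ...whereas pointwise (Euclidean-style) comparison would penalise the shift.
```

Steps 2 and 3 then collect such pairwise distances into a matrix and decompose it with PCA before the final centre-based clustering.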

  17. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

Cluster analysis is widely applied to identify dietary patterns. A newer method based on Gaussian mixture models (GMM) seems to be more flexible than the commonly applied k-means and Ward's methods. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences for single foods, and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  18. Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification

    Directory of Open Access Journals (Sweden)

    Yajie Zou

    2017-01-01

Full Text Available Hotspot identification (HSID) is a critical part of network-wide safety evaluations. Typical methods for ranking sites are often rooted in using the Empirical Bayes (EB) method to estimate safety from both observed crash records and the predicted crash frequency based on similar sites. The performance of the EB method is highly related to the selection of a reference group of sites (i.e., roadway segments or intersections) similar to the target site, from which the safety performance functions (SPFs) used to predict crash frequency will be developed. As crash data often contain underlying heterogeneity that, in essence, can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To overcome this possible heterogeneity problem, EB-based HSID methods that use common clustering methodologies (e.g., mixture models, K-means, and hierarchical clustering) to select “similar” sites for building SPFs are developed. The performance of the clustering-based EB methods is then compared using real crash data. Here, HSID results computed on Texas undivided rural highway crash data suggest that all three clustering-based EB analysis methods are preferred over the conventional statistical methods. Thus, properly classifying the road segments for heterogeneous crash data can further improve HSID accuracy.
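The EB estimate underlying the ranking step blends the SPF prediction with the observed count. A hedged sketch of the standard Hauer-style formula (the numbers below are invented; the paper's contribution is choosing the reference group by clustering before fitting the SPF):

```python
# Empirical Bayes safety estimate: a weighted blend of the SPF-predicted
# and the observed crash frequency. Standard Hauer-style form; the
# dispersion value used in the example is made up for illustration.

def eb_estimate(observed, predicted, phi):
    """phi: negative-binomial shape (inverse dispersion) of the SPF."""
    w = phi / (phi + predicted)        # more weight on the SPF when phi is large
    return w * predicted + (1 - w) * observed

# A site observed at 9 crashes where similar (clustered) sites predict 4:
print(round(eb_estimate(observed=9, predicted=4.0, phi=2.0), 2))  # → 7.33
```

The estimate shrinks the raw count toward the reference-group prediction, which is why a poorly chosen (heterogeneous) reference group degrades the ranking.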

  19. Cluster cosmological analysis with X ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

Cosmology is one of the fundamental pillars of astrophysics; as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ωm, σ8, and the dark energy equation of state w0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter distance, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with an MCMC method and an Amoeba minimization routine, using two simulated XMM surveys of increasing complexity. The first is a set of toy catalogues of 100 and 10000 deg², whereas the second is a 1000 deg² catalogue generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts

  20. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data, grouping separate observations by the degree of their similarity. The review provides definitions of the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples are given of the use of these algorithms in biomedical research.
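Of the three algorithm families listed in this review, the Kohonen network is the least often sketched. A toy 1-D self-organising map training loop (illustrative only; the data, learning rate and neighbourhood radius are invented):

```python
# Toy 1-D Kohonen network (self-organising map): each input pulls its
# best-matching unit and that unit's neighbours toward itself, so the
# map units spread over the data while preserving their ordering.
# Illustrative sketch only.

def train_som(data, weights, epochs=50, lr=0.5, radius=1):
    for _ in range(epochs):
        for x in data:
            # best-matching unit: the closest map unit
            bmu = min(range(len(weights)), key=lambda j: abs(x - weights[j]))
            for j in range(len(weights)):
                if abs(j - bmu) <= radius:      # neighbourhood update
                    weights[j] += lr * (x - weights[j])
        lr *= 0.9                               # decay the learning rate
    return weights

w = train_som([0.0, 0.1, 1.0, 1.1], weights=[0.4, 0.5, 0.6])
print(w)  # end units settle near the two data groups; middle unit between them
```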

  1. a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimensionality of the accessible human activity data has been extended from two or three (space, or space and time) to four (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  2. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    Science.gov (United States)

    Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.

  3. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  4. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous...
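The distribution at the core of the method, the discrete Laplace, has pmf P(X = k) = ((1 - p)/(1 + p)) p^|k - mu| over the integers. A hedged sketch (per-locus parameters invented for illustration) of how per-locus terms multiply into a haplotype probability:

```python
# Discrete Laplace pmf and a per-locus product over an STR haplotype.
# Illustrative sketch; the mu and p values below are made up, not estimates.

def discrete_laplace_pmf(k, mu, p):
    """P(X = k) for integer k, centred at mu, with 0 < p < 1."""
    return (1 - p) / (1 + p) * p ** abs(k - mu)

def haplotype_prob(haplotype, centre, ps):
    """Product of per-locus discrete Laplace probabilities within a cluster."""
    prob = 1.0
    for k, mu, p in zip(haplotype, centre, ps):
        prob *= discrete_laplace_pmf(k, mu, p)
    return prob

# The pmf sums to 1 over the integers (checked here on a wide window):
total = sum(discrete_laplace_pmf(k, 0, 0.3) for k in range(-50, 51))
print(round(total, 6))  # → 1.0

# Probability of a two-locus haplotype (14, 30) in a cluster centred at (14, 31):
print(haplotype_prob([14, 30], [14, 31], [0.2, 0.3]))
```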

  5. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

    Science.gov (United States)

    Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

    2018-04-01

    Clustering is a data mining technique used to analyse large and varied data sets. Clustering is the process of grouping data into clusters so that each cluster contains data that are as similar as possible to one another and as different as possible from the objects in other clusters. Indonesian SMEs have a variety of customers but lack a mapping of those customers, so they do not know which customers are loyal and which are not. Customer mapping groups customers into profiles to facilitate analysis and to support SME policy on the production of goods, especially batik sales. We use a combination of the K-means method with the elbow method to make K-means perform more efficiently and effectively on large amounts of data. K-means clustering is a local optimization method that is sensitive to the choice of the initial cluster centres: choosing poor starting centres leads the K-means algorithm to high error and poor cluster quality. The K-means algorithm also has the problem of determining the best number of clusters, so the elbow method is used to find the best number of clusters K for K-means. The results show that the elbow method produces the same best number of clusters K across different amounts of data, and this number is then used as the default for the subsequent characterisation step in the case study. Evaluating K-means by the sum of squared errors (SSE) over data on 500 batik visitors yielded the best clustering: the SSE curve shows a sharp decrease at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
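
The K-means-plus-elbow procedure described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it uses a deterministic farthest-point seeding (the abstract does not specify one) and picks the elbow as the K with the largest second difference of the SSE curve.

```python
def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-point seeding;
    returns the centroids and the sum of squared errors (SSE)."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # seeding: start from the first point, then repeatedly add the point
    # farthest from the centroids chosen so far
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: d2(p, centroids[i]))].append(p)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    sse = sum(d2(p, centroids[min(range(k), key=lambda i: d2(p, centroids[i]))])
              for p in points)
    return centroids, sse

# three well-separated groups, so the elbow should appear at K = 3
data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11),
        (20, 0), (21, 0), (20, 1)]
sse = {k: kmeans(data, k)[1] for k in range(1, 6)}
drops = {k: sse[k - 1] - sse[k] for k in range(2, 6)}
elbow = max(drops, key=lambda k: drops[k] - drops.get(k + 1, 0))
```

On this toy data the SSE falls steeply up to K = 3 and almost flattens afterwards, which is exactly the "sharp decrease" the abstract uses as the cut-off criterion.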

  6. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One recently proposed heuristic evolutionary algorithm is the grey wolf optimizer (GWO), inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on the Powell local optimization method, called PGWO. The PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique; hence, PGWO can be applied to solving clustering problems. In this study, the PGWO algorithm is first tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the benchmark and data clustering results demonstrate the superior performance of the PGWO algorithm.

  7. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimensionality of the accessible human activity data has been extended from two or three (space, or space and time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay can be collected, but also what people "say" at a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. …

  8. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations
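
The power loss from merging can be illustrated with the standard design-effect approximation DEFF = 1 + (m − 1) · ICC, where m is the cluster size: merging clusters raises m, inflating DEFF and shrinking the effective sample size. This is a back-of-envelope sketch of the mechanism, not the paper's simulation study, and the numbers are illustrative.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Effective N under the standard design effect 1 + (m - 1) * ICC,
    assuming equal-sized clusters."""
    n = n_clusters * cluster_size
    deff = 1 + (cluster_size - 1) * icc
    return n / deff

icc = 0.05
before = effective_sample_size(20, 30, icc)   # 20 clusters of 30 patients
after = effective_sample_size(10, 60, icc)    # after pairwise merges: 10 of 60
print(round(before), round(after))            # 245 152
```

The total number of patients is unchanged (600), yet the effective sample size drops by more than a third, which is the systematic power reduction the abstract reports.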

  9. A Novel Double Cluster and Principal Component Analysis-Based Optimization Method for the Orbit Design of Earth Observation Satellites

    Directory of Open Access Journals (Sweden)

    Yunfeng Dong

    2017-01-01

    Full Text Available The weighted sum and genetic algorithm-based hybrid method (WSGA-based HM), which has been applied to multiobjective orbit optimizations, suffers from the influence of human factors through the artificial choice of the weight coefficients in the weighted sum method and from the slow convergence of GA. To address these two problems, a cluster and principal component analysis-based optimization method (CPC-based OM) is proposed, in which candidate orbits are randomly generated until the optimal orbit is obtained using a data mining method, namely cluster analysis based on principal components. Then, a second cluster analysis of the orbital elements is introduced into CPC-based OM to improve convergence, yielding a novel double cluster and principal component analysis-based optimization method (DCPC-based OM). In DCPC-based OM, the cluster analysis based on principal components has the advantage of reducing human influences, and the cluster analysis based on the six orbital elements reduces the search space to effectively accelerate convergence. The test results from a multiobjective numerical benchmark function and the orbit design results for an Earth observation satellite show that DCPC-based OM converges more efficiently than WSGA-based HM, and that DCPC-based OM, to some degree, reduces the influence of the human factors present in WSGA-based HM.

  10. Performance Analysis of Entropy Methods on K Means in Clustering Process

    Science.gov (United States)

    Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib

    2017-12-01

    K-means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters/groups. This method partitions the data so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. The purpose of this clustering is to minimize an objective function set in the clustering process, which generally attempts to minimize variation within a cluster and maximize the variation between clusters. However, a main disadvantage of this method is that the number k is often not known beforehand. Furthermore, randomly chosen starting points may place two initial centroids close together. Therefore, the entropy method is used here to determine the starting points for K-means; it is a method that can be used to assign weights and take a decision from a set of alternatives. Entropy is able to investigate the harmony in discrimination among a multitude of data sets: under the entropy criterion, attributes with the greatest variation receive the highest weight. The entropy method can thus help the K-means process by determining the starting points that are usually chosen at random, so that clustering converges more quickly, with fewer iterations than standard K-means. Using the postoperative-patient dataset from the UCI Machine Learning Repository, with only 12 records as a worked example, the entropy method reaches the desired end result in only 2 iterations.
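
The entropy weighting step the abstract refers to is not fully specified, but the usual entropy weight method looks like the following sketch: attributes (columns) whose values vary more across records carry more information, hence lower entropy and higher weight. The data and function name are illustrative, not taken from the paper.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method: normalise each column to a probability
    vector, compute its Shannon entropy, and weight columns by their
    degree of diversification (1 - entropy)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n_rows)
    weights = []
    for j in range(n_cols):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        weights.append(1.0 - entropy)
    s = sum(weights)
    return [w / s for w in weights]

# column 0 is nearly constant, column 1 varies strongly,
# so column 1 should dominate the weights
data = [[5.0, 1.0], [5.1, 9.0], [4.9, 2.0], [5.0, 8.0]]
w = entropy_weights(data)
```

A K-means seeding scheme could then, for instance, rank candidate starting points by such weights rather than drawing them uniformly at random; that seeding step is an assumption here, since the abstract does not spell it out.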

  11. application of single-linkage clustering method in the analysis of ...

    African Journals Online (AJOL)

    Admin

    ANALYSIS OF GROWTH RATE OF GROSS DOMESTIC PRODUCT. (GDP) AT ... The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are ..... Number of cluster sum from observations of ...

  12. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high-throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o…

  13. Analysis of cost data in a cluster-randomized, controlled trial: comparison of methods

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Ørnbøl, Eva; Rosendal, Marianne

    We consider health care data from a cluster-randomized intervention study in primary care to test whether the average health care costs among study patients differ between the two groups. The problems of analysing cost data are that most data are severely skewed. Median instead of mean … studies have used non-valid analysis of skewed data. We propose two different methods to compare mean cost in two groups. Firstly, we use a non-parametric bootstrap method where the re-sampling takes place on two levels in order to take into account the cluster effect. Secondly, we proceed with a log-transformation of the cost data and apply normal theory to these data, again trying to account for the cluster effect. The performance of these two methods is investigated in a simulation study. The advantages and disadvantages of the different approaches are discussed.
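
The first proposed method, a two-level cluster bootstrap, can be sketched as follows. This is an illustrative reconstruction, not the authors' code: clusters are resampled with replacement, then patients are resampled within each sampled cluster, and a percentile confidence interval for the mean cost is read off. The cost figures are invented for the example.

```python
import random

def cluster_bootstrap_ci(groups, n_boot=2000, seed=1):
    """Two-level non-parametric bootstrap for a mean cost:
    level 1 resamples clusters with replacement,
    level 2 resamples patients within each sampled cluster."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(len(groups)):
            cluster = rng.choice(groups)                    # level 1
            sample.extend(rng.choices(cluster, k=len(cluster)))  # level 2
        means.append(sum(sample) / len(sample))
    means.sort()
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

# severely skewed cost data in four clusters (illustrative numbers)
costs = [[100, 120, 90, 3000], [110, 130, 95, 105],
         [200, 150, 2500, 170], [80, 90, 85, 100]]
lo, hi = cluster_bootstrap_ci(costs)
```

Resampling at the cluster level is what preserves the intracluster correlation; a naive patient-level bootstrap would understate the variability of the mean.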

  14. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
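
With binary code profiles (1 = code present in a participant's interview), a basic average-linkage hierarchical clustering can be written directly. The sketch below illustrates the general approach, not the simulation code from the article; it uses Hamming distance, though Jaccard or other binary measures are common alternatives.

```python
def hamming(a, b):
    """Number of codes on which two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering: repeatedly merge the
    two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(hamming(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# two groups of participants with distinct code profiles (1 = code present)
profiles = [
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 0, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 1, 1, 1, 0),
    (0, 0, 0, 0, 1, 1),
]
groups = agglomerate(profiles, 2)
```

On these toy profiles the two underlying participant groups are recovered exactly, which mirrors the article's finding that standard clustering methods remain accurate on binary coded data even with small samples.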

  15. Clustering Methods with Qualitative Data: A Mixed Methods Approach for Prevention Research with Small Samples

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.

    2016-01-01

    Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969

  16. Cluster Analysis of the Newcastle Electronic Corpus of Tyneside English: A Comparison of Methods

    NARCIS (Netherlands)

    Moisl, Hermann; Jones, Valerie M.

    2005-01-01

    This article examines the feasibility of an empirical approach to sociolinguistic analysis of the Newcastle Electronic Corpus of Tyneside English using exploratory multivariate methods. It addresses a known problem with one class of such methods, hierarchical cluster analysis—that different …

  17. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    Science.gov (United States)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.
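
The two observables the method is built on are cheap to compute from raw band counts. The snippet below is a hedged illustration only: hardness-ratio conventions vary between surveys, and the exact energy bands and sign convention of the paper are not reproduced here.

```python
def count_rate(counts, exposure_s):
    """Detected counts per second over the exposure."""
    return counts / exposure_s

def hardness_ratio(soft, hard):
    """One common convention, HR = (H - S) / (H + S): the balance of
    hard- to soft-band photons carries statistical information on
    cluster temperature and redshift even without spectroscopy."""
    return (hard - soft) / (hard + soft)

# illustrative band counts for one cluster in a 10 ks exposure
soft, hard, exposure = 180.0, 60.0, 10_000.0
cr = count_rate(soft + hard, exposure)   # 0.024 counts / s
hr = hardness_ratio(soft, hard)          # -0.5 (soft-dominated source)
```

Populating a CR-HR diagram with such pairs for every detected cluster is the "X-ray colour-magnitude diagram" the abstract describes; the cosmological fit then compares that diagram against model predictions.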

  18. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  19. METHOD OF CONSTRUCTION OF GENETIC DATA CLUSTERS

    Directory of Open Access Journals (Sweden)

    N. A. Novoselova

    2016-01-01

    Full Text Available The paper presents a method for the construction of genetic data clusters (functional modules) using randomized matrices. To build the functional modules, the selection and analysis of the eigenvalues of the gene profile correlation matrix is performed. The principal components corresponding to the eigenvalues that differ significantly from those obtained for a randomly generated correlation matrix are used for the analysis. Each selected principal component forms a gene cluster. In a comparative experiment with its analogues, the proposed method shows an advantage in allocating statistically significant clusters of different sizes, the ability to filter out non-informative genes, and the ability to extract biologically interpretable functional modules matching the real data structure.

  20. Comparing 3 dietary pattern methods--cluster analysis, factor analysis, and index analysis--with colorectal cancer risk: The NIH-AARP Diet and Health Study.

    Science.gov (United States)

    Reedy, Jill; Wirfält, Elisabet; Flood, Andrew; Mitrou, Panagiota N; Krebs-Smith, Susan M; Kipnis, Victor; Midthune, Douglas; Leitzmann, Michael; Hollenbeck, Albert; Schatzkin, Arthur; Subar, Amy F

    2010-02-15

    The authors compared dietary pattern methods-cluster analysis, factor analysis, and index analysis-with colorectal cancer risk in the National Institutes of Health (NIH)-AARP Diet and Health Study (n = 492,306). Data from a 124-item food frequency questionnaire (1995-1996) were used to identify 4 clusters for men (3 clusters for women), 3 factors, and 4 indexes. Comparisons were made with adjusted relative risks and 95% confidence intervals, distributions of individuals in clusters by quintile of factor and index scores, and health behavior characteristics. During 5 years of follow-up through 2000, 3,110 colorectal cancer cases were ascertained. In men, the vegetables and fruits cluster, the fruits and vegetables factor, the fat-reduced/diet foods factor, and all indexes were associated with reduced risk; the meat and potatoes factor was associated with increased risk. In women, reduced risk was found with the Healthy Eating Index-2005 and increased risk with the meat and potatoes factor. For men, beneficial health characteristics were seen with all fruit/vegetable patterns, diet foods patterns, and indexes, while poorer health characteristics were found with meat patterns. For women, findings were similar except that poorer health characteristics were seen with diet foods patterns. Similarities were found across methods, suggesting basic qualities of healthy diets. Nonetheless, findings vary because each method answers a different question.

  1. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).

  2. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    Science.gov (United States)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In previous literatures, we find node clustering coefficient appears frequently in many link prediction methods. However, node clustering coefficient is limited to describe the role of a common-neighbor in different local networks, because it cannot distinguish different clustering abilities of a node to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, comparing with other methods, the performance of ALC-based methods are very stable in both globalized and personalized top-L link prediction tasks.
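
The node-clustering baseline that ALC-based methods refine sums the clustering coefficients of the common neighbours of a candidate node pair (CCLP-style scoring). The sketch below shows that baseline, not the asymmetric link clustering coefficient itself, which the paper defines on links rather than nodes; the toy graph is invented for illustration.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))

def cclp_score(adj, x, y):
    """Likelihood score for a missing link (x, y): sum of clustering
    coefficients over the common neighbours of x and y."""
    return sum(clustering_coefficient(adj, z) for z in adj[x] & adj[y])

# a small undirected toy graph as an adjacency dict of sets
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}
score = cclp_score(adj, "b", "d")
```

The limitation the abstract points out is visible here: each common neighbour contributes one global per-node coefficient, regardless of which node pair is being scored, which is exactly what a link-level (and asymmetric) coefficient is meant to fix.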

  3. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide …

  4. Progeny Clustering: A Method to Identify Biological Phenotypes

    Science.gov (United States)

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  5. Cluster Analysis of the Newcastle Electronic Corpus of Tyneside English: A Comparison of Methods

    NARCIS (Netherlands)

    Moisl, Hermann; Jones, Valerie M.

    2005-01-01

    This article examines the feasibility of an empirical approach to sociolinguistic analysis of the Newcastle Electronic Corpus of Tyneside English using exploratory multivariate methods. It addresses a known problem with one class of such methods, hierarchical cluster analysis—that different …

  6. A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

    Science.gov (United States)

    Kent, Peter; Jensen, Rikke K; Kongsted, Alice

    2014-10-02

    There are various methodological approaches to identifying clinically important subgroups, and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively, using the number of subgroups detected, the classification probability of individuals into subgroups, and the reproducibility of results; and (ii) qualitatively, using subjective judgments about each program's ease of use and the interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed, testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use, and the interpretability of the presentation of their findings also varied. The results from the artificial datasets …

  7. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    OpenAIRE

    Maciej Kutera; Mirosława Lasek

    2010-01-01

    Clustering methods have become such well-developed algorithms for the analysis of large data collections that they are today counted among data mining methods. They form an ever larger group of methods, evolving quickly and finding more and more applications. In the article, our research concerning the usefulness of clustering methods in customer segmentation for managing advertisement campaigns is presented. We introduce results obtained by using four sel…

  8. Fuzzy C-means method for clustering microarray data.

    Science.gov (United States)

    Dembélé, Doulaye; Kastner, Philippe

    2003-05-22

    Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene on the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes. A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster. Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/
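As a concrete illustration of the FCM iteration the abstract refers to, here is a minimal NumPy sketch (a generic implementation of the standard FCM updates, not the authors' Matlab code; their procedure for choosing m is not reproduced). The fuzziness parameter m controls how soft the memberships are.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal FCM: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # rows sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))         # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# two well-separated synthetic "expression profiles"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2, m=2.0)
```

As m approaches 1 the memberships harden and FCM behaves like K-means; larger m spreads membership across clusters, which is exactly why the choice of m matters for noisy microarray data.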

  9. Cluster analysis as a method for determining size ranges for spinal implants: disc lumbar replacement prosthesis dimensions from magnetic resonance images.

    Science.gov (United States)

    Lei, Dang; Holder, Roger L; Smith, Francis W; Wardlaw, Douglas; Hukins, David W L

    2006-12-01

    Statistical analysis of clinical radiologic data. To develop an objective method for finding the number of sizes needed for a lumbar disc replacement. Cluster analysis is a well-established technique for sorting observations into clusters so that the "similarity level" is maximal if they belong to the same cluster and minimal otherwise. Magnetic resonance scans from 69 patients with no abnormal discs yielded sagittal and transverse images of 206 discs (levels L3-L4 to L5-S1). Anteroposterior and lateral dimensions were measured from vertebral margins on transverse images; disc heights were measured from sagittal images. Hierarchical cluster analysis was performed to determine the number of clusters, followed by nonhierarchical (K-means) cluster analysis. Discriminant analysis was used to determine how well the clusters could be used to classify an observation. The most successful method of clustering the data involved the following parameters: anteroposterior dimension; lateral dimension (both were the mean of results from the superior and inferior margins of a vertebral body, measured on transverse images); and maximum disc height (from a midsagittal image). These were grouped into 7 clusters, so that a discriminant analysis was capable of correctly classifying 97.1% of the observations. The mean and standard deviations for the parameter values in each cluster were determined. Cluster analysis has been successfully used to find the dimensions of the minimum number of prosthesis sizes required to replace L3-L4 to L5-S1 discs; the range of sizes would enable them to be used at higher lumbar levels in some patients.
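The three-stage pipeline described here (hierarchical clustering to fix the number of clusters, K-means to refine, discriminant analysis to check classification) can be sketched on synthetic data; the "disc dimensions" and group means below are invented for illustration and are not the paper's measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# three invented "size groups": (AP dimension, lateral dimension, disc height)
means = [np.array([30.0, 40.0, 8.0]),
         np.array([35.0, 47.0, 10.0]),
         np.array([40.0, 55.0, 12.0])]
X = np.vstack([rng.normal(m, 0.8, (70, 3)) for m in means])

# stage 1: hierarchical (Ward) clustering to fix the number of clusters
Z = linkage(X, method="ward")
labels_h = fcluster(Z, t=3, criterion="maxclust")
k = len(np.unique(labels_h))

# stage 2: non-hierarchical (K-means) refinement
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# stage 3: discriminant analysis to see how well the clusters classify observations
lda = LinearDiscriminantAnalysis().fit(X, km.labels_)
accuracy = lda.score(X, km.labels_)
```

A high discriminant accuracy, as in the paper's 97.1%, indicates that the clusters are well separated enough to serve as size categories.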

  10. Membership determination of open clusters based on a spectral clustering method

    Science.gov (United States)

    Gao, Xin-Hua

    2018-06-01

    We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.
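The spectral clustering technique itself, partitioning via eigenvectors of a similarity matrix, can be demonstrated on a standard non-convex toy example; this is a generic scikit-learn sketch, not the authors' astrometric pipeline or their five-dimensional data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# two interleaved half-moons: non-convex groups that defeat plain k-means
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0)
labels = sc.fit_predict(X)

# agreement with the true grouping, up to label permutation
acc = max(np.mean(labels == y), np.mean(labels == 1 - y))
```

Because the similarity graph, not Euclidean distance to a centroid, drives the partition, spectral clustering can isolate a dense clump of cluster members embedded in a diffuse field, which is the situation the abstract describes.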

  11. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service and an open-source, general-purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both the clustering structure and the functional annotations provide a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages, making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state-of-the-art clustering procedures. For bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. The WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  12. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One possibility for classifying track structures is to apply conventional partitioning techniques for multidimensional data analysis to the track structure. Using these cluster algorithms, this paper attempts to find characteristics of radiation that reflect the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations, giving the mean number of clusters produced by radiation per unit of deposited energy, can serve as such a characteristic. The general computation techniques used, as well as methods for calculating cluster distributions for different radiations, are discussed. 8 refs.; 5 figs

  13. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

    Science.gov (United States)

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2011-02-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  14. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included in the cluster analysis. The patients were distributed across 5 clusters, described as follows: cluster (C) 1, early-onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters largely coincided with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (a Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  15. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    Directory of Open Access Journals (Sweden)

    Maciej Kutera

    2010-10-01

    Full Text Available Clustering methods have become such well-developed algorithms for analysing large data collections that they are now counted among data mining methods. They form an ever-growing group of methods, evolving quickly and finding more and more applications. In the article, we present our research on the usefulness of clustering methods for customer segmentation to manage an advertisement campaign. We introduce results obtained with four selected methods, chosen because their peculiarities suggested their applicability to our purposes. One of the analysed methods, k-means clustering with randomly selected initial cluster seeds, gave very good results in customer segmentation for managing an advertisement campaign, and these results are presented in detail in the article. In contrast, one of the methods (hierarchical average linkage) was found useless for customer segmentation. Further investigation of the benefits of clustering methods for customer segmentation to manage advertisement campaigns is worth continuing, particularly as solutions in this field can bring measurable profits for marketing activity.

  16. Clustering methods for the optimization of atomic cluster structure

    Science.gov (United States)

    Bagattini, Francesco; Schoen, Fabio; Tigli, Luca

    2018-04-01

    In this paper, we propose a revised global optimization method and apply it to large-scale cluster conformation problems. In the 1990s, the so-called clustering methods were considered among the most efficient general-purpose global optimization techniques; however, their usage has quickly declined in recent years, mainly due to the inherent difficulties of clustering approaches in high-dimensional spaces. Inspired by the machine learning literature, we redesigned clustering methods in order to deal with molecular structures in a reduced feature space. Our aim is to show that by suitably choosing a good set of geometrical features coupled with a very efficient descent method, an effective optimization tool is obtained which is capable of finding, with a very high success rate, all known putative optima for medium-size clusters without any prior information, both for Lennard-Jones and Morse potentials. The main result is that, beyond being a reliable approach, the proposed method, based on the idea of starting a computationally expensive deep local search only when it seems worth doing so, is capable of saving a huge number of searches with respect to an analogous algorithm which does not employ a clustering phase. In this paper, we are not claiming the superiority of the proposed method compared to specific, refined, state-of-the-art procedures, but rather indicating a quite straightforward way to save local searches by means of a clustering scheme working in a reduced variable space, which might prove useful when included in many modern methods.

  17. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than in describing the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large, high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. In this study, three topic-model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic-model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
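"Highest probable topic assignment", the first of the three methods named above, can be sketched generically: fit a topic model on count data, then assign each sample to its most probable latent topic. The data below are synthetic multinomial counts, not the PFGE or cancer datasets.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# two groups of samples with different feature-usage profiles (count data)
p1 = np.array([0.70, 0.20, 0.05, 0.05])
p2 = np.array([0.05, 0.05, 0.20, 0.70])
X = np.vstack([rng.multinomial(200, p1, size=40),
               rng.multinomial(200, p2, size=40)])

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
theta = lda.transform(X)          # per-sample topic proportions
clusters = theta.argmax(axis=1)   # highest probable topic assignment
```

The topic proportions theta are the low-dimensional latent representation the abstract refers to; feature selection and feature extraction variants would instead cluster on functions of theta or of the topic-word matrix.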

  18. A New Cluster Analysis-Marker-Controlled Watershed Method for Separating Particles of Granular Soils.

    Science.gov (United States)

    Alam, Md Ferdous; Haque, Asadul

    2017-10-18

    An accurate determination of particle-level fabric of granular soils from tomography data requires a maximum correct separation of particles. The popular marker-controlled watershed separation method is widely used to separate particles. However, the watershed method alone is not capable of producing the maximum separation of particles when subjected to boundary stresses leading to crushing of particles. In this paper, a new separation method, named as Monash Particle Separation Method (MPSM), has been introduced. The new method automatically determines the optimal contrast coefficient based on cluster evaluation framework to produce the maximum accurate separation outcomes. Finally, the particles which could not be separated by the optimal contrast coefficient were separated by integrating cuboid markers generated from the clustering by Gaussian mixture models into the routine watershed method. The MPSM was validated on a uniformly graded sand volume subjected to one-dimensional compression loading up to 32 MPa. It was demonstrated that the MPSM is capable of producing the best possible separation of particles required for the fabric analysis.

  19. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  20. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    Science.gov (United States)

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and the sample size is small. A number of more advanced methods are available, but they are often technically challenging, and a comparative assessment of their performance in behavioral setups has not been carried out. We studied the performance of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating the power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers, and we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions for behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
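Why the beta-binomial model suits such data can be seen in a minimal sketch: per-subject success probabilities vary, so counts are over-dispersed relative to a plain binomial, and the beta-binomial captures that extra variance. The (mu, rho) mean/dispersion parameterisation below is one common convention, not necessarily the paper's notation, and the fit is a bare intercept-only maximum likelihood, not a full regression.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, comb

def betabinom_negll(params, k, n):
    """Negative log-likelihood; mu = mean, rho = dispersion in (0, 1)."""
    mu, rho = params
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    ll = np.log(comb(n, k)) + betaln(k + a, n - k + b) - betaln(a, b)
    return -ll.sum()

rng = np.random.default_rng(3)
n_trials = 20
# over-dispersed data: each subject has its own success probability
p_i = rng.beta(4, 6, size=200)              # mean 0.4
k = rng.binomial(n_trials, p_i)

res = minimize(betabinom_negll, x0=[0.5, 0.2],
               args=(k, np.full(200, n_trials)),
               bounds=[(1e-3, 1 - 1e-3), (1e-3, 1 - 1e-3)])
mu_hat, rho_hat = res.x
```

A plain binomial fit to the same counts would force rho to zero and understate the standard errors, which is the anticonservative behaviour the study attributes to misspecified models.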

  1. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to advances in sensor technology, growing large medical image datasets make it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  2. Dynamic analysis of clustered building structures using substructures methods

    International Nuclear Information System (INIS)

    Leimbach, K.R.; Krutzik, N.J.

    1989-01-01

    The dynamic substructure approach to the building cluster on a common base mat starts with the generation of Ritz-vectors for each building on a rigid foundation. The base mat plus the foundation soil is subjected to kinematic constraint modes, for example constant, linear, quadratic or cubic constraints. These constraint modes are also imposed on the buildings. By enforcing kinematic compatibility of the complete structural system on the basis of the constraint modes a reduced Ritz model of the complete cluster is obtained. This reduced model can now be analyzed by modal time history or response spectrum methods

  3. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  4. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Full Text Available Background: The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries from clinical studies on various sex-hormone-containing drugs. Methods: We used four cluster analysis methods (single, average and complete linkage, and the method of Ward) for pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R2, the cubic cluster criterion, the pseudo-F- and the pseudo-t2-statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results: The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non-cyclic bleeding patterns were well separated. Conclusion: Using this cluster analysis with the method of Ward, medications and devices having an impact on bleeding can be easily compared and categorized.
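The core of the Methods section, Ward clustering with the number of clusters chosen by a formal criterion, can be sketched as follows; the pseudo-F statistic (Calinski-Harabasz index) is one of the four criteria named above, and the "diaries" here are synthetic vectors, not real bleeding data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(4)
# synthetic "diaries": three distinct 10-week bleeding-intensity profiles
X = np.vstack([rng.normal(m, 0.5, (60, 10)) for m in (0.0, 3.0, 6.0)])

# Ward linkage, then score each candidate number of clusters with pseudo-F
Z = linkage(X, method="ward")
scores = {k: calinski_harabasz_score(X, fcluster(Z, t=k, criterion="maxclust"))
          for k in range(2, 8)}
best_k = max(scores, key=scores.get)
```

Chaining methods such as single linkage tend to absorb observations one by one into a single growing cluster, which is why, as the Results note, only Ward produced distinct, interpretable patterns.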

  5. Functional connectivity analysis of the neural bases of emotion regulation: A comparison of independent component method with density-based k-means clustering method.

    Science.gov (United States)

    Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo

    2016-04-29

    Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize coherent patterns of brain activity as a means of identifying the brain systems engaged by a cognitive reappraisal of emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between the brain regions involved. Our results reveal that, compared with the ICA method, the density-based k-means clustering method groups activity more sensitively; in addition, it is more sensitive to relatively weak functional connections. The study concludes that, while emotional stimuli are being received, the most clearly activated areas are mainly distributed in the frontal lobe, the cingulum and near the hypothalamus. Furthermore, the density-based k-means clustering method provides a more reliable approach for follow-up studies of brain functional connectivity.

  6. A NEW METHOD TO QUANTIFY X-RAY SUBSTRUCTURES IN CLUSTERS OF GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Andrade-Santos, Felipe; Lima Neto, Gastao B.; Lagana, Tatiana F. [Departamento de Astronomia, Instituto de Astronomia, Geofisica e Ciencias Atmosfericas, Universidade de Sao Paulo, Geofisica e Ciencias Atmosfericas, Rua do Matao 1226, Cidade Universitaria, 05508-090 Sao Paulo, SP (Brazil)

    2012-02-20

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done on a residual image, obtained by subtracting from the X-ray image a surface brightness model fitted with a two-dimensional analytical model (β-model or Sérsic profile) with elliptical symmetry. Our method is applied to 34 clusters observed by the Chandra X-ray Observatory that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level and physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters into two sub-samples of high and low substructure levels. We conclude, using Monte Carlo simulations, that the method recovers the true amount of substructure very well for clusters with small angular core radii (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, our analysis suggests a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure-level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.
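The residual-image idea can be illustrated on a toy scale: build a synthetic "X-ray image" from a circular beta-model plus a small substructure blob, fit a beta-model, and measure what remains. This is a deliberately simplified sketch (circular rather than elliptical model, no noise, no calibration), not the authors' pipeline.

```python
import numpy as np
from scipy.optimize import least_squares

ny = nx = 64
y, x = np.mgrid[0:ny, 0:nx]

def beta_model(p):
    """Circular beta-model surface brightness: S0 * (1 + (r/rc)^2)^(0.5 - 3*beta)."""
    S0, x0, y0, rc, beta = p
    r2 = ((x - x0) ** 2 + (y - y0) ** 2) / rc ** 2
    return S0 * (1.0 + r2) ** (0.5 - 3.0 * beta)

true_params = [10.0, 32.0, 32.0, 6.0, 0.7]
blob = 1.5 * np.exp(-((x - 45) ** 2 + (y - 40) ** 2) / (2 * 3.0 ** 2))
image = beta_model(true_params) + blob      # smooth halo plus a substructure

# fit the smooth model, then inspect the residual image
fit = least_squares(lambda p: (beta_model(p) - image).ravel(),
                    x0=[5.0, 30.0, 30.0, 5.0, 0.5])
residual = image - beta_model(fit.x)

# a crude substructure level: positive residual flux relative to total flux
sub_level = residual[residual > 0].sum() / image.sum()
```

The blob survives the subtraction while the smooth halo is removed, which is exactly the signal the substructure statistic is built from.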

  7. Trend analysis using non-stationary time series clustering based on the finite element method

    Science.gov (United States)

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-05-01

    In order to analyze the low-frequency variability of climate, it is useful to model climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods that can analyze multidimensional time series. One important attribute of this method is that it does not depend on any statistical assumption and does not need local stationarity in the time series. In this paper, it is shown how the FEM-clustering method can be used to locate change points in the trend of temperature time series from in situ observations. This method is applied to the temperature time series of North Carolina (NC), and the results capture region-specific climate variability despite higher-frequency harmonics in the climatic time series. Next, we investigated the relationship between climatic indices and the clusters/trends detected with this clustering method. It appears that the natural variability of climate change in NC during 1950-2009 can be explained mostly by the AMO and solar activity.

  8. Marketing research cluster analysis

    OpenAIRE

    Marić Nebojša

    2002-01-01

    One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  9. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with the potential to establish and cause significant impact in an endangered area. Such prioritisation is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster-based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called the Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well-known dimension reduction and visualization method, especially useful for high-dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, including possible donor and recipient regions. More importantly, the SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk, to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k
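The SOM mechanism the abstract relies on can be sketched in a toy NumPy implementation: each grid node holds a weight vector that is pulled toward the inputs, with a Gaussian neighbourhood that shrinks over training. This is a generic teaching sketch on 2-D synthetic data, not the pest-assemblage analysis (which uses dedicated SOM software and species presence/absence matrices).

```python
import numpy as np

def train_som(X, grid=(5, 5), n_iter=2000, lr0=0.5, seed=0):
    """Toy Kohonen SOM: nodes on a grid, weights pulled toward sampled inputs."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))
    # grid coordinates of each node, used by the neighbourhood function
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        xi = X[rng.integers(len(X))]
        bmu = np.argmin(((W - xi) ** 2).sum(axis=1))   # best-matching unit
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                        # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3           # shrinking neighbourhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))           # Gaussian neighbourhood
        W += lr * h[:, None] * (xi - W)
    return W

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (100, 2)), rng.normal(1.0, 0.1, (100, 2))])
W = train_som(X)
# quantization error: mean distance from each input to its best-matching unit
qe = np.mean([np.sqrt(((W - xi) ** 2).sum(axis=1)).min() for xi in X])
```

In the risk-assessment setting the inputs are regional species assemblages rather than points, and it is the trained weights W, the per-species connection strengths, that are read off to rank establishment risk.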

  10. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.
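
A sketch of this kind of analysis in Python with SciPy rather than Minitab. The demographic figures for the twelve hypothetical cities, the variables, and the linkage choice are illustrative assumptions, not the paper's data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented demographic profiles for 12 hypothetical cities:
# columns = population (thousands), median age, unemployment rate (%).
rng = np.random.default_rng(42)
towns  = rng.normal([50, 42, 9], [10, 2, 1], size=(6, 3))    # smaller, older
cities = rng.normal([900, 33, 5], [150, 2, 1], size=(6, 3))  # larger, younger
data = np.vstack([towns, cities])

# Standardize the columns so population does not dominate the distances.
z = (data - data.mean(axis=0)) / data.std(axis=0)

Z = linkage(z, method="ward")            # Ward linkage on Euclidean distances
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

With clearly separated profiles, the two groups of cities fall into two clusters.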

  11. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  12. Methodology of comparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    Full Text Available The article is devoted to the possibilities of applying multidimensional statistical analysis to the study of industrial production, comparing its growth rates and structure with those of other developed and developing countries. The purpose of this article is to determine the optimal set of statistical methods and to present the results of their application to industrial production data. The data include indicators such as output, gross value added, the number of employed, and other indicators from the system of national accounts and operational business statistics. The objects of observation are the industries of the countries of the Customs Union, the United States, Japan and Europe in 2005-2015. The research tools range from the simplest methods of transformation and graphical and tabular visualization of data to methods of multivariate statistical analysis. In particular, using a specialized software package (SPSS), the principal components method, discriminant analysis, hierarchical cluster analysis (Ward's method) and k-means were applied. The application of the principal components method to the initial data makes it possible to substantially and effectively reduce the dimensionality of the industrial production data. Thus, for example, in analyzing the structure of industrial production, fifteen industries were reduced to three basic, well-interpreted factors: relatively extractive industries (with a low degree of processing), high-tech industries, and consumer goods (medium-technology sectors). At the same time, a comparison of the results of applying cluster analysis to the initial data and to the data obtained by the principal components method established that clustering industrial production data on the basis of the new factors significantly improves the clustering results. As a result of analyzing the parameters of
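
The pipeline described, dimensionality reduction by principal components followed by clustering, can be sketched as follows. This is an illustrative reconstruction with synthetic industry-share data, not the SPSS analysis or the actual national-accounts data from the article.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in: rows = country-year observations, columns = shares of
# 15 industries in industrial output, drawn from three structural profiles
# (extraction-heavy, high-tech-heavy, consumer-goods-heavy).
rng = np.random.default_rng(0)
profiles = [np.r_[np.full(5, 8.0), np.full(10, 1.0)],
            np.r_[np.full(5, 1.0), np.full(5, 8.0), np.full(5, 1.0)],
            np.r_[np.full(10, 1.0), np.full(5, 8.0)]]
X = np.vstack([rng.dirichlet(alpha, 40) for alpha in profiles])

# Reduce the 15 correlated shares to 3 principal components, then cluster.
pcs = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pcs)
print(np.bincount(labels))
```

Clustering the component scores instead of the raw shares mirrors the article's finding that the reduced factors improve the clustering.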

  13. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposits with different origins. The method deviates from standard methods of following dispersal of

  14. Cluster Analysis as an Analytical Tool of Population Policy

    Directory of Open Access Journals (Sweden)

    Oksana Mikhaylovna Shubat

    2017-12-01

    Full Text Available The predicted negative trends in Russian demography (falling birth rates, population decline) actualize the need to strengthen measures of family and population policy. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on Ward's method and the Euclidean distance for segmentation of Russian regions. Clustering is based on four variables, which allowed assessing the family institution in each region. The authors used the data of the Federal State Statistics Service from 2010 to 2015. Clustering and profiling of each segment have allowed forming a model of Russian regions depending on the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures for each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution across the regions. This suggests that a unified approach to solving population problems is far from effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could develop the adaptation of multidimensional classification methods to the analysis of population problems in Russian regions. In particular, the algorithms of nonparametric cluster analysis may be of relevance in future studies.

  15. The multi-scattering-Xα method for analysis of the electronic structure of atomic clusters

    International Nuclear Information System (INIS)

    Bahurmuz, A.A.; Woo, C.H.

    1984-12-01

    A computer program, MSXALPHA, has been developed to carry out a quantum-mechanical analysis of the electronic structure of molecules and atomic clusters using the Multi-Scattering-Xα (MSXα) method. The MSXALPHA program is based on a code obtained from the University of Alberta; several improvements and new features were incorporated to increase generality and efficiency. The major ones are: (1) minimization of core memory usage, (2) reduction of execution time, (3) introduction of a dynamic core allocation scheme for a large number of arrays, (4) incorporation of an atomic program to generate numerical orbitals used to construct the initial molecular potential, and (5) inclusion of a routine to evaluate total energy. This report is divided into three parts. The first discusses the theory of the MSXα method. The second gives a detailed description of the program, MSXALPHA. The third discusses the results of calculations carried out for the methane molecule (CH4) and a four-atom zirconium cluster (Zr4)

  16. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, these terms were not treated in the original SCA paper, and we discover they may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer-loop iteration for attaining material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.

  17. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show
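
The idea of choosing the number of clusters by a cross-validated likelihood can be illustrated with a Gaussian mixture model, whose held-out log-likelihood plays the role of the cross-validated criterion. The data below are synthetic stand-ins for voxel-time series; the model family, fold count, and candidate range are assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

# Synthetic stand-in for voxel-time series: 300 short time courses drawn
# from three temporal patterns (so the true number of clusters is 3).
rng = np.random.default_rng(1)
patterns = np.array([[0, 1, 2, 3], [3, 2, 1, 0], [0, 3, 0, 3]], float)
X = np.vstack([p + rng.normal(0, 0.25, (100, 4)) for p in patterns])

def cv_loglik(X, k, folds=5):
    """Cross-validated likelihood of a k-component Gaussian mixture."""
    scores = []
    for train, test in KFold(folds, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X[train])
        scores.append(gm.score(X[test]))   # mean held-out log-likelihood
    return float(np.mean(scores))

best_k = max(range(1, 7), key=lambda k: cv_loglik(X, k))
print(best_k)
```

The held-out likelihood rises sharply until all genuine patterns have their own component, which is what makes it a usable criterion for the cluster count.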

  18. Application Of WIMS Code To Calculation Kartini Reactor Parameters By Pin-Cell And Cluster Method

    International Nuclear Information System (INIS)

    Sumarsono, Bambang; Tjiptono, T.W.

    1996-01-01

    Analysis UZrH fuel element parameters calculation in Kartini Reactor by WIMS Code has been done. The analysis is done by pin cell and cluster method. The pin cell method is done as a function percent burn-up and by 8 group 3 region analysis and cluster method by 8 group 12 region analysis. From analysis and calculation resulted K ∼ = 1.3687 by pin cell method and K ∼ = 1.3162 by cluster method and so deviation is 3.83%. By pin cell analysis as a function percent burn-up at the percent burn-up greater than 59.50%, the multiplication factor is less than one (k ∼ < 1) it is mean that the fuel element reactivity is negative

  19. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015

  20. Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient

    International Nuclear Information System (INIS)

    Carvalho, Priscilla R.; Munita, Casimiro S.; Lapolli, André L.

    2017-01-01

    The literature presents many methods for partitioning a data base, and it is difficult to choose the most suitable one, since the various combinations of methods based on different measures of dissimilarity can lead to different patterns of grouping and false interpretations. Nevertheless, little effort has been expended on evaluating these methods empirically using an archaeological data base. The objective of this work is therefore to make a comparative study of the different cluster analysis methods and identify the most appropriate one. The study was carried out using a data base of the Archaeometric Studies Group at IPEN-CNEN/SP, in which 45 samples of ceramic fragments from three archaeological sites were analyzed by instrumental neutron activation analysis (INAA) to determine the mass fractions of 13 elements (As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, U). The methods studied were: single linkage, complete linkage, average linkage, centroid and Ward. Validation was done using the cophenetic correlation coefficient; comparing these values, the average linkage method obtained the best results. A script with some functions was created for the statistical program R to obtain the cophenetic correlations. By means of these values it was possible to choose the most appropriate method for the data base. (author)
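
The validation step, computing cophenetic correlation coefficients for the five linkage methods, can be reproduced with SciPy instead of the authors' R script. The element-concentration data below are synthetic stand-ins for the INAA measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Synthetic stand-in for the INAA data: 45 samples x 13 element mass
# fractions, drawn from three "sites" with shifted means.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(m, 0.5, (15, 13)) for m in (0.0, 2.0, 4.0)])

d = pdist(X)  # condensed matrix of pairwise Euclidean distances
scores = {}
for method in ("single", "complete", "average", "centroid", "ward"):
    # Cophenetic correlation: agreement between the dendrogram's implied
    # distances and the original pairwise distances (closer to 1 is better).
    scores[method], _ = cophenet(linkage(X, method=method), d)
    print(f"{method:>8}: cophenetic correlation = {scores[method]:.3f}")
```

Average linkage tends to score highest on this criterion, consistent with the study's conclusion, though the ranking depends on the data.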

  1. Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Priscilla R.; Munita, Casimiro S.; Lapolli, André L., E-mail: prii.ramos@gmail.com, E-mail: camunita@ipen.br, E-mail: alapolli@ipen.br [Instituto de Pesquisas Energéticas e Nucleares (IPEN/CNEN-SP), São Paulo, SP (Brazil)

    2017-07-01

    The literature presents many methods for partitioning a data base, and it is difficult to choose the most suitable one, since the various combinations of methods based on different measures of dissimilarity can lead to different patterns of grouping and false interpretations. Nevertheless, little effort has been expended on evaluating these methods empirically using an archaeological data base. The objective of this work is therefore to make a comparative study of the different cluster analysis methods and identify the most appropriate one. The study was carried out using a data base of the Archaeometric Studies Group at IPEN-CNEN/SP, in which 45 samples of ceramic fragments from three archaeological sites were analyzed by instrumental neutron activation analysis (INAA) to determine the mass fractions of 13 elements (As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, U). The methods studied were: single linkage, complete linkage, average linkage, centroid and Ward. Validation was done using the cophenetic correlation coefficient; comparing these values, the average linkage method obtained the best results. A script with some functions was created for the statistical program R to obtain the cophenetic correlations. By means of these values it was possible to choose the most appropriate method for the data base. (author)

  2. Trend analysis using non-stationary time series clustering based on the finite element method

    OpenAIRE

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-01-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods ...

  3. K-Line Patterns’ Predictive Power Analysis Using the Methods of Similarity Match and Clustering

    Directory of Open Access Journals (Sweden)

    Lv Tao

    2017-01-01

    Full Text Available Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there are some disputes in academia on whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to research the predictive power of K-line patterns. A similarity match model and a nearest neighbor clustering algorithm are proposed for solving the problems of similarity match and clustering of K-line series, respectively. The experiment includes testing the predictive power of the Three Inside Up pattern and the Three Inside Down pattern on a testing dataset of K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1) the predictive power of a pattern varies a great deal for different shapes and (2) each of the existing K-line patterns requires further classification based on its shape features to improve prediction performance.
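
A minimal sketch of the similarity-match and nearest-neighbor-clustering ideas for K-line (candlestick) series. The normalization by the first close, the Euclidean distance, the greedy assignment rule, the threshold, and the toy OHLC patterns are all illustrative assumptions, not the paper's exact models.

```python
import numpy as np

def normalize_kline(series):
    """Scale an OHLC series by its first close so patterns at different
    price levels become comparable (one simple normalization choice)."""
    return series / series[0, 3]

def nn_cluster(series_list, threshold):
    """Greedy nearest-neighbor clustering: each series joins the cluster of
    its nearest already-seen series if that distance is below threshold,
    otherwise it starts a new cluster."""
    labels = []
    for i, s in enumerate(series_list):
        best, best_d = None, np.inf
        for j in range(i):
            d = np.linalg.norm(normalize_kline(s)
                               - normalize_kline(series_list[j]))
            if d < best_d:
                best, best_d = j, d
        labels.append(labels[best] if best is not None and best_d < threshold
                      else (max(labels) + 1 if labels else 0))
    return labels

# Two toy 3-candle patterns (rows = candles, cols = open/high/low/close),
# at different price scales, plus one dissimilar (reversed) series.
up   = np.array([[10.0, 11.0,  9.8, 10.9],
                 [10.9, 12.0, 10.8, 11.8],
                 [11.8, 13.0, 11.7, 12.9]])
up2  = up * 7.3            # same shape, different price level
down = up[::-1].copy()     # reversed pattern
print(nn_cluster([up, up2, down], threshold=0.05))
```

The two upward patterns cluster together despite their different price levels, while the reversed pattern forms its own cluster.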

  4. Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio

    Directory of Open Access Journals (Sweden)

    Elena ANDREI (DRAGOMIR

    2012-08-01

    Full Text Available The methods underlying cluster analysis are very useful in data analysis, especially when the processed volume of data is very large, so that it becomes impossible to extract the essential information unless specific instruments are used to summarize and structure the gross information. In this context, cluster analysis techniques are used particularly for systematic information analysis. The aim of this article is to build a useful model for the banking field, based on data mining techniques, by dividing the groups of borrowers into clusters in order to obtain a profile of the customers (debtors and good payers). We assume that a class is appropriate if it contains members that have a high degree of similarity, and the standard method for measuring the similarity within a group shows the lowest variance. After clustering, data mining techniques are implemented on the cluster of bad debtors, reaching a very high accuracy. The paper is structured as follows: Section 2 describes the model for data analysis based on the specific scoring model that we propose. In Section 3, we present a cluster analysis using the K-means algorithm, and the DM models are applied to a specific cluster. Section 4 presents the conclusions.

  5. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis is a statistical method which makes it possible to group individuals based on sets of identifying variables. The method was chosen in order to determine the possibility of classifying pregnant women in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. 222 pregnant women from two general obstetric offices were recruited. The main focus was on the characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering a baby of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and a higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  6. Novel Signal Noise Reduction Method through Cluster Analysis, Applied to Photoplethysmography.

    Science.gov (United States)

    Waugh, William; Allen, John; Wightman, James; Sims, Andrew J; Beale, Thomas A W

    2018-01-01

    Physiological signals can often become contaminated by noise from a variety of origins. In this paper, an algorithm is described for the reduction of sporadic noise from a continuous periodic signal. The design can be used where a sample of a periodic signal is required, for example, when an average pulse is needed for pulse wave analysis and characterization. The algorithm is based on cluster analysis for selecting similar repetitions or pulses from a periodic signal. This method selects individual pulses without noise, returns a clean pulse signal, and terminates when a sufficiently clean and representative signal is received. The algorithm is designed to be sufficiently compact to be implemented on a microcontroller embedded within a medical device. It has been validated through the removal of noise from an exemplar photoplethysmography (PPG) signal, showing increasing benefit as the noise contamination of the signal increases. The algorithm design is generalised to be applicable for a wide range of physiological (physical) signals.
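
The pulse-selection idea can be sketched as follows. For brevity this sketch replaces full cluster analysis with a median template and a correlation threshold, which keeps the same selection principle (keep mutually similar pulses, discard noisy ones); the synthetic PPG-like signal, pulse shape, and threshold are assumptions.

```python
import numpy as np

def clean_average_pulse(signal, period, sim_threshold=0.95):
    """Split a periodic signal into pulses, keep the group of mutually
    similar pulses (correlation above threshold to a robust template),
    and return their average as the 'clean' pulse."""
    n = len(signal) // period
    pulses = signal[:n * period].reshape(n, period)
    # Median pulse is a robust template: sporadic noisy pulses barely move it.
    template = np.median(pulses, axis=0)
    keep = [p for p in pulses
            if np.corrcoef(p, template)[0, 1] > sim_threshold]
    return np.mean(keep, axis=0)

# Synthetic PPG-like signal: 20 identical pulses, 3 of them corrupted
# by sporadic noise bursts.
rng = np.random.default_rng(3)
period = 50
t = np.linspace(0, 1, period)
pulse = np.exp(-((t - 0.3) / 0.1) ** 2)        # one idealized pulse shape
sig = np.tile(pulse, 20)
sig[5 * period:6 * period] += rng.normal(0, 1.0, period)
sig[11 * period:13 * period] += rng.normal(0, 1.0, 2 * period)

clean = clean_average_pulse(sig, period)
print(np.abs(clean - pulse).max())
```

The corrupted pulses correlate poorly with the template and are excluded, so the averaged pulse matches the underlying clean shape.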

  7. A cluster merging method for time series microarray with production values.

    Science.gov (United States)

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal obtaining groups with highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured in the same time points) and merging them by taking into account the frequency by which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim to find co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
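
One common way to merge clusterings from multiple replicates, consistent with the frequency-of-co-assembly idea described above, is a co-association (consensus) matrix. The sketch below uses toy two-pattern data and k-means at the replicate level; the algorithms and sizes are assumptions, not the paper's shape-based methods.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

def merge_clusterings(labelings, n_final):
    """Merge several clusterings of the same genes via their co-association
    frequency: how often each pair of genes lands in the same cluster."""
    labelings = np.asarray(labelings)          # (n_replicates, n_genes)
    n = labelings.shape[1]
    co = np.zeros((n, n))
    for labels in labelings:
        co += labels[:, None] == labels[None, :]
    co /= len(labelings)
    # Distance = 1 - co-association frequency; cluster it hierarchically.
    Z = linkage(squareform(1 - co, checks=False), method="average")
    return fcluster(Z, t=n_final, criterion="maxclust")

# Three replicate "time series" of 12 genes: two expression patterns + noise.
rng = np.random.default_rng(5)
base = np.r_[np.zeros(6), np.ones(6)]
replicates = [base[:, None] * 3 + rng.normal(0, 0.4, (12, 4))
              for _ in range(3)]
labelings = [KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(r)
             for r in replicates]
merged = merge_clusterings(labelings, n_final=2)
print(merged)
```

Genes that co-cluster in every replicate end up at distance 0 and merge first, which naturally ranks groups by cross-replicate agreement, as the abstract describes.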

  8. Statistical Techniques Applied to Aerial Radiometric Surveys (STAARS): cluster analysis. National Uranium Resource Evaluation

    International Nuclear Information System (INIS)

    Pirkle, F.L.; Stablein, N.K.; Howell, J.A.; Wecksung, G.W.; Duran, B.S.

    1982-11-01

    One objective of the aerial radiometric surveys flown as part of the US Department of Energy's National Uranium Resource Evaluation (NURE) program was to ascertain the regional distribution of near-surface radioelement abundances. Some method for identifying groups of observations with similar radioelement values was therefore required. It is shown in this report that cluster analysis can identify such groups even when no a priori knowledge of the geology of an area exists. A method of convergent k-means cluster analysis coupled with a hierarchical cluster analysis is used to classify 6991 observations (three radiometric variables at each observation location) from the Precambrian rocks of the Copper Mountain, Wyoming, area. Another method, one that combines a principal components analysis with a convergent k-means analysis, is applied to the same data. These two methods are compared with a convergent k-means analysis that utilizes available geologic knowledge. All three methods identify four clusters. Three of the clusters represent background values for the Precambrian rocks of the area, and one represents outliers (anomalously high 214Bi). A segmentation of the data corresponding to geologic reality as discovered by other methods has been achieved based solely on analysis of aerial radiometric data. The techniques employed are composites of classical clustering methods designed to handle the special problems presented by large data sets. 20 figures, 7 tables

  9. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    Science.gov (United States)

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for the assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze the correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from the coastal waters of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate water quality. To evaluate validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, this is the first attempt to apply Mahalanobis distance to coastal water quality assessment.
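
Hierarchical clustering with Mahalanobis distance is directly available through SciPy. The sketch below uses synthetic correlated water-quality variables with assumed names; passing the known generating covariance as `VI` is a simplification (in practice it would be estimated from the data, which is SciPy's default).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Synthetic water-quality table: 40 stations x 4 correlated variables
# (say nitrogen, phosphorus, chlorophyll-a, turbidity -- names assumed).
rng = np.random.default_rng(11)
cov = np.array([[1.0, 0.8, 0.6, 0.3],
                [0.8, 1.0, 0.5, 0.2],
                [0.6, 0.5, 1.0, 0.4],
                [0.3, 0.2, 0.4, 1.0]])
good     = rng.multivariate_normal([0, 0, 0, 0], cov, 20)
polluted = rng.multivariate_normal([6, 6, 4, 2], cov, 20)
X = np.vstack([good, polluted])

# Mahalanobis distance whitens the correlations between the variables;
# here the generating covariance is known, so its inverse is passed as VI.
d = pdist(X, metric="mahalanobis", VI=np.linalg.inv(cov))
labels = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
print(labels)
```

Because the metric discounts shared variance among correlated variables, stations are grouped by genuine multivariate differences rather than by whichever variable happens to have the largest spread.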

  10. Cluster temperature. Methods for its measurement and stabilization

    International Nuclear Information System (INIS)

    Makarov, G N

    2008-01-01

    Cluster temperature is an important material parameter essential to many physical and chemical processes involving clusters and cluster beams. Because of the diverse methods by which clusters can be produced, excited, and stabilized, and also because of the widely ranging values of atomic and molecular binding energies (approximately from 10^-5 to 10 eV) and numerous energy relaxation channels in clusters, cluster temperature (internal energy) ranges from 10^-3 to about 10^8 K. This paper reviews research on cluster temperature and describes methods for its measurement and stabilization. The role of cluster temperature in and its influence on physical and chemical processes is discussed. Results on the temperature dependence of cluster properties are presented. The way in which cluster temperature relates to cluster structure and to atomic and molecular interaction potentials in clusters is addressed. Methods for strong excitation of clusters and channels for their energy relaxation are discussed. Some applications of clusters and cluster beams are considered. (reviews of topical problems)

  11. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    International Nuclear Information System (INIS)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L.; Vassiou, K.

    2015-01-01

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time-efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreement was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross

  12. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  13. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    Full Text Available In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited in computational complexity, noise resistance and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., "sub-clusters") if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the terminate condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.
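
Only the first step of CDHC, building the convex hull that describes the global spatial context, is sketched below with SciPy; the point data are invented, and the retraction and splitting stages are not implemented.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Two spatial point groups; the convex hull is the global structure
# that CDHC starts from (only this first step is sketched here).
rng = np.random.default_rng(9)
pts = np.vstack([rng.normal((0, 0), 0.5, (30, 2)),
                 rng.normal((5, 5), 0.5, (30, 2))])

hull = ConvexHull(pts)
print(hull.vertices)   # indices of boundary points, counter-clockwise in 2-D
print(hull.volume)     # in 2-D, "volume" is the enclosed area
```

The hull's borderlines (`hull.simplices`) would then be retracted inward in sequence to detect where the point set should be divided.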

  14. Short-Term Wind Power Forecasting Based on Clustering Pre-Calculated CFD Method

    Directory of Open Access Journals (Sweden)

    Yimei Wang

    2018-04-01

    Full Text Available To meet the increasing wind power forecasting (WPF) demands of newly built wind farms without historical data, physical WPF methods are widely used. The computational fluid dynamics (CFD) pre-calculated flow fields (CPFF)-based WPF is a promising physical approach, which can balance well the competing demands of computational efficiency and accuracy. To enhance its adaptability for wind farms in complex terrain, a WPF method combining wind turbine clustering with CPFF is first proposed, in which the wind turbines in the wind farm are clustered and a forecast is made for each cluster. K-means, hierarchical agglomerative and spectral analysis methods are used to establish the wind turbine clustering models. The Silhouette Coefficient, Calinski-Harabasz index and within-between index are proposed as criteria to evaluate the effectiveness of the established clustering models. Based on different clustering methods and schemes, various clustering databases are built for clustering pre-calculated CFD (CPCC)-based short-term WPF. For the wind farm case studied, clustering evaluation criteria show that hierarchical agglomerative clustering has reasonable results, spectral clustering is better and K-means gives the best performance. The WPF results produced by different clustering databases also prove the effectiveness of the three evaluation criteria in turn. The newly developed CPCC model has a much higher WPF accuracy than the CPFF model without using clustering techniques, both on temporal and spatial scales. The research provides support for both the development and improvement of short-term physical WPF systems.
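    The Silhouette Coefficient used above as a clustering criterion can be computed directly from pairwise distances. A minimal stdlib-only sketch on toy 1-D points (the data are illustrative, not the paper's wind-farm measurements): for each point, `a` is the mean distance to its own cluster and `b` the mean distance to the nearest other cluster, and the score is `(b - a) / max(a, b)`.

```python
from statistics import mean

def silhouette(points, labels):
    """Mean silhouette coefficient for a flat clustering of 1-D points."""
    def dist(a, b):
        return abs(a - b)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # mean distance to the other members of the point's own cluster
        own = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        a = mean(own) if own else 0.0
        # mean distance to each other cluster; take the nearest one
        other = [mean(dist(p, q) for q, l in zip(points, labels) if l == lab2)
                 for lab2 in set(labels) if lab2 != lab]
        b = min(other)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

pts = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
good = [0, 0, 0, 1, 1, 1]   # respects the two obvious groups
bad = [0, 1, 0, 1, 0, 1]    # mixes them
assert silhouette(pts, good) > silhouette(pts, bad)
```

    A higher mean silhouette indicates tighter, better-separated clusters, which is why it serves as a model-selection criterion.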

  15. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    Science.gov (United States)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics of, and relationships among, the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important role, as for most practical data sets many different clusterings are typically possible. Being aware of clustering quality is important, among other things, to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual clustering algorithm, namely the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality at the appropriate abstraction level and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
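    The abstract mentions "simple scalar-valued quality scores" as the lowest level of the quality hierarchy. The most common such score for a SOM is the quantization error, the mean distance from each sample to its best-matching unit (BMU); the sketch below uses invented toy samples and codebooks, not the paper's data.

```python
import math

def quantization_error(samples, codebook):
    """Mean distance from each sample to its best-matching unit (BMU):
    the simplest scalar quality score for a trained SOM codebook."""
    return sum(min(math.dist(s, w) for w in codebook) for s in samples) / len(samples)

samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1)]
tight = [(0.05, 0.0), (0.95, 1.05)]   # units sitting on the data
loose = [(0.5, 0.5), (2.0, 2.0)]      # units far from the data
assert quantization_error(samples, tight) < quantization_error(samples, loose)
```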

  16. Momentum-space cluster dual-fermion method

    Science.gov (United States)

    Iskakov, Sergei; Terletska, Hanna; Gull, Emanuel

    2018-03-01

    Recent years have seen the development of two types of nonlocal extensions to the single-site dynamical mean field theory. On one hand, cluster approximations, such as the dynamical cluster approximation, recover short-range momentum-dependent correlations nonperturbatively. On the other hand, diagrammatic extensions, such as the dual-fermion theory, recover long-ranged corrections perturbatively. The correct treatment of both strong short-ranged and weak long-ranged correlations within the same framework is therefore expected to lead to a quick convergence of results, and offers the potential of obtaining smooth self-energies in nonperturbative regimes of phase space. In this paper, we present an exact cluster dual-fermion method based on an expansion around the dynamical cluster approximation. Unlike previous formulations, our method does not employ a coarse-graining approximation to the interaction, which we show to be the leading source of error at high temperature, and converges to the exact result independently of the size of the underlying cluster. We illustrate the power of the method with results for the second-order cluster dual-fermion approximation to the single-particle self-energies and double occupancies.

  17. Relation between financial market structure and the real economy: comparison between clustering methods.

    Science.gov (United States)

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform the other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, such as crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging.

  18. Relation between financial market structure and the real economy: comparison between clustering methods.

    Directory of Open Access Journals (Sweden)

    Nicoló Musmeci

    Full Text Available We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform the other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, such as crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging.

  19. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  20. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    Science.gov (United States)

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.

  1. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  2. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610

  3. CLUSTER ANALYSIS OF UKRAINIAN REGIONAL DISTRIBUTION BY LEVEL OF INNOVATION

    Directory of Open Access Journals (Sweden)

    Roman Shchur

    2016-07-01

    Full Text Available A SWOT analysis of the threats and benefits of the innovation development strategy of the Ivano-Frankivsk region, in the context of financial support, was conducted. A methodical approach was identified for determining the potential of public-private partnerships as a tool for financing innovative economic development. A cluster analysis of the possibilities of forming public-private partnerships in a particular region was carried out. An optimal set of problem areas that require urgent solutions and financial security is defined on the basis of the cluster approach. This will help to form practical recommendations for building an effective financial mechanism in the regions of Ukraine. Key words: mechanism of financial provision for innovation development, innovation development, public-private partnerships, cluster analysis, innovation development strategy.

  4. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
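    At the heart of the fuzzy approach above is the fuzzy k-means membership function, which lets an object belong partially to every cluster. A minimal stdlib-only sketch of the standard membership update (1-D points for brevity; the regularization that tunes the fuzzifier automatically, as in the paper, is omitted):

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy k-means membership of one point in each centre:
    u_k = 1 / sum_j (d_k / d_j)^(2/(m-1)), with fuzzifier m > 1."""
    dists = [abs(point - c) for c in centers]
    if any(d == 0 for d in dists):               # point sits exactly on a centre
        return [1.0 if d == 0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exp for dj in dists) for dk in dists]

u = fuzzy_memberships(0.4, centers=[0.0, 1.0])
assert abs(sum(u) - 1.0) < 1e-9   # memberships always sum to one
assert u[0] > u[1]                # the closer centre gets more weight
```

    As m grows, memberships become more uniform (fuzzier); as m approaches 1, assignments approach hard k-means.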

  5. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data?; b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data
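    Question (b) above, whether clusters correspond to known tissue classes, is typically quantified with a pair-counting agreement score between the cluster labels and the class labels. The abstract does not name a specific measure, so the sketch below uses the Rand index, one standard choice:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree
    (grouped together in both, or separated in both)."""
    agree = 0
    pairs = list(combinations(range(len(labels_a)), 2))
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
    return agree / len(pairs)

tissue = ["liver", "liver", "spleen", "spleen"]   # hypothetical known classes
assert rand_index([0, 0, 1, 1], tissue) == 1.0    # perfect correspondence
assert rand_index([0, 1, 0, 1], tissue) < 1.0     # clusters cut across classes
```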

  6. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    International Nuclear Information System (INIS)

    Felfer, P.; Ceguerra, A.V.; Ringer, S.P.; Cairney, J.M.

    2015-01-01

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms

  7. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Full Text Available Multiple correspondence analysis is a method making easy to interpret the categorical variables given in contingency tables, showing the similarities, associations as well as divergences among these variables via graphics on a lower dimensional space. Clustering methods are helped to classify the grouped data according to their similarities and to get useful summarized data from them. In this study, interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting referred health institute such as age, disease group and health insurance are examined and it is aimed to compare results of the methods.

  8. Cluster analysis by optimal decomposition of induced fuzzy sets

    Energy Technology Data Exchange (ETDEWEB)

    Backer, E

    1978-01-01

    Nonsupervised pattern recognition is addressed, and the concept of fuzzy sets is explored in order to provide the investigator (data analyst) with additional information, supplied by the pattern class membership values, beyond the classical pattern class assignments. The basic ideas behind the pattern recognition problem, the clustering problem, and the concept of fuzzy sets in cluster analysis are discussed, and a brief review of the literature on fuzzy cluster analysis is given. Some mathematical aspects of fuzzy set theory are briefly discussed; in particular, a measure of fuzziness is suggested. The optimization-clustering problem is characterized. Then the fundamental idea behind affinity decomposition is considered. Next, further analysis takes place with respect to the partitioning-characterization functions. The iterative optimization procedure is then addressed. The reclassification function is investigated and convergence properties are examined. Finally, several experiments in support of the suggested method are described. Four object data sets serve as appropriate test cases. 120 references, 70 figures, 11 tables. (RWR)
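    The abstract mentions a "measure of fuzziness" without defining it. One common family of such measures is the normalised entropy of a membership vector, sketched below; this is an illustrative stand-in, not necessarily Backer's own measure.

```python
import math

def fuzziness(memberships):
    """Normalised entropy of a membership vector: 0 for a crisp
    assignment, 1 when mass is spread evenly over all clusters."""
    k = len(memberships)
    h = -sum(u * math.log(u) for u in memberships if u > 0)
    return h / math.log(k)

assert fuzziness([1.0, 0.0, 0.0]) == 0.0                 # crisp: no fuzziness
assert abs(fuzziness([1/3, 1/3, 1/3]) - 1.0) < 1e-9      # maximally fuzzy
assert 0.0 < fuzziness([0.8, 0.1, 0.1]) < 1.0            # in between
```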

  9. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

    Directory of Open Access Journals (Sweden)

    Rosemary M McCloskey

    2017-11-01

    Full Text Available Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A diverse set of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner post-infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method categorized about half (46%) as many individuals to clusters compared to the other methods. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not by other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer. This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where…

  10. Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential Evolution

    OpenAIRE

    Satish Gajawada; Durga Toshniwal

    2012-01-01

    Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have been solved by using DE-based clustering methods, but these methods may fail to find clusters hidden in subspaces of high-dimensional datasets. Subspace and projected clustering methods have been proposed in the literature to find subspace clusters that are present in subspaces of a dataset. In this paper we propose VINAYAKA, a semi-supervised projected clustering method based on DE. In this method DE opt...

  11. The use of cluster analysis method for the localization of acoustic emission sources detected during the hydrotest of PWR pressure vessels

    International Nuclear Information System (INIS)

    Liska, J.; Svetlik, M.; Slama, K.

    1982-01-01

    The acoustic emission method is a promising tool for checking reactor pressure vessel integrity. Localization of emission sources is the first and most important step in processing emission signals. The paper describes an emission-source localization method based on cluster analysis of the set of points depicting the emission events in the plane of the coordinates of their occurrence. The method constructs the minimum spanning tree of this set of points and partitions it into fragments corresponding to point clusters. Furthermore, the probability distributions of the minimum spanning tree edge length are considered for one and for several clusters, with the aim of finding the optimum length of the critical edge for partitioning the tree. Practical application of the method is demonstrated by localizing the emission sources detected during a hydrotest of a pressure vessel used for testing the reactor pressure vessel covers. (author)
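    The MST-partitioning scheme described above can be sketched in a few lines: build the minimum spanning tree of the event coordinates, delete every edge longer than the critical length, and read off the connected fragments as source clusters. The toy coordinates below are invented, and the critical length is fixed by hand, whereas the paper derives it from the edge-length distribution.

```python
import math

def mst_edges(points):
    """Prim's algorithm: edges (i, j, length) of the Euclidean MST."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(points):
        i, j = min(((i, j) for i in in_tree for j in range(len(points))
                    if j not in in_tree),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
        in_tree.add(j)
        edges.append((i, j, math.dist(points[i], points[j])))
    return edges

def cluster_by_critical_edge(points, critical_length):
    """Cut MST edges longer than the critical length; the remaining
    fragments are the clusters. Returns the number of clusters."""
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for i, j, d in mst_edges(points):
        if d <= critical_length:          # keep short edges, cut long ones
            parent[find(j)] = find(i)
    return len({find(i) for i in range(len(points))})

sources = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
assert cluster_by_critical_edge(sources, critical_length=2.0) == 2
```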

  12. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  13. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
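    The Cosine Coefficient used above is the standard angular similarity between term-frequency vectors; it depends only on vocabulary overlap, not document length. A minimal sketch (the toy term counts are invented):

```python
import math

def cosine(u, v):
    """Cosine coefficient between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

doc_a = [3, 0, 1, 2]   # toy term counts
doc_b = [2, 0, 1, 3]   # similar vocabulary
doc_c = [0, 5, 0, 0]   # disjoint vocabulary
assert cosine(doc_a, doc_b) > cosine(doc_a, doc_c)
assert cosine(doc_a, doc_c) == 0.0
```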

  14. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, hierarchical clustering and k-means analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14): late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13): late-onset, atopic asthma; Cluster 3 (n=6): late-onset, aspirin-sensitive, eosinophilic asthma; and Cluster 4 (n=7): early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with the greatest discriminating power in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of the heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of the disease and a personalized approach to the treatment of patients.

  15. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in …
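    Kernel k-means, referred to above, assigns points in an implicit feature space using only kernel evaluations, so cluster centres are never computed explicitly: the distance from a point to a cluster expands into kernel sums over the cluster's members. A stdlib-only sketch with an RBF kernel on 1-D data (the data, `gamma` and the deterministic initialisation are illustrative assumptions, not taken from the paper):

```python
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def kernel_kmeans(xs, k, iters=10):
    """Kernel k-means for 1-D data: assignments are made in feature
    space via the kernel trick, without explicit cluster centres."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    labels = [i % k for i in range(n)]          # deterministic init
    for _ in range(iters):
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c in range(k):
                members = [j for j in range(n) if labels[j] == c]
                if not members:
                    continue
                m = len(members)
                # squared feature-space distance to cluster c's mean
                d = (K[i][i]
                     - 2.0 / m * sum(K[i][j] for j in members)
                     + sum(K[j][l] for j in members for l in members) / m**2)
                if d < best_d:
                    best, best_d = c, d
            labels[i] = best
    return labels

labels = kernel_kmeans([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], k=2)
assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```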

  16. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
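    Ultra-fast sequence clustering of the kind RAMMCAP builds on is typically greedy and incremental: each read joins the first existing cluster representative it matches above an identity threshold, otherwise it founds a new cluster. The sketch below shows that general scheme only; the naive position-wise identity function and threshold are simplified assumptions (real tools such as CD-HIT use k-mer filters and alignment).

```python
def identity(a, b):
    """Naive fraction of matching positions between two reads."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def greedy_cluster(reads, threshold=0.9):
    """Greedy incremental clustering: longest reads become
    representatives first; each read joins the first representative
    it matches at or above the identity threshold."""
    reps, clusters = [], []
    for r in sorted(reads, key=len, reverse=True):
        for rep, members in zip(reps, clusters):
            if identity(r, rep) >= threshold:
                members.append(r)
                break
        else:                      # no representative matched
            reps.append(r)
            clusters.append([r])
    return clusters

reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "TTTTGGGGCA"]
clusters = greedy_cluster(reads, threshold=0.9)
assert len(clusters) == 2
assert all(len(c) == 2 for c in clusters)
```

    A single pass over the reads makes the scheme linear in practice, which is what lets such tools handle millions of reads.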

  17. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    Science.gov (United States)

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
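
    To make the comparison concrete, here is a minimal, self-contained sketch (not the authors' code) of Lloyd-style K-means and K-median partitioning of dichotomous data. The toy response matrix, the `partition` helper, and its initialization indices are illustrative assumptions.

```python
import numpy as np

def partition(X, k, center_fn, init_idx, iters=50):
    """Lloyd-style partitioning; `center_fn` computes each cluster's center
    (mean -> K-means, median -> K-median)."""
    centers = X[list(init_idx)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each object to its nearest center (squared Euclidean;
        # for 0/1 data this coincides with Hamming distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([center_fn(X[labels == j]) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# toy dichotomous data: two planted response profiles, one bit flipped per object
proto_a = np.array([1, 1, 1, 1, 0, 0, 0, 0])
proto_b = np.array([0, 0, 0, 0, 1, 1, 1, 1])
flips = np.eye(8, dtype=int)
X = np.vstack([proto_a ^ flips[i] for i in range(4)] +
              [proto_b ^ flips[i] for i in range(4, 8)])

labels_mean = partition(X, 2, lambda rows: rows.mean(axis=0), init_idx=(0, 4))
labels_median = partition(X, 2, lambda rows: np.median(rows, axis=0), init_idx=(0, 4))
```

    Swapping the center function from the mean to the median is the entire algorithmic difference between the two methods; for 0/1 data the median center stays on the binary lattice, which is one intuition for why K-median may suit dichotomous measures.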

  18. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  19. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
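
    A minimal numpy sketch of the fuzzy c-means idea contrasted with K-means in the two records above (standard Bezdek-style updates; the toy data and function name are illustrative assumptions, not the authors' code):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: each point gets a membership in every cluster
    (rows of U sum to 1) rather than a single hard label."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m                                   # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # standard FCM update
    return centers, U

# two tight groups plus one genuinely ambiguous point between them
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [2.55, 2.55]])
centers, U = fuzzy_c_means(X, c=2)
```

    The ambiguous midpoint ends up with memberships split roughly 50/50 between the two clusters, which a hard K-means assignment cannot express.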

  20. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    Science.gov (United States)

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests, with the cut-off distance chosen to yield a similar classification capability, showed favorable results. When the MIC method was used, in which minimum inhibitory concentration (MIC) values are used directly, the level of agreement with PFGE was 74.2% when 15 drugs were tested; the Unweighted Pair Group Method with Arithmetic mean (UPGMA) was effective when the cut-off distance was 16. Using the SIR method, in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis of drug susceptibility tests is useful for the epidemiological analysis of MRSA.

  1. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    Science.gov (United States)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering on 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, with more than one centroid in several clusters. The number of centroids shows the density level of interaction. Protein interactions that are connected within a tissue form a protein complex that serves as a specific biological process unit. The analysis shows that R-MCL clustering groups the dengue virus family by the similar roles of their constituent proteins, regardless of serotype.
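
    The flow simulation behind (regularized) Markov clustering can be sketched in a few lines of numpy. This is a toy illustration under simplifying assumptions (unit self-loops, argmax attractor read-out, uniform inflation), not the authors' Python 3.4 implementation:

```python
import numpy as np

def r_mcl(adj, inflation=2.0, iters=30):
    """Sketch of Regularized MCL: the flow matrix is repeatedly multiplied
    by the ORIGINAL column-stochastic matrix (the regularization step),
    then inflated, so strong flows win and weak ones die out."""
    A = adj.astype(float) + np.eye(len(adj))  # add self-loops
    M_G = A / A.sum(axis=0)                   # column-stochastic transition matrix
    M = M_G.copy()
    for _ in range(iters):
        M = M @ M_G                           # regularization (replaces MCL's M @ M)
        M = M ** inflation                    # inflation
        M /= M.sum(axis=0)                    # re-normalize columns
    # read clusters off the converged flow: group columns by their attractor row
    clusters = {}
    for j in range(len(adj)):
        clusters.setdefault(int(M[:, j].argmax()), []).append(j)
    return list(clusters.values())

# two triangles joined by a single bridge edge (2-3): expect two clusters
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
clusters = r_mcl(A)
```

    Inflation prunes the weak cross-bridge flow, so the two triangles separate into distinct clusters.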

  2. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  3. The IMACS Cluster Building Survey. I. Description of the Survey and Analysis Methods

    Science.gov (United States)

    Oemler Jr., Augustus; Dressler, Alan; Gladders, Michael G.; Rigby, Jane R.; Bai, Lei; Kelson, Daniel; Villanueva, Edward; Fritz, Jacopo; Rieke, George; Poggianti, Bianca M.; hide

    2013-01-01

    The IMACS Cluster Building Survey uses the wide field spectroscopic capabilities of the IMACS spectrograph on the 6.5 m Baade Telescope to survey the large-scale environment surrounding rich intermediate-redshift clusters of galaxies. The goal is to understand the processes which may be transforming star-forming field galaxies into quiescent cluster members as groups and individual galaxies fall into the cluster from the surrounding supercluster. This first paper describes the survey: the data taking and reduction methods. We provide new calibrations of star formation rates (SFRs) derived from optical and infrared spectroscopy and photometry. We demonstrate that there is a tight relation between the observed SFR per unit B luminosity, and the ratio of the extinctions of the stellar continuum and the optical emission lines. With this, we can obtain accurate extinction-corrected colors of galaxies. Using these colors as well as other spectral measures, we determine new criteria for the existence of ongoing and recent starbursts in galaxies.

  4. THE IMACS CLUSTER BUILDING SURVEY. I. DESCRIPTION OF THE SURVEY AND ANALYSIS METHODS

    Energy Technology Data Exchange (ETDEWEB)

    Oemler, Augustus Jr.; Dressler, Alan; Kelson, Daniel; Villanueva, Edward [Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101-1292 (United States); Gladders, Michael G. [Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637 (United States); Rigby, Jane R. [Observational Cosmology Lab, NASA Goddard Space Flight Center, Greenbelt, MD 20771 (United States); Bai Lei [Department of Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto, ON M5S 3H4 (Canada); Fritz, Jacopo [Sterrenkundig Observatorium, Universiteit Gent, Krijgslaan 281 S9, B-9000 Gent (Belgium); Rieke, George [Steward Observatory, University of Arizona, Tucson, AZ 8572 (United States); Poggianti, Bianca M.; Vulcani, Benedetta, E-mail: oemler@obs.carnegiescience.edu [INAF-Osservatorio Astronomico di Padova, Vicolo dell' Osservatorio 5, I-35122 Padova (Italy)

    2013-06-10

    The IMACS Cluster Building Survey uses the wide field spectroscopic capabilities of the IMACS spectrograph on the 6.5 m Baade Telescope to survey the large-scale environment surrounding rich intermediate-redshift clusters of galaxies. The goal is to understand the processes which may be transforming star-forming field galaxies into quiescent cluster members as groups and individual galaxies fall into the cluster from the surrounding supercluster. This first paper describes the survey: the data taking and reduction methods. We provide new calibrations of star formation rates (SFRs) derived from optical and infrared spectroscopy and photometry. We demonstrate that there is a tight relation between the observed SFR per unit B luminosity, and the ratio of the extinctions of the stellar continuum and the optical emission lines. With this, we can obtain accurate extinction-corrected colors of galaxies. Using these colors as well as other spectral measures, we determine new criteria for the existence of ongoing and recent starbursts in galaxies.

  5. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  6. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Abstract Background The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, the intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate-dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results GEE performs well on all four measures, provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately, under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing and complete case analysis was applied. Conclusion GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either the standard or the within-cluster MI strategy is applied prior to the analysis.
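
    Generating clustered binary responses from a beta-binomial distribution, as in the simulation design above, can be sketched as follows. The function name and parameter values are illustrative; the ICC parameterization uses the standard beta-binomial identity ICC = 1/(a + b + 1).

```python
import numpy as np

def simulate_crt_arm(n_clusters, cluster_size, mean_p, icc, rng):
    """Draw clustered binary outcomes from a beta-binomial model.
    For Beta(a, b) cluster-level risks the intra-cluster correlation
    coefficient is 1 / (a + b + 1)."""
    s = (1.0 - icc) / icc                 # a + b implied by the target ICC
    a, b = mean_p * s, (1.0 - mean_p) * s
    p = rng.beta(a, b, size=n_clusters)   # one underlying risk per cluster
    return rng.random((n_clusters, cluster_size)) < p[:, None]

rng = np.random.default_rng(42)
y = simulate_crt_arm(n_clusters=200, cluster_size=50, mean_p=0.3, icc=0.05, rng=rng)
cluster_means = y.mean(axis=1)
```

    Because all subjects in a cluster share the same underlying risk, the variance of the cluster means exceeds what independent Bernoulli draws would give, which is exactly the correlation structure GEE and RELR must handle.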

  7. Melodic pattern discovery by structural analysis via wavelets and clustering techniques

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present an automatic method to support melodic pattern discovery by structural analysis of symbolic representations by means of wavelet analysis and clustering techniques. In previous work, we used the method to recognize the parent works of melodic segments, or to classify tunes into tune families. … We use k-means to cluster melodic segments into groups of measured similarity and obtain a ranking of the most prototypical melodic segments or patterns and their occurrences. We test the method on the JKU Patterns Development Database and evaluate it based on the ground truth defined by the MIREX 2013 Discovery of Repeated Themes & Sections task. We compare the results of our method to the output of geometric approaches. Finally, we discuss the relevance of our wavelet-based analysis in relation to structure, pattern discovery, similarity and variation, and comment on the considerations of the method when used…

  8. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical and biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Squares (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. Thus, from the 36 hypothetical combinations of the theoretical skin-type classification, we arrived at a consolidated proposal of six classes. Our data suggest that the association of PLS discriminant analysis and clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  9. A Comparison of Methods for Player Clustering via Behavioral Telemetry

    DEFF Research Database (Denmark)

    Drachen, Anders; Thurau, C.; Sifa, R.

    2013-01-01

    The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets … patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five-…

  10. Integration of Qualitative and Quantitative Methods: Building and Interpreting Clusters from Grounded Theory and Discourse Analysis

    Directory of Open Access Journals (Sweden)

    Aldo Merlino

    2007-01-01

    Full Text Available Qualitative methods present a wide spectrum of application possibilities as well as opportunities for combining qualitative and quantitative methods. In the social sciences, fruitful theoretical discussions and a great deal of empirical research have taken place. This article introduces an empirical investigation which demonstrates the logic of combining methodologies as well as the collection and interpretation, both sequential and simultaneous, of qualitative and quantitative data. Specifically, the investigation process will be described, beginning with a grounded theory methodology and its combination with the techniques of structural semiotics discourse analysis to generate, in a first phase, an instrument for quantitative measurement and to understand, in a second phase, clusters obtained by quantitative analysis. This work illustrates how qualitative methods allow for the comprehension of the discursive and behavioral elements under study, and how they function as support for making sense of and giving meaning to quantitative data. URN: urn:nbn:de:0114-fqs0701219

  11. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Full Text Available Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done using CrimeStat III software to determine whether the distribution of typhoid cases was clustered, random or dispersed, and subsequently at what distances the cases clustered. From the 736 cases involved in the study, there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in the year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describing the spatial epidemiology of a specific area. (Med J Indones 2008; 17: 175-82) Keywords: typhoid, clustering, spatial epidemiology, GIS
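
    One simple spatial statistic of the kind computed by such software is the Clark-Evans nearest-neighbour index. The sketch below (synthetic coordinates, not the study's data) shows how it distinguishes clustered from random point patterns:

```python
import numpy as np

def nearest_neighbour_index(points, area):
    """Clark-Evans nearest-neighbour index: ~1 for a random pattern,
    <1 for clustering, >1 for a dispersed (regular) pattern."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)   # mean NN distance under randomness
    return observed / expected

rng = np.random.default_rng(0)
area = 100.0                                         # a 10 x 10 study window
random_cases = rng.uniform(0, 10, size=(200, 2))
foci = rng.uniform(1, 9, size=(10, 2))               # 10 hypothetical outbreak foci
clustered_cases = (foci[:, None, :]
                   + rng.normal(0, 0.1, size=(10, 20, 2))).reshape(-1, 2)

nni_random = nearest_neighbour_index(random_cases, area)
nni_clustered = nearest_neighbour_index(clustered_cases, area)
```

    The clustered pattern gives an index far below 1, while the uniform pattern stays near 1; dedicated tools additionally test at which distances clustering is significant.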

  12. Clustering of users of digital libraries through log file analysis

    Directory of Open Access Journals (Sweden)

    Juan Antonio Martínez-Comeche

    2017-09-01

    Full Text Available This study analyzes how users perform information retrieval tasks when submitting queries to the Hispanic Digital Library. Clusters of users are differentiated based on their distinct information behavior. The study used the log files collected by the server over a year, and different possible clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering method for the analysis of large log files from digital libraries. In the case of the Hispanic Digital Library, the results show three clusters of users, and the characteristic information behavior of each group is described.

  13. Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

    Directory of Open Access Journals (Sweden)

    Aruna Rajan

    2010-04-01

    Full Text Available Recent advances in computing technology have enabled microsecond-long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and to changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.
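
    The dimensionality-reduction step can be sketched with numpy alone. One caveat: the paper advocates non-metric MDS, while for brevity this toy uses classical (metric) MDS from a Euclidean distance matrix; the synthetic "frames" merely mimic two conformational basins.

```python
import numpy as np

def pca(X, k):
    """Scores on the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def classical_mds(D, k):
    """Classical (metric) MDS from a distance matrix; the non-metric
    variant used in the paper preserves only distance ranks."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# toy "trajectory": 100 frames of 30 coordinates hopping between two basins
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 0.2, size=(50, 30)),
                    rng.normal(2.0, 0.2, size=(50, 30))])
scores = pca(frames, 2)
D = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=2)
embed = classical_mds(D, 2)
```

    Both embeddings place the two basins far apart along their first coordinate, giving a low-dimensional collective coordinate that hard clustering alone would not provide.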

  14. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    Science.gov (United States)

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
    This study shows that SCC is an alternative to the Pearson correlation coefficient.
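
    The Pearson-based baseline that SCC is compared against can be sketched as correlation-distance hierarchical clustering (naive single linkage, numpy only; SCC itself is defined in the paper and is not reproduced here):

```python
import numpy as np

def pearson_distance(X):
    """1 - Pearson correlation between rows (e.g. genes across conditions)."""
    return 1.0 - np.corrcoef(X)

def single_linkage(D, k):
    """Naive agglomerative clustering down to k clusters (single linkage)."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters[b]   # merge the closest pair of clusters
        del clusters[b]
    return clusters

# toy replicated expression data: three noisy replicates each of an "up"
# and a "down" pattern across four conditions
rng = np.random.default_rng(0)
up = np.array([1.0, 2.0, 3.0, 4.0])
genes = np.vstack([up + rng.normal(0, 0.1, 4) for _ in range(3)] +
                  [up[::-1] + rng.normal(0, 0.1, 4) for _ in range(3)])
parts = single_linkage(pearson_distance(genes), k=2)
```

    SCC replaces the plain Pearson similarity with a shrinkage estimate that pools information across replicates; the clustering machinery around it stays the same.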

  15. A Multidimensional and Multimembership Clustering Method for Social Networks and Its Application in Customer Relationship Management

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2013-01-01

    Full Text Available Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have been proven inadequate for high-dimensional or mixed-type datasets due to data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguishing features: a nonbinary hierarchical tree and multimembership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multimembership feature may provide more useful service strategies. Experimental results on customer relationship management data confirm the effectiveness of the new method.

  16. Clusters of galaxies as tools in observational cosmology : results from x-ray analysis

    International Nuclear Information System (INIS)

    Weratschnig, J.M.

    2009-01-01

    Clusters of galaxies are the largest gravitationally bound structures in the universe. They can be used as ideal tools to study large-scale structure formation (e.g. when studying merger clusters) and provide highly interesting environments to analyse several characteristic interaction processes (like ram pressure stripping of galaxies, or magnetic fields). In this dissertation thesis, we have studied several clusters of galaxies using X-ray observations. To obtain scientific results, we have applied different data reduction and analysis methods. With a combination of morphological and spectral analysis, the merger cluster Abell 514 was studied in much detail. It has a highly interesting morphology and shows signs of an ongoing merger as well as a shock. Using a new method to detect substructure, we have analysed several clusters to determine whether any substructure is present in the X-ray image. Such substructure hints towards real structure in the distribution of the intra-cluster medium (ICM) and is evidence for ongoing mergers. The results from this analysis are used extensively for the cluster of galaxies Abell S1136. Here, we study the ICM distribution and compare its structure with the spatial distribution of star-forming galaxies. Cluster magnetic fields are another important topic of this thesis. They can be studied in radio observations, which can be put into relation with results from X-ray observations. Using observational data from several clusters, we could support the theory that cluster magnetic fields are frozen into the ICM. (author)

  17. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
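
    The convex clustering objective can be illustrated with a toy gradient-descent sketch on a smoothed version of the fusion penalty (the paper's proximal distance algorithm and weighting scheme are not reproduced here; uniform pair weights and all constants below are illustrative assumptions):

```python
import numpy as np

def convex_cluster(X, lam, steps=2000, lr=0.05, eps=1e-2):
    """Gradient descent on a smoothed convex-clustering objective
        0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||
    with uniform pair weights. Larger lam fuses more centroids u_i."""
    U = X.astype(float).copy()
    for _ in range(steps):
        diff = U[:, None, :] - U[None, :, :]
        norms = np.sqrt((diff ** 2).sum(axis=2) + eps)   # smoothed ||u_i - u_j||
        grad = (U - X) + lam * (diff / norms[:, :, None]).sum(axis=1)
        U -= lr * grad
    return U

# two tight pairs: with moderate lam each pair's centroids fuse,
# while the two pairs stay well apart
X = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
U = convex_cluster(X, lam=0.5)              # moderate shrinkage
U_no_penalty = convex_cluster(X, lam=0.0)   # no penalty: centroids stay at data
```

    Tracing the fused centroids as lam grows from 0 recovers the solution path that makes convex clustering an attractive alternative to a static dendrogram.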

  18. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System) trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed to reduce local information loss of the trajectory and to avoid getting stuck in a local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning method (angle-based partitioning) of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated into fuzzy C-means (FCM) clustering to maintain the stability and robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method are validated against real-world taxi GPS data. When comparing our clustering algorithm with partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.
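
    The fuzzy C-means core this method builds on alternates a membership update and a centre update; a minimal generic sketch follows (standard FCM only, without the paper's Lagrange-based and Hausdorff-based extensions; the farthest-pair initialisation is our own choice to keep the sketch deterministic):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100):
    """Standard fuzzy C-means: alternate membership and centre updates.
    Initialises centres at the two most distant points (works for c == 2)."""
    G = ((X[:, None] - X[None]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(G), G.shape)
    centers = X[[i, j]][:c]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)       # fuzzy memberships
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted centres
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
centers, U = fuzzy_c_means(X, c=2)
```

    Each point's membership row sums to one, which is what lets FCM retain partial local information instead of committing every trajectory segment to a single hard cluster.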

  19. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    Science.gov (United States)

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http
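
    The OTU-picking strategy these tools refine can be illustrated with a toy greedy clustering pass in which each read joins the first centroid it matches above an identity threshold (a simplified sketch of the general greedy strategy, not the actual code of UCLUST, Swarm, or SUMACLUST; the reads and the naive position-matching identity metric are invented):

```python
def identity(a, b):
    """Toy identity: fraction of matching positions (no real alignment)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def greedy_cluster(seqs, threshold=0.97):
    """Greedy de novo OTU picking: a read joins the first centroid it
    matches at >= threshold identity, otherwise it seeds a new OTU."""
    centroids, otus = [], []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[k].append(s)
                break
        else:
            centroids.append(s)
            otus.append([s])
    return centroids, otus

reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAT", "TTTTGGGGCC"]
centroids, otus = greedy_cluster(reads, threshold=0.9)
```

    Because assignment is order-dependent and centroids never update, sequencing errors near the threshold easily seed spurious OTUs — the failure mode the benchmark quantifies.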

  20. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
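
    The study's pipeline — standardise variables, collapse each dimension to a composite index, then cluster patients on the indices — can be sketched as follows (entirely synthetic data and unweighted indices; the study derives variable weights from its partitioning analysis, and the variable groupings here are hypothetical):

```python
import numpy as np

def composite_indices(X, groups):
    """Collapse standardised variables into one composite index per
    dimension (unweighted mean here; the study weights the variables)."""
    Z = (X - X.mean(0)) / X.std(0)
    return np.column_stack([Z[:, g].mean(axis=1) for g in groups])

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with greedy farthest-point initialisation."""
    C = [X[np.argmax(np.linalg.norm(X - X.mean(0), axis=1))]]
    while len(C) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d)])
    C = np.array(C)
    for _ in range(n_iter):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab, C

rng = np.random.default_rng(0)
# three patient profiles on two latent dimensions: low/low, high/high, high/low
protos = np.array([[-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
f = np.repeat(protos, 100, axis=0) + rng.normal(0, 0.2, (300, 2))
# six observed variables, three loading on each dimension
X = np.column_stack([f[:, 0]] * 3 + [f[:, 1]] * 3) + rng.normal(0, 0.3, (300, 6))
idx = composite_indices(X, groups=[[0, 1, 2], [3, 4, 5]])
lab, _ = kmeans(idx, k=3)
```

    Clustering the two composite indices rather than all raw variables recovers the three planted profiles, mirroring the low/low, high/high and high/low subgroup structure the study reports.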

  1. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  2. An Examination of Three Spatial Event Cluster Detection Methods

    Directory of Open Access Journals (Sweden)

    Hensley H. Mariathas

    2015-03-01

    Full Text Available In spatial disease surveillance, geographic areas with large numbers of disease cases are to be identified, so that targeted investigations can be pursued. Geographic areas with high disease rates are called disease clusters, and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. In some situations, disease-related events rather than individuals are of interest for geographical surveillance, and methods to detect clusters of disease-related events are called event cluster detection methods. In this paper, we examine three distributional assumptions for the events in cluster detection: compound Poisson, approximate normal and multiple hypergeometric (exact). The methods differ in the choice of distributional assumption for the potentially multiple correlated events per individual. The methods are illustrated on emergency department (ED) presentations by children and youth (age < 18 years) because of substance use in the province of Alberta, Canada, from 1 April 2007 to 31 March 2008. Simulation studies are conducted to investigate Type I error and the power of the clustering methods.
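
    The simplest of these viewpoints can be illustrated with a plain Poisson screen that flags areas whose observed event count is improbably high under the overall rate (a toy version with invented counts and no multiplicity correction; the paper's compound Poisson, approximate normal and exact multiple-hypergeometric tests additionally handle multiple correlated events per individual):

```python
from math import exp, factorial

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam ** i / factorial(i) for i in range(k))

def flag_clusters(observed, population, alpha=0.05):
    """Flag areas whose event count is improbably high under the
    overall event rate (toy Poisson screen)."""
    rate = sum(observed) / sum(population)
    return [poisson_sf(o, rate * p) < alpha
            for o, p in zip(observed, population)]

# Hypothetical areas; area 2 has a clear excess over its population share.
flags = flag_clusters(observed=[3, 4, 30, 2],
                      population=[1000, 1100, 1000, 900])
```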

  3. Improvement of economic potential estimation methods for enterprise with potential branch clusters use

    Directory of Open Access Journals (Sweden)

    V.Ya. Nusinov

    2017-08-01

    Full Text Available The research determines that the currently existing methods of estimating an enterprise's economic potential are based on the use of additive, multiplicative and rating models. It is determined that the existing methods have a number of shortcomings: for example, not all of them take into account the branch features of the analysis, or the level of development of the enterprise in comparison with other enterprises. It is suggested to remedy these shortcomings by estimating the integral level of potential with regard not only to the branch features of an enterprise's activity but also to the intra-branch economic clusterization of such enterprises. Scientific works connected with the use of clusters for the estimation of economic potential are generalized. According to the results of this generalization, nine scientific approaches can be distinguished in this direction: the use of natural clusterization of enterprises for estimating and increasing regional potential; the use of natural clusterization of enterprises for estimating and increasing industry potential; the use of artificial clusterization of enterprises for estimating and increasing regional potential; the use of artificial clusterization of enterprises for estimating and increasing industry potential; the use of artificial clusterization of enterprises for estimating clustering potential; the use of artificial clusterization of enterprises for estimating the competitiveness potential of clustering; the use of natural (or artificial) clusterization for estimating clustering efficiency; the use of natural (or artificial) clusterization for increasing the level of regional (industry) development; and the use of methods for estimating the economic potential of a region (industry), or its constituents, in the construction of clusters. It is determined that the use of the clusterization method in

  4. A simple and fast method to determine the parameters for fuzzy c-means cluster analysis

    DEFF Research Database (Denmark)

    Schwämmle, Veit; Jensen, Ole Nørregaard

    2010-01-01

    MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values...... of algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random dataset but which detects cluster...... formation with maximum resolution on the edge of randomness. RESULTS: Estimation of the optimal parameter values is achieved by evaluation of the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only...

  5. A hybrid method based on a new clustering technique and multilayer perceptron neural networks for hourly solar radiation forecasting

    International Nuclear Information System (INIS)

    Azimi, R.; Ghayekhloo, M.; Ghofrani, M.

    2016-01-01

    Highlights: • A novel clustering approach is proposed based on the data transformation approach. • A novel cluster selection method based on correlation analysis is presented. • The proposed hybrid clustering approach leads to deep learning for MLPNN. • A hybrid forecasting method is developed to predict solar radiation. • The evaluation results show superior performance of the proposed forecasting model. - Abstract: Accurate forecasting of renewable energy sources plays a key role in their integration into the grid. This paper proposes a hybrid solar irradiance forecasting framework using a transformation-based K-means algorithm, named TB K-means, to increase the forecast accuracy. The proposed clustering method is a combination of a new initialization technique, the K-means algorithm and a new gradual data transformation approach. Unlike other K-means based clustering methods, which cannot provide a fixed and definitive answer because different cluster centroids are selected for each run, the proposed clustering provides constant results across different runs of the algorithm. The proposed clustering is combined with a time-series analysis, a novel cluster selection algorithm and a multilayer perceptron neural network (MLPNN) to develop the hybrid solar radiation forecasting method for different time horizons (1 h ahead, 2 h ahead, …, 48 h ahead). The performance of the proposed TB K-means clustering is evaluated using several different datasets and compared with different variants of the K-means algorithm. Solar datasets with different solar radiation characteristics are also used to determine the accuracy and processing speed of the developed forecasting method with the proposed TB K-means and other clustering techniques. The results of direct comparison with other well-established forecasting models demonstrate the superior performance of the proposed hybrid forecasting method. Furthermore, a comparative analysis with the benchmark solar
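
    The motivation for TB K-means — K-means whose output does not change from run to run — can be illustrated by replacing random initialisation with a deterministic one. Here centroids are seeded at evenly spaced quantiles along the first principal direction; this is our own stand-in for determinism, not the paper's transformation-based procedure:

```python
import numpy as np

def deterministic_kmeans(X, k, n_iter=50):
    """K-means whose initial centroids are the data points nearest to
    evenly spaced quantiles along the first principal direction, so
    every run returns the same answer."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    qs = np.quantile(proj, np.linspace(0.1, 0.9, k))
    C = np.array([X[np.argmin(np.abs(proj - q))] for q in qs])
    for _ in range(n_iter):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab, C

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(4, 0.5, (40, 2))])
lab1, _ = deterministic_kmeans(X, k=2)
lab2, _ = deterministic_kmeans(X, k=2)
```

    Repeated runs produce byte-identical partitions, which matters when the cluster labels feed a downstream forecasting model such as an MLPNN.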

  6. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Full Text Available Aim: The properness of random assignment of compounds to training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative Structure-Activity Relationship model using the Molecular Descriptors Family on Vertices was evaluated in terms of assignment of carboquinone derivatives to training and test sets during the leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied in order to investigate whether the assignment of compounds was proper. The Euclidean distance and maximization of the initial distance using cross-validation with a v-fold of 10 were applied. Results: All five variables included in the analysis proved to have a statistically significant contribution to the identification of clusters. Three clusters were identified, each of them containing carboquinone derivatives belonging both to the training and to the test sets. The observed activity of carboquinone derivatives proved to be normally distributed in every cluster. The presence of training and test sets in all clusters identified using generalized cluster analysis with the K-means algorithm, and the distribution of observed activity within clusters, support a proper assignment of compounds to the training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method for assessing the random assignment of carboquinone derivatives to training and test sets.

  7. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare

  8. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    Science.gov (United States)

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
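
    Hierarchical cluster analysis with average linkage, one of the compared methods, can be sketched directly on a symptom distance matrix. The agglomerative implementation below is a toy, the symptom scores are synthetic, and the variable names are hypothetical stand-ins for questionnaire items:

```python
import numpy as np

def average_linkage(D, k):
    """Agglomerative clustering with average linkage on a precomputed
    distance matrix, merging the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([D[i, j] for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters

rng = np.random.default_rng(0)
emotional, physical = rng.normal(size=300), rng.normal(size=300)
symptoms = np.column_stack([                  # hypothetical symptom scores
    emotional + 0.3 * rng.normal(size=300),   # tense
    emotional + 0.3 * rng.normal(size=300),   # worry
    physical + 0.3 * rng.normal(size=300),    # fatigue
    physical + 0.3 * rng.normal(size=300),    # pain
])
D = 1 - np.abs(np.corrcoef(symptoms, rowvar=False))   # correlation distance
clusters = average_linkage(D, k=2)
```

    With strongly correlated symptom pairs, the emotional and physical items separate cleanly, the analogue of the emotional and fatigue-pain clusters the study recovers across methods.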

  9. Multisource Images Analysis Using Collaborative Clustering

    Directory of Open Access Journals (Sweden)

    Pierre Gançarski

    2008-04-01

    Full Text Available The development of very high-resolution (VHR) satellite imagery has produced a huge amount of data. The multiplication of satellites embedding different types of sensors provides many heterogeneous images. Consequently, the image analyst often has many different images available, representing the same area of the Earth's surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools able to use all these representations in an overall process constrains the analyst to a sequential analysis of the various images. In order to use all the available information simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and thus produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement) lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient at extracting relevant information and improving scene understanding.

  10. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Full Text Available Considering the nonlinear, multifunctional properties of a double-flywheel system with closed-loop control, a two-step method combining clustering and principal component analysis is proposed to detect the two faults in multifunctional flywheels. In the first step of the proposed algorithm, clustering is used as feature recognition to check the instructions of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands ask the flywheel system to work in different operation modes; therefore, the relationship of parameters in different operations can define the cluster structure of the training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters from the reachability plot, and the K-means algorithm can then divide the training data into the corresponding operations according to the reachability plot. Finally, the last step of the proposed model defines the relationship of parameters in each operation through the principal component analysis (PCA) method. Compared with a plain PCA model, the proposed approach is capable of identifying new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect faults in the multifunctional flywheel system.
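
    The PCA step of such a two-step method models the correlations of normal operation and flags samples whose residual, the squared prediction error (Q statistic), is large. A minimal sketch on synthetic two-channel data (the operating-mode recognition via OPTICS/K-means is omitted, and the fault sample is invented):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# normal operation: two strongly correlated sensor channels
X_train = np.column_stack([t, t + rng.normal(0, 0.05, 200)])

mu = X_train.mean(0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
P = Vt[:1]                         # retained principal subspace

def spe(x):
    """Squared prediction error: residual left after projecting a
    sample onto the principal subspace; large values indicate a fault."""
    r = (x - mu) - P.T @ (P @ (x - mu))
    return float(r @ r)

normal_spe = max(spe(x) for x in X_train)
fault = np.array([2.0, -2.0])      # a sample that breaks the correlation
```

    The fault sample sits far outside the learned correlation structure, so its residual dwarfs anything seen during normal operation.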

  11. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate....... An illustrative synthetic example is analyzed, and prediction accuracy measures are compared between the different variants...

  12. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    Energy Technology Data Exchange (ETDEWEB)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard [Department of Physics, Carleton University, Ottawa, Ontario K1S 5B6 (Canada); Wells, R. Glenn; Birnie, David; Ruddy, Terrence D. [Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7 (Canada)

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. 
Cluster

  13. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    International Nuclear Information System (INIS)

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-01-01

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. 
Cluster

  14. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important...... showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there are alternative ways of managing SMS data and many different methods...

  15. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a "tweet". In this study, we extract tweets related to MH370 over a certain period of time and present an overview of our approach to clustering those tweets in order to analyze users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets in response to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
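
    Latent Semantic Analysis, as used here, reduces a term-frequency matrix with a truncated SVD and compares tweets in the reduced space. A toy sketch (the four example "tweets" are invented, not drawn from the study's data):

```python
import numpy as np

docs = ["mh370 plane missing search", "search mh370 ocean plane",
        "pray for mh370 families sad", "sad pray families hope"]
vocab = sorted({w for d in docs for w in d.split()})
# term-frequency matrix: one row per tweet, one column per term
A = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# LSA: truncated SVD gives low-dimensional "semantic" coordinates
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Z = U[:, :2] * s[:2]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

    In the reduced space the two factual tweets sit close together and apart from the emotional pair, the kind of separation that lets a subsequent clustering step split responses into emotional and non-emotional groups.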

  16. Unbiased methods for removing systematics from galaxy clustering measurements

    Science.gov (United States)

    Elsner, Franz; Leistedt, Boris; Peiris, Hiranya V.

    2016-02-01

    Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterize and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to the basic mode projection algorithm, we prove it to be free of any bias, whereas we conclude that results computed with extended mode projection are biased. Within a simplified setup, we derive analytical expressions for the bias and discuss the options for correcting it in more realistic configurations. Common to all three methods is an increased estimator variance induced by the cleaning process, albeit at different levels. These results enable unbiased high-precision clustering measurements in the presence of spatially varying systematics, an essential step towards realizing the full potential of current and planned galaxy surveys.
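
    Basic mode projection can be sketched in pixel space as an orthogonal projection that removes the template subspace from the data vector (a simplified illustration on synthetic vectors; the paper works with harmonic-space clustering estimators and derives the associated bias and variance corrections):

```python
import numpy as np

def project_out(d, T):
    """Remove the template subspace from the data vector by orthogonal
    projection: d -> d - T (T^T T)^{-1} T^T d."""
    coeff, *_ = np.linalg.lstsq(T, d, rcond=None)
    return d - T @ coeff

rng = np.random.default_rng(1)
signal = rng.normal(size=500)                 # the clustering signal we want
template = rng.normal(size=(500, 1))          # a known systematics template
contaminated = signal + 3.0 * template[:, 0]  # data with the systematic added
cleaned = project_out(contaminated, template)
```

    The cleaned vector is exactly orthogonal to the template, at the cost of also removing whatever small part of the true signal happens to align with it — the source of the increased estimator variance the authors quantify.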

  17. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    Science.gov (United States)

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  18. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear whether distinctive clusters of dietary patterns can be derived and reproduced over time, whether cluster membership is stable, and whether it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and the stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6-month follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, a moderately healthy, and an unhealthy dietary pattern. Male and lower-educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) changed to an unhealthier cluster, and 83 (21.9%) shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can thus be derived. Yet, cluster membership is unstable, and only a few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    Science.gov (United States)

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 comprised unmarried people with no prior suicide attempts, who were the least likely to have an identified mental illness and the most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  20. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms: HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological Networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App Store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  1. Gene expression data clustering and its application in differential analysis of leukemia

    Directory of Open Access Journals (Sweden)

    M. Vahedi

    2008-02-01

    Full Text Available Introduction: The DNA microarray technique is one of the most important technologies in bioinformatics. By making it possible to monitor thousands of expressed genes simultaneously, it has recently produced giant databases of gene expression data. Statistical analysis of such databases includes normalization, clustering, classification, etc. Materials and Methods: Golub et al. (1999) collected a leukemia database based on the oligonucleotide method; the data are available on the internet. In this paper, we analyzed these gene expression data. They were clustered by several methods, including multidimensional scaling and hierarchical and non-hierarchical clustering. The data set included 20 Acute Lymphoblastic Leukemia (ALL) patients and 14 Acute Myeloid Leukemia (AML) patients. The results of two clustering methods were compared with the real grouping (ALL vs. AML). R software was used for data analysis. Results: The specificity and sensitivity of divisive hierarchical clustering in diagnosing ALL patients were 75% and 92%, respectively. The specificity and sensitivity of partitioning around medoids in diagnosing ALL patients were 90% and 93%, respectively. These results show that both clustering methods performed well. Notably, according to the clustering results, one of the samples was placed in the ALL group although it was in the AML group in the clinical test. Conclusion: Given the concordance of the results with the real grouping of the data, these methods can be used in cases where accurate information on the real grouping is unavailable. Moreover, clustering results may distinguish subgroups of the data in ways that would be necessary for concordance with clinical outcomes, laboratory results, and so on.
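    The sensitivity/specificity comparison against the real grouping described above can be sketched in a few lines. This is a minimal stdlib-only illustration with toy labels (not the Golub data); the sample values and the choice of ALL as the positive class are assumptions for the example:

```python
def confusion_counts(assignments, truth, positive="ALL"):
    """Count TP/FP/TN/FN for a cluster-vs-diagnosis comparison, after the
    cluster labels have been matched to the clinical groups."""
    tp = sum(1 for a, t in zip(assignments, truth) if a == positive and t == positive)
    fp = sum(1 for a, t in zip(assignments, truth) if a == positive and t != positive)
    tn = sum(1 for a, t in zip(assignments, truth) if a != positive and t != positive)
    fn = sum(1 for a, t in zip(assignments, truth) if a != positive and t == positive)
    return tp, fp, tn, fn

def sensitivity_specificity(assignments, truth, positive="ALL"):
    tp, fp, tn, fn = confusion_counts(assignments, truth, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Toy example: 5 samples, one ALL sample mis-clustered into the AML group.
truth = ["ALL", "ALL", "ALL", "AML", "AML"]
pred  = ["ALL", "ALL", "AML", "AML", "AML"]
sens, spec = sensitivity_specificity(pred, truth)
```

    The same counts also explain the paper's observation about the discordant sample: a single clinically AML sample clustered with ALL would appear as a false positive here.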

  2. Feature-Space Clustering for fMRI Meta-Analysis

    DEFF Research Database (Denmark)

    Goutte, Cyril; Hansen, Lars Kai; Liptrot, Mathew G.

    2001-01-01

    With fMRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiments, where features are obtained from a possibly large number of single-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on an fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular, shows interesting differences between individual voxel analyses performed with traditional methods. © 2001 Wiley-Liss, Inc.

  3. Exact WKB analysis and cluster algebras

    International Nuclear Information System (INIS)

    Iwaki, Kohei; Nakanishi, Tomoki

    2014-01-01

    We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change its topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction to the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’. (paper)

  4. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties that compensate for each other's limitations in accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  5. Cluster analysis of HZE particle tracks as applied to space radiobiology problems

    International Nuclear Information System (INIS)

    Batmunkh, M.; Bayarchimeg, L.; Lkhagva, O.; Belov, O.

    2013-01-01

    A cluster analysis is performed of ionizations in tracks produced by the most abundant nuclei in the charge and energy spectra of the galactic cosmic rays. The frequency distribution of clusters is estimated for cluster sizes comparable to the DNA molecule at different packaging levels. For this purpose, an improved K-means-based algorithm is suggested. This technique allows processing particle tracks containing a large number of ionization events without setting the number of clusters as an input parameter. Using this method, the ionization distribution pattern is analyzed depending on the cluster size and the particle's linear energy transfer.
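    The improved algorithm itself is not reproduced in the abstract. As a hedged illustration of the general idea (grouping ionization positions without fixing the number of clusters in advance), here is a single-pass "leader" clustering sketch in pure Python; the distance threshold `radius` and the toy coordinates are assumptions for the example, not parameters from the paper:

```python
import math

def leader_cluster(points, radius):
    """Single-pass 'leader' clustering: a point joins the nearest existing
    cluster whose leader lies within `radius`; otherwise it founds a new
    cluster. The number of clusters is an output, not an input."""
    leaders, clusters = [], []
    for p in points:
        best, best_d = None, radius
        for i, q in enumerate(leaders):
            d = math.dist(p, q)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            leaders.append(p)
            clusters.append([p])
        else:
            clusters[best].append(p)
    return clusters

# Two well-separated groups of ionization positions (arbitrary units).
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
groups = leader_cluster(pts, radius=1.0)
```

    A K-means-style refinement pass could then tighten the cluster centers; the point of the sketch is only that the cluster count emerges from the data and the threshold.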

  6. ANALYSIS OF DEVELOPING BATIK INDUSTRY CLUSTER IN BAKARAN VILLAGE CENTRAL JAVA PROVINCE

    Directory of Open Access Journals (Sweden)

    Hermanto Hermanto

    2017-06-01

    Full Text Available SMEs grow in clusters in certain geographical areas, and entrepreneurs grow and thrive through business clusters. Central Java Province has many business clusters that strengthen the regional economy, one of which is the batik industry cluster. Pati Regency is one of the regencies/cities in Central Java with the lowest turnover. The batik industry cluster in Pati is developing quite well, as can be seen from the increasing number of batik producers incorporated in the cluster. This research examines strategies for developing the batik industry cluster in Pati Regency; its purpose is to determine the proper strategy for that development. The research method is quantitative, and the analysis tool is a Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis. The SWOT analysis shows that the proper strategy for developing the batik industry cluster in Pati is to optimize the management of the batik business cluster in Bakaran Village; to have the local government provide information on business capital loan facilities; to employ labor from Bakaran Village while improving its quality through training; and to market Bakaran batik to broader markets while maintaining its quality. The advice this research offers is directed at the parties who play a role in developing the batik industry cluster in Bakaran Village, Pati Regency, such as the local government.

  7. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    Science.gov (United States)

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  8. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    Science.gov (United States)

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source-attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source-attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source-attribution methods for identifying transmission risk factors, but neither approach provides robust estimates of transmission risk ratios. Source attribution can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Studying them allows precision tests of cosmological scenarios of structure formation, bringing constraints complementary to those stemming from the cosmic microwave background, supernovae, or galaxies. Clusters are identified through the X-ray emission of their heated gas, which facilitates their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and a decade of expertise, the XMM-LSS survey allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to characterize the detected objects, with particular emphasis on the most distant ones (z ≥ 1) through complementary observations in the X-ray, optical, and infrared bands. The X-CLASS survey is then fully described: based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed using 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys such as XMM-XXL and eROSITA. (author) [fr

  10. Method for detecting clusters of possible uranium deposits

    International Nuclear Information System (INIS)

    Conover, W.J.; Bement, T.R.; Iman, R.L.

    1978-01-01

    When a two-dimensional map contains points that appear to be scattered somewhat at random, a question that often arises is whether groups of points that appear to cluster are merely exhibiting ordinary behavior, which one can expect with any random distribution of points, or whether the clusters are too pronounced to be attributable to chance alone. A method for detecting clusters along a straight line is applied to the two-dimensional map of ²¹⁴Bi anomalies observed as part of the National Uranium Resource Evaluation Program in the Lubbock, Texas, region. Some exact probabilities associated with this method are computed and compared with two approximate methods. The two methods for approximating probabilities work well in the cases examined and can be used when it is not feasible to obtain the exact probabilities.

  11. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    Science.gov (United States)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
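    As a concrete sketch of the CLR step that performed well in this comparison (stdlib Python only; the three-part composition is a toy example, not the Xinjiang data):

```python
import math

def clr(row):
    """Centered log-ratio transform of one composition (all parts > 0):
    clr(x)_i = ln(x_i) - mean_j ln(x_j). Transformed rows sum to zero
    and can then be fed to k-means or fuzzy c-means clustering."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]

# Toy composition, e.g. concentrations of three elements in one sample.
z = clr([10.0, 100.0, 1000.0])
```

    The zero-sum property is what removes the closure effect of compositional data before distance-based clustering, which is why the CLT and ILT outperform no transformation or the ZST in the Q-type analysis above.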

  12. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units

    Directory of Open Access Journals (Sweden)

    Sarah L. Westcott

    2015-12-01

    Full Text Available Background. 16S rRNA gene sequences are routinely assigned to operational taxonomic units (OTUs) that are then used to analyze complex microbial communities. A number of methods have been employed to carry out the assignment of 16S rRNA gene sequences to OTUs, leading to confusion over which method is optimal. A recent study suggested that a clustering method should be selected based on its ability to generate stable OTU assignments that do not change as additional sequences are added to the dataset. In contrast, we contend that the quality of the OTU assignments, i.e., the ability of the method to properly represent the distances between the sequences, is more important. Methods. Our analysis implemented six de novo clustering algorithms (single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm) as well as the open- and closed-reference methods. Using two previously published datasets, we used the Matthews correlation coefficient (MCC) to assess the stability and quality of OTU assignments. Results. The stability of OTU assignments did not reflect the quality of the assignments. Depending on the dataset being analyzed, the average linkage and the distance- and abundance-based greedy clustering methods generated OTUs that were more likely to represent the actual distances between sequences than the open- and closed-reference methods. We also demonstrated that for the greedy algorithms VSEARCH produced assignments comparable to those produced by USEARCH, making VSEARCH a viable free and open-source alternative to USEARCH. Further interrogation of the reference-based methods indicated that when USEARCH or VSEARCH were used to identify the closest reference, the OTU assignments were sensitive to the order of the reference sequences, because the reference sequences can be identical over the region being considered. More troubling was the observation that while both USEARCH and …
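    The MCC used here as the quality metric can be computed directly from pairwise confusion counts. A minimal sketch; the framing of a "positive" as a pair of sequences placed in the same OTU is my reading of the setup, and the counts below are toy values:

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion counts. In the OTU
    setting, a 'positive' is a pair of sequences assigned to the same OTU,
    judged against whether their true distance is within the threshold."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

perfect = mcc(50, 0, 50, 0)  # every pair classified correctly
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), which is what lets the authors compare stability against quality on a single scale.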

  13. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    Science.gov (United States)

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis of polysomnography (PSG) data. We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG, and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: insomnia with normal objective sleep duration (I-NSD: n = 53) and insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between the I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD. The insomnia clusters derived from cluster analysis thus differ in sleep-onset HRV. Preliminary data suggest evidence for three clusters in insomnia, with differences in sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796

  14. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    Science.gov (United States)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a spatial cluster analysis method that uses nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable for investigating human brain tissue and will help to achieve a deeper understanding of brain diseases, along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain were compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  15. A Web service substitution method based on service cluster nets

    Science.gov (United States)

    Du, YuYue; Gai, JunJing; Zhou, MengChu

    2017-11-01

    Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit, which is then used to analyse whether the services in the cluster can satisfy given service requests. Meanwhile, substitution methods for an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and its effectiveness is demonstrated by comparison with a state-of-the-art method in an experiment. It can be readily applied to e-commerce service substitution to meet business automation needs.

  16. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    …paper proposes a simple and faster version of the kernel k-means clustering method. … It has been considered as an important tool … On the other hand, kernel-based clustering methods, like kernel k-means clustering, … available at the UCI machine learning repository (Murphy 1994). … All the data sets have only numeric-valued features.
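    The single-pass speed-up itself cannot be reconstructed from this snippet. The following is a sketch of plain kernel k-means (RBF kernel, deterministic initialisation), i.e. the baseline such a method accelerates; the toy points and `gamma` value are assumptions for the example:

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF (Gaussian) kernel between two coordinate tuples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def kernel_kmeans(points, k, iters=20, gamma=1.0):
    """Kernel k-means: assign each point to the cluster minimising the
    implicit feature-space distance
    K(x,x) - (2/|C|) sum_{z in C} K(x,z) + (1/|C|^2) sum_{z,z' in C} K(z,z')."""
    n = len(points)
    K = [[rbf(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = [i % k for i in range(n)]  # deterministic initialisation
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        self_term = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m
                     else float("inf") for m in members]
        new_labels = []
        for i in range(n):
            dists = [K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + self_term[c]
                     if m else float("inf")
                     for c, m in enumerate(members)]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
       (5.0, 5.0), (5.1, 5.1), (5.2, 5.0)]
labels = kernel_kmeans(pts, 2)
```

    Because distances are computed entirely through the kernel matrix, no explicit cluster centroids are ever formed, which is what makes the per-iteration cost quadratic and motivates single-pass variants.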

  17. Cluster Analysis of Maize Inbred Lines

    Directory of Open Access Journals (Sweden)

    Jiban Shrestha

    2016-12-01

    Full Text Available The determination of diversity among inbred lines is important for heterosis breeding. Sixty maize inbred lines were evaluated for eight agro-morphological traits during the winter season of 2011 to analyze their genetic diversity. Clustering was done by the average linkage method, and the inbred lines were grouped into six clusters. Inbred lines in cluster II had taller plants with the maximum number of leaves, whereas cluster III was characterized by shorter plants with the minimum number of leaves. The inbred lines in cluster V flowered early, whereas those in cluster VI flowered late. The inbred lines in cluster III were characterized by a higher anthesis-silking interval (ASI), and those in cluster VI by a lower ASI. These results show that inbred lines from widely divergent clusters can be utilized in a hybrid breeding programme.
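    A naive version of the average-linkage procedure used above can be sketched in a few lines. Toy 2-D points stand in for the eight-trait vectors; the O(n³) loop is fine for 60 lines:

```python
def average_linkage(points, k):
    """Agglomerative clustering with average linkage: repeatedly merge the
    two clusters with the smallest mean pairwise distance until k remain."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = (sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                     / (len(clusters[i]) * len(clusters[j])))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data: two tight pairs and one outlier, asked for three clusters.
groups = average_linkage([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)], 3)
```

    In practice the traits would first be standardized so that no single trait dominates the distance.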

  18. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  19. The polarizable embedding coupled cluster method

    DEFF Research Database (Denmark)

    Sneskov, Kristian; Schwabe, Tobias; Kongsted, Jacob

    2011-01-01

    We formulate a new combined quantum mechanics/molecular mechanics (QM/MM) method based on a self-consistent polarizable embedding (PE) scheme. For the description of the QM region, we apply the popular coupled cluster (CC) method detailing the inclusion of electrostatic and polarization effects...

  20. Pre-crash scenarios at road junctions: A clustering method for car crash data.

    Science.gov (United States)

    Nitsche, Philippe; Thomas, Pete; Stuetz, Rainer; Welsh, Ruth

    2017-10-01

    Given the recent advancements in autonomous driving functions, one of the main challenges is safe and efficient operation in complex traffic situations such as road junctions. There is a need for comprehensive testing, either in virtual simulation environments or on real-world test tracks. This paper presents a novel data analysis method including the preparation, analysis and visualization of car crash data, to identify the critical pre-crash scenarios at T- and four-legged junctions as a basis for testing the safety of automated driving systems. The presented method employs k-medoids to cluster historical junction crash data into distinct partitions and then applies the association rules algorithm to each cluster to specify the driving scenarios in more detail. The dataset used consists of 1056 junction crashes in the UK, which were exported from the in-depth "On-the-Spot" database. The study resulted in thirteen crash clusters for T-junctions and six crash clusters for crossroads. Association rules revealed common crash characteristics, which were the basis for the scenario descriptions. The results support existing findings on road junction accidents and provide benchmark situations for safety performance tests in order to reduce the possible number of parameter combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
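    The k-medoids step described above can be sketched in a few lines. This is an illustrative alternating-update implementation over a toy numeric distance matrix, not the authors' pipeline (which clustered coded crash attributes):

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Minimal alternating k-medoids on a precomputed distance matrix D.
    Unlike k-means, each cluster center is an actual data point (medoid),
    so any dissimilarity matrix works. Returns (medoid_indices, labels)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)       # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # new medoid = member minimizing total distance to its cluster
                costs = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

# toy "crash records" coded as numeric feature vectors
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
medoids, labels = k_medoids(D, k=2)
print(labels)
```

In the paper, association-rule mining would then be run within each resulting partition to characterize it.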

  1. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Full Text Available Abstract Background DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results We compare performance of existing methodology (such as k-means with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.
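    A minimal illustration of why zero-inflated measurements call for a dissimilarity that treats the point mass at 0 explicitly. The penalty term and weight here are hypothetical, chosen only to show the idea; they are not the models proposed in the paper:

```python
import numpy as np

def zero_aware_dissimilarity(x, y, w=1.0):
    """Illustrative dissimilarity for zero-inflated profiles:
    mismatched zero/nonzero status adds a fixed penalty w,
    both-nonzero entries contribute squared differences, and
    both-zero entries contribute nothing (agreement at the
    detection threshold is treated as a match, not as distance 0
    on the continuous scale)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    zx, zy = (x == 0), (y == 0)
    mismatch = np.logical_xor(zx, zy).sum() * w
    both = ~zx & ~zy
    return mismatch + ((x[both] - y[both]) ** 2).sum()

a = [0.0, 0.0, 0.8, 0.6]
b = [0.0, 0.0, 0.7, 0.5]   # same zero pattern, close nonzero values
c = [0.9, 0.7, 0.0, 0.0]   # opposite zero pattern
print(zero_aware_dissimilarity(a, b))   # small
print(zero_aware_dissimilarity(a, c))   # large
```

Plugging such a dissimilarity into a partitioning algorithm respects the "preponderance of values at 0" in a way plain Euclidean k-means does not.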

  2. Atomic and electronic structure of clusters from car-Parrinello method

    International Nuclear Information System (INIS)

    Kumar, V.

    1994-06-01

    With the development of the ab-initio molecular dynamics method, it has now become possible to study the static and dynamical properties of clusters containing up to a few tens of atoms. Here I present a review of the method within the framework of density functional theory and the pseudopotential approach to represent the electron-ion interaction, and discuss some of its applications to clusters. Particular attention is focussed on the structure and bonding properties of clusters as a function of their size. Applications to clusters of alkali metals and Al, the non-metal to metal transition in divalent metal clusters, and molecular clusters of carbon and Sb are discussed in detail. Some results are also presented on mixed clusters. (author). 121 refs, 24 figs

  3. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    Science.gov (United States)

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods, the cluster size statistic (CSS) and the cluster mass statistic (CMS), are introduced to control the family-wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family-wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
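    The permutation logic behind a cluster mass statistic can be sketched as follows, assuming cluster memberships are given in advance and using a simplified standardized mean difference in place of a proper t statistic. This is a toy analogue of CMS for a handful of "connections", not the authors' implementation:

```python
import numpy as np

def cluster_mass(stats, labels, threshold):
    """Mass of each cluster: sum of statistic values above threshold."""
    return {lab: stats[(labels == lab) & (stats > threshold)].sum()
            for lab in np.unique(labels)}

def max_mass_null(data_a, data_b, labels, threshold, n_perm=500, seed=0):
    """Permutation null of the maximum cluster mass: shuffle group
    membership and record the largest mass over all clusters. Comparing
    observed masses to this max-statistic null controls the family-wise
    error rate across clusters."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([data_a, data_b])
    n_a = len(data_a)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n_a]], pooled[idx[n_a:]]
        # simplified standardized difference, standing in for a t statistic
        t = (a.mean(0) - b.mean(0)) / (pooled.std(0, ddof=1) + 1e-12)
        null[i] = max(cluster_mass(t, labels, threshold).values())
    return null

# toy example: 6 "connections" in 2 predefined clusters; group A has an
# effect confined to cluster 0
labels = np.array([0, 0, 0, 1, 1, 1])
rng = np.random.default_rng(1)
A = rng.normal(0, 1, (20, 6)); A[:, :3] += 2.0
B = rng.normal(0, 1, (20, 6))
t_obs = (A.mean(0) - B.mean(0)) / (np.vstack([A, B]).std(0, ddof=1) + 1e-12)
obs = cluster_mass(t_obs, labels, 1.0)
null = max_mass_null(A, B, labels, 1.0)
p_cluster0 = (null >= obs[0]).mean()
print(p_cluster0)
```

The real CSS/CMS procedures additionally derive the clusters themselves from the suprathreshold connectivity structure; here the grouping is fixed to keep the permutation machinery visible.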

  4. Phenotypes of asthma in low-income children and adolescents: cluster analysis

    Directory of Open Access Journals (Sweden)

    Anna Lucia Barros Cabral

    Full Text Available ABSTRACT Objective: Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. Methods: We included 306 children and adolescents (6-18 years of age with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a two-step agglomerative test and log-likelihood distance measure. Results: Three clusters were defined for our population. Cluster 1 (n = 94 included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87 included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108 included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Conclusions: Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  5. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Full Text Available Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001. The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.

  6. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    Science.gov (United States)

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRTs) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Settings were not limited to healthcare; any cluster taking part in an SW-CRT published up to March 2016 was eligible. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of what had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration, and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
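    The primary outcome above, the coefficient of variation in cluster size, is a one-line computation; the sizes below are invented for illustration:

```python
from statistics import mean, stdev

def coefficient_of_variation(cluster_sizes):
    """CV = sample standard deviation / mean of the cluster sizes,
    the imbalance summary reported for each SW-CRT."""
    return stdev(cluster_sizes) / mean(cluster_sizes)

sizes = [120, 85, 240, 60, 150]  # hypothetical per-cluster recruitment
print(round(coefficient_of_variation(sizes), 2))  # → 0.53
```

A CV of 0.53 would sit above the review's median of 0.41, i.e. more unbalanced than the typical included trial.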

  7. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an example of applying statistical methods to diagnose the adaptive capacity of subtropical plant varieties. We describe the selection indicators and the basic physiological parameters defined as diagnostic. Evaluation used a set of water-regime parameters, namely determination of leaf water deficit, determination of the fractional composition of water, and measurement of the concentration of cell sap (CCS (for tea culture flushes. These parameters are characterized by high lability and high responsiveness to many abiotic factors, which dictated particular care in selecting plant material for analysis and in considering the impact on sustainability. On the basis of the experimental data, pairwise correlation coefficients between climatic factors and the physiological indicators were calculated. The result was a selection of physiological and biochemical indicators proposed for assessing adaptability and included in methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult and, in particular, does not allow one to quickly identify the similarity of new varieties in their adaptive responses to adverse factors and thus to set general requirements for cultivation conditions. Cluster analysis requires that only quantitative data be used, that the set of variables used to assess varieties be defined (the larger the sample, the more accurate the clustering, and that a measure of similarity (or difference between objects be ascertained. It is shown that the identification of diagnostic features subjected to statistical processing affects the accuracy of variety classification. Selection resulting from the mono-cluster analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  8. Extending the input–output energy balance methodology in agriculture through cluster analysis

    International Nuclear Information System (INIS)

    Bojacá, Carlos Ricardo; Casilimas, Héctor Albeiro; Gil, Rodrigo; Schrevens, Eddie

    2012-01-01

    The input–output balance methodology has been applied to characterize the energy balance of agricultural systems. This study proposes to extend this methodology with the inclusion of multivariate analysis to reveal particular patterns in the energy use of a system. The objective was to demonstrate the usefulness of multivariate exploratory techniques to analyze the variability found in a farming system and establish efficiency categories that can be used to improve the energy balance of the system. To this purpose an input–output analysis was applied to the major greenhouse tomato production area in Colombia. Individual energy profiles were built and the k-means clustering method was applied to the production factors. On average, the production system in the study zone consumes 141.8 GJ ha⁻¹ to produce 96.4 GJ ha⁻¹, resulting in an energy efficiency of 0.68. With the k-means clustering analysis, three clusters of farmers were identified with energy efficiencies of 0.54, 0.67 and 0.78. The most energy efficient cluster grouped 56.3% of the farmers. It is possible to optimize the production system by improving the management practices of those with the lowest energy use efficiencies. Multivariate analysis techniques demonstrated to be a complementary pathway to improve the energy efficiency of a system. -- Highlights: ► An input–output energy balance was estimated for greenhouse tomatoes in Colombia. ► We used the k-means clustering method to classify growers based on their energy use. ► Three clusters of growers were found with energy efficiencies of 0.54, 0.67 and 0.78. ► Overall system optimization is possible by improving the energy use of the less efficient.
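    The k-means step on per-farm energy efficiencies can be sketched as follows; all figures are hypothetical, chosen only to produce three efficiency bands like those reported:

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical per-farm energy inputs and outputs (GJ/ha)
inputs = np.array([180, 170, 150, 140, 130, 120, 115, 110, 100], float)
outputs = np.array([95, 92, 98, 95, 90, 92, 90, 86, 80], float)

# energy efficiency = output/input, the ratio used in the study
efficiency = (outputs / inputs).reshape(-1, 1)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(efficiency)
for j in range(3):
    members = efficiency[km.labels_ == j].ravel()
    print(f"cluster {j}: n={members.size}, mean efficiency={members.mean():.2f}")
```

The paper clustered on the production factors themselves rather than on the efficiency ratio alone, so this sketch captures the grouping idea, not the exact feature set.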

  9. A Clustering Method for Data in Cylindrical Coordinates

    Directory of Open Access Journals (Sweden)

    Kazuhisa Fujita

    2017-01-01

    Full Text Available We propose a new clustering method for data in cylindrical coordinates based on k-means. The goal of the k-means family is to optimize an objective function, which requires a similarity measure. Thus, we need a new similarity to obtain a clustering method for data in cylindrical coordinates. In this study, we first derive a new similarity for the clustering method by assuming a particular probabilistic model. A data point in cylindrical coordinates has radius, azimuth, and height. We assume that the azimuth is sampled from a von Mises distribution and that the radius and the height are independently generated from isotropic Gaussian distributions. We derive the new similarity from the log-likelihood of the assumed probability distribution. Our experiments demonstrate that the proposed method using the new similarity can appropriately partition synthetic data defined in cylindrical coordinates. Furthermore, we apply the proposed method to color image quantization and show that the method successfully quantizes a color image with respect to the hue element.
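    The assumed probabilistic model translates directly into a log-likelihood similarity. A sketch with illustrative parameters follows; the normalization constants are dropped, which preserves cluster assignments only because kappa and the variances are shared across clusters here:

```python
import numpy as np

def log_similarity(r, theta, z, mu_theta, kappa, mu_r, sigma_r, mu_z, sigma_z):
    """Log-likelihood of a cylindrical point (radius r, azimuth theta,
    height z) under a von Mises azimuth times independent Gaussian
    radius and height. Constant terms (e.g. log I0(kappa)) are omitted."""
    return (kappa * np.cos(theta - mu_theta)          # von Mises term
            - (r - mu_r) ** 2 / (2 * sigma_r ** 2)    # Gaussian radius
            - (z - mu_z) ** 2 / (2 * sigma_z ** 2))   # Gaussian height

# assignment step of a k-means-style loop: pick the most similar cluster
clusters = [
    dict(mu_theta=0.0, kappa=4.0, mu_r=1.0, sigma_r=0.3, mu_z=0.0, sigma_z=0.5),
    dict(mu_theta=np.pi, kappa=4.0, mu_r=1.0, sigma_r=0.3, mu_z=2.0, sigma_z=0.5),
]
point = (1.1, 0.2, 0.1)  # (radius, azimuth, height)
scores = [log_similarity(*point, **c) for c in clusters]
print(int(np.argmax(scores)))  # → 0 (the azimuth-0, height-0 cluster)
```

The full method alternates this assignment step with re-estimating each cluster's mean direction, concentration, means and variances, exactly as k-means alternates assignment and centroid updates.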

  10. A New Waveform Signal Processing Method Based on Adaptive Clustering-Genetic Algorithms

    International Nuclear Information System (INIS)

    Noha Shaaban; Fukuzo Masuda; Hidetsugu Morota

    2006-01-01

    We present a fast digital signal processing method for the numerical analysis of individual pulses from CdZnTe compound semiconductor detectors, using the Maxi-Mini Distance Algorithm and a Genetic Algorithms based discrimination technique. A parametric approach has been used to classify the discriminated waveforms into a set of clusters, each having a similar signal shape with a corresponding pulse height spectrum. A corrected total pulse height spectrum was obtained by applying a normalization factor for the full energy peak of each cluster, with substantial improvements in the energy spectrum characteristics. This method was applied successfully to both simulated and real measured data, and it can be applied to any detector that suffers from signal shape variation. (authors)

  11. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Institute for Data Analysis and Visualization (IDAV) and Department of Computer Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory; Life Sciences Division, Lawrence Berkeley National Laboratory; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  12. Environmental conflict analysis using an integrated grey clustering and entropy-weight method: A case study of a mining project in Peru.

    OpenAIRE

    Delgado-Villanueva, Kiko Alexi; Romero Gil, Inmaculada

    2016-01-01

    Environmental conflict analysis (henceforth ECA) has become a key factor for the viability of projects and the welfare of affected populations. In this study, we propose an approach for ECA using an integrated grey clustering and entropy-weight method (the IGCEW method). The case study considered a mining project in northern Peru. Three stakeholder groups and seven criteria were identified. The data were gathered by conducting field interviews. The results revealed that for the groups urban ...

  13. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, cluster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same group are relatively homogeneous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  14. Reliability analysis of cluster-based ad-hoc networks

    International Nuclear Information System (INIS)

    Cook, Jason L.; Ramirez-Marquez, Jose Emmanuel

    2008-01-01

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN varies from traditional networks because it is a self-forming and dynamic network. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the challenge for reliability analyses is also brought about by this unique feature. The variability and volatility of the MAWN configuration makes typical reliability methods (e.g. reliability block diagram) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. New published methods adapt to this feature by treating the configuration probabilistically or by inclusion of embedded mobility models. This paper joins both methods together and expands upon these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and illustration of a Monte Carlo simulation method through the analysis of several example networks

  15. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    Science.gov (United States)

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the

  16. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGAs are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate a DGA feed using reversed DGAs, third-party feeds, and smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling cluster-based analysis. Our method also automatically allows analysis of a whole malware family, a specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  17. Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

    Science.gov (United States)

    Black, Joshua A.; Knowles, Peter J.

    2018-06-01

    The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.

  18. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    Science.gov (United States)

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. As a result, it is usually difficult to identify the key features of the properties of the system, and subsequently all the critical components within the system, for a given purpose of design or control. One way forward, however, is to visualize the network structure and interactions between components more explicitly by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them, in less computational time.

  19. An Integrated Mixed Methods Research Design: Example of the Project Foreign Language Learning Strategies and Achievement: Analysis of Strategy Clusters and Sequences

    OpenAIRE

    Vlčková Kateřina

    2014-01-01

    The presentation focused on a so-called integrated mixed-methods research design, exemplified by the Czech Science Foundation project No. GAP407/12/0432, "Foreign Language Learning Strategies and Achievement: Analysis of Strategy Clusters and Sequences". All the main integrated parts of the mixed-methods research design were discussed: the aim, theoretical framework, research questions, methods, and validity threats.

  20. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    Science.gov (United States)

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study investigates the feasibility of multi-element analysis for determining the geographical origin of the sea cucumber Apostichopus japonicus, and selects effective tracers for geographical origin assessment. The contents of the elements Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber samples from seven places of geographical origin were determined by ICP-MS, and the results were used to develop an element database. Cluster analysis (CA) and principal component analysis (PCA) were applied to differentiate the geographical origins. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, and the classification results were significantly associated with the marine distribution of the samples. CA and PCA were effective methods for element analysis of sea cucumber samples, and the mineral element contents provided good chemical descriptors for differentiating their geographical origins.
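    The CA/PCA workflow can be sketched with standardized toy concentrations for two hypothetical origins; the element values below are invented, not the paper's data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# hypothetical concentrations of three elements, samples from two origins
origin_a = rng.normal([10.0, 5.0, 1.0], 0.5, (6, 3))
origin_b = rng.normal([4.0, 9.0, 3.0], 0.5, (6, 3))
X = StandardScaler().fit_transform(np.vstack([origin_a, origin_b]))

# principal components summarize most of the variance in a few axes
pca = PCA(n_components=2).fit(X)
print("variance explained:", round(pca.explained_variance_ratio_.sum(), 2))

# Q-type clustering of samples on the standardized element profiles
labels = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")
print(labels)
```

With well-separated element profiles, the hierarchical clustering recovers the origin grouping, mirroring how the study's five clusters tracked marine distribution.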

  1. Accommodating error analysis in comparison and clustering of molecular fingerprints.

    Science.gov (United States)

    Salamon, H; Segal, M R; Ponce de Leon, A; Small, P M

    1998-01-01

    Molecular epidemiologic studies of infectious diseases rely on pathogen genotype comparisons, which usually yield patterns comprising sets of DNA fragments (DNA fingerprints). We use a highly developed genotyping system, IS6110-based restriction fragment length polymorphism analysis of Mycobacterium tuberculosis, to develop a computational method that automates comparison of large numbers of fingerprints. Because error in fragment length measurements is proportional to fragment length and is positively correlated for fragments within a lane, an align-and-count method that compensates for relative scaling of lanes reliably counts matching fragments between lanes. Results of a two-step method we developed to cluster identical fingerprints agree closely with 5 years of computer-assisted visual matching among 1,335 M. tuberculosis fingerprints. Fully documented and validated methods of automated comparison and clustering will greatly expand the scope of molecular epidemiology.
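
A toy illustration of the align-and-count idea above: fragment-length error is treated as proportional to length, so two fragments match within a relative tolerance after rescaling one lane. The tolerance value, the median-based scale estimate, and the greedy pairing are illustrative assumptions, not the published algorithm:

```python
# Illustrative align-and-count comparison between two DNA fingerprint lanes.
# Error is modeled as proportional to fragment length; a single scale factor
# compensates for correlated within-lane measurement error.

def count_matches(lane_a, lane_b, rel_tol=0.02):
    """Count fragments of lane_a matched in lane_b within rel_tol."""
    # Estimate a global scale factor between lanes from their median lengths.
    scale = sorted(lane_b)[len(lane_b) // 2] / sorted(lane_a)[len(lane_a) // 2]
    unmatched = sorted(lane_b)
    matches = 0
    for frag in sorted(lane_a):
        target = frag * scale
        # Greedily take the closest still-unmatched fragment if within tolerance.
        best = min(unmatched, key=lambda f: abs(f - target), default=None)
        if best is not None and abs(best - target) <= rel_tol * target:
            unmatched.remove(best)
            matches += 1
    return matches

# Lane B is lane A systematically rescaled by 1%.
lane_a = [500, 900, 1400, 2100, 3200]
lane_b = [f * 1.01 for f in lane_a]
print(count_matches(lane_a, lane_b))  # all five fragments match
```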

  2. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission. We identified the "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes as being at high risk of severe asthma exacerbations, with management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
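
The two-step procedure above (Ward's hierarchical clustering followed by k-means) can be sketched roughly as below. The data and k = 3 are synthetic illustrations; the study used 18 variables and found five clusters:

```python
# Sketch of hierarchical-then-partitional clustering: Ward's method forms
# initial groups whose centroids seed k-means. Data are synthetic stand-ins.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(40, 5)) for loc in (0.0, 3.0, 6.0)])

k = 3
ward_labels = fcluster(linkage(X, method='ward'), t=k, criterion='maxclust')
# Centroids of the Ward clusters become deterministic k-means seeds.
seeds = np.vstack([X[ward_labels == c].mean(axis=0) for c in range(1, k + 1)])

km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print(np.bincount(km.labels_))  # cluster sizes
```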

  3. Parity among interpretation methods of MLEE patterns and disparity among clustering methods in epidemiological typing of Candida albicans.

    Science.gov (United States)

    Boriollo, Marcelo Fabiano Gomes; Rosa, Edvaldo Antonio Ribeiro; Gonçalves, Reginaldo Bruno; Höfling, José Francisco

    2006-03-01

    The typing of C. albicans by MLEE (multilocus enzyme electrophoresis) is dependent on the interpretation of enzyme electrophoretic patterns, and the study of the epidemiological relationships of these yeasts can be conducted by cluster analysis. Therefore, the aims of the present study were to first determine the discriminatory power of genetic interpretation (deduction of the allelic composition of diploid organisms) and numerical interpretation (mere determination of the presence and absence of bands) of MLEE patterns, and then to determine the concordance (Pearson product-moment correlation coefficient) and similarity (Jaccard similarity coefficient) of the groups of strains generated by three cluster analysis models, and the discriminatory power of such models as well [model A: genetic interpretation, genetic distance matrix of Nei (d(ij)) and UPGMA dendrogram; model B: genetic interpretation, Dice similarity matrix (S(D1)) and UPGMA dendrogram; model C: numerical interpretation, Dice similarity matrix (S(D2)) and UPGMA dendrogram]. MLEE was found to be a powerful and reliable tool for the typing of C. albicans due to its high discriminatory power (>0.9). Discriminatory power indicated that numerical interpretation is a method capable of discriminating a greater number of strains (47 versus 43 subtypes), but also pointed to model B as a method capable of providing a greater number of groups, suggesting its use for the typing of C. albicans by MLEE and cluster analysis. Very good agreement was only observed between the elements of the matrices S(D1) and S(D2), but a large majority of the groups generated in the three UPGMA dendrograms showed similarity S(J) between 4.8% and 75%, suggesting disparities in the conclusions obtained by the cluster assays.
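
A minimal sketch of the numerical-interpretation route above: band patterns as presence/absence vectors, Dice and Jaccard coefficients, and a UPGMA dendrogram via scipy's 'average' linkage. The band patterns are invented examples:

```python
# Dice/Jaccard similarity of binary band patterns and a UPGMA dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def dice(a, b):
    both = np.sum(a & b)
    return 2 * both / (np.sum(a) + np.sum(b))

def jaccard(a, b):
    both = np.sum(a & b)
    return both / np.sum(a | b)

# Three strains scored for presence/absence of five bands (invented).
patterns = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
], dtype=int)

n = len(patterns)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1 - dice(patterns[i], patterns[j])

tree = linkage(squareform(dist), method='average')  # UPGMA
print(np.round(dist, 3))
```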

  4. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    Science.gov (United States)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples from many years of PV power generation data and improve the accuracy of the PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes a forecasting model based on neural networks. Based on three different weather types (sunny, cloudy and rainy days), this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using the screened data as training data. Then, the six photovoltaic power generation prediction models before and after data screening are compared. Results show that the prediction model combining clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation forecasting.
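
The screen-then-train workflow can be sketched as below, with k-means standing in for the weather-based screening and scikit-learn's MLPRegressor standing in for a BP network. The data and the feature/target relation are invented:

```python
# Cluster historical days by weather-like features, then train one small
# feed-forward ("BP"-style) network per cluster and predict with the model
# of the matching cluster. All values are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
# Features: [irradiance, cloud_cover]; target: PV output (made-up relation).
X = rng.uniform(0, 1, size=(300, 2))
y = 5.0 * X[:, 0] * (1 - 0.5 * X[:, 1])

weather = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {}
for cl in range(3):
    mask = weather.labels_ == cl
    models[cl] = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                              random_state=0).fit(X[mask], y[mask])

x_new = np.array([[0.8, 0.2]])
c = weather.predict(x_new)[0]
print(models[c].predict(x_new))  # prediction from the matching cluster's model
```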

  5. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty-four autistic patients (mean age 9.4±5.6 years) were entered into a cluster analysis. The clustering analysis was based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized as showing significant differences: Cluster 1: showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2: showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3: showed the largest sizes of the caput of the nucleus caudatus (NC), the smallest sizes of the HPC, and facial dysmorphic features were always present. Cluster 4: showed the smallest sizes of the genu and splenium of the CC, as well as the amygdala and caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy, the highest pregnancy order, abnormal psychomotor development during the first year of life was always present and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.

  6. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected, especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method, applying squared Euclidean distance as the distance measure to determine the clusters. Contingency tables, χ2 tests and ANOVA were used to compare the clusters by patient-specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status, with the worst functional physical status in cluster one. Further research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
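
The clustering-plus-ANOVA validation described above can be sketched on toy symptom vectors; Ward's criterion minimizes within-cluster squared Euclidean variance, matching the squared-Euclidean choice in the study. All numbers are fabricated:

```python
# Ward clustering of binary symptom-prevalence vectors, then a one-way ANOVA
# comparing a made-up distress score across the resulting clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
# 30 patients per artificial group, 32 binary symptom indicators each.
symptoms = np.vstack([rng.binomial(1, p, size=(30, 32))
                      for p in (0.1, 0.9)]).astype(float)
# A fabricated distress score that differs between the groups.
distress = np.concatenate([rng.normal(2, 0.5, 30), rng.normal(6, 0.5, 30)])

labels = fcluster(linkage(symptoms, method='ward'), t=2, criterion='maxclust')
# Compare distress across the two clusters, as the study did with ANOVA.
stat, pval = f_oneway(distress[labels == 1], distress[labels == 2])
print(np.bincount(labels)[1:], pval)
```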

  7. 3D Building Models Segmentation Based on K-Means++ Cluster Analysis

    Science.gov (United States)

    Zhang, C.; Mao, B.

    2016-10-01

    3D mesh model segmentation has been drawing increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval, etc. Therefore, segmentation is a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on the randomized initial seed points (i.e., centroids), and different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.
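
The k-means++ point can be demonstrated in a few lines: with a single initialization, k-means++ seeding reliably reaches a low within-cluster sum of squares, whereas purely random seeding may not. Synthetic 2-D points stand in for the mesh-segment features used in the paper:

```python
# Compare random seeding vs k-means++ seeding, each with a single init,
# on four well-separated synthetic clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 0.2, size=(50, 2))
               for c in ((0, 0), (5, 0), (0, 5), (5, 5))])

random_init = KMeans(n_clusters=4, init='random', n_init=1,
                     random_state=0).fit(X)
plus_plus = KMeans(n_clusters=4, init='k-means++', n_init=1,
                   random_state=0).fit(X)
# Lower inertia (within-cluster sum of squares) is better.
print(round(random_init.inertia_, 1), round(plus_plus.inertia_, 1))
```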

  8. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation has been drawing increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval, etc. Therefore, segmentation is a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on the randomized initial seed points (i.e., centroids), and different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good and meaningful results.

  9. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations

    Directory of Open Access Journals (Sweden)

    F. Darrouzet

    2006-07-01

    Full Text Available Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles have been derived from the electron plasma frequency identified by the WHISPER sounder, supplemented, in-between soundings, by relative variations of the spacecraft potential measured by the electric field instrument EFW; ion velocity is also measured onboard these satellites. The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 min; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations of three plume events and compare CLUSTER in-situ data with global images of the plasmasphere obtained by IMAGE. In particular, we study the geometry and the orientation of plasmaspheric plumes by using four-point analysis methods. We compare several aspects of plume motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary as observed by the wave experiment WHISPER on the four spacecraft, (ii) drift velocity measured by the electron drift instrument EDI onboard CLUSTER, and (iii) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  10. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT) Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: (1) post-bronchodilator FEV1 percent predicted, (2) percent bronchodilator responsiveness, and quantitative CT measurements of (3) apical emphysema and (4) airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: (1) emphysema predominant; (2) bronchodilator responsive, with higher FEV1; (3) discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness; and (4) airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant) was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.
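
A rough sketch of the two-stage analysis above (factor analysis to pick a reduced variable set, then k-means). The loadings-based selection rule here is a simplified assumption, not the authors' exact procedure, and the variables are random stand-ins for the 31 phenotypes:

```python
# Factor analysis for variable selection followed by k-means clustering.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
latent = rng.normal(size=(308, 2))            # two hidden factors
loadings = rng.normal(size=(2, 10))           # ten observed "phenotypes"
X = latent @ loadings + 0.1 * rng.normal(size=(308, 10))

fa = FactorAnalysis(n_components=2).fit(StandardScaler().fit_transform(X))
# Keep the variable loading most strongly on each factor (simplified rule).
keep = np.unique(np.abs(fa.components_).argmax(axis=1))
clusters = KMeans(n_clusters=4, n_init=10,
                  random_state=0).fit_predict(X[:, keep])
print(keep, np.bincount(clusters))
```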

  11. Fast optimization of binary clusters using a novel dynamic lattice searching method

    International Nuclear Information System (INIS)

    Wu, Xia; Cheng, Wen

    2014-01-01

    Global optimization of binary clusters has been a difficult task despite much effort and many efficient methods. To address the two types of elements in binary clusters (i.e., the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, i.e., the binary DLS (BDLS) method, is developed. However, it was found that the BDLS can only be utilized for the optimization of binary clusters with small sizes, because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on BDLS and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. In order to assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)79 clusters with DFT-fit parameters of the Gupta potential.

  12. The use of different clustering methods in the evaluation of genetic diversity in upland cotton

    Directory of Open Access Journals (Sweden)

    Laíse Ferreira de Araújo

    Full Text Available The continuous development and evaluation of new genotypes through crop breeding is essential in order to obtain new cultivars. The objective of this work was to evaluate the genetic divergence between cultivars of upland cotton (Gossypium hirsutum L.) using the agronomic and technological characteristics of the fibre, in order to select superior parent plants. The experiment was set up during 2010 at the Federal University of Ceará in Fortaleza, Ceará, Brazil. Eleven cultivars of upland cotton were used in an experimental design of randomised blocks with three replications. In order to evaluate the genetic diversity among cultivars, the generalised Mahalanobis distance matrix was calculated, and cluster analysis was then applied using various methods: single linkage, Ward, complete linkage, median, average linkage within a cluster and average linkage between clusters. Genetic variability exists among the evaluated genotypes. The most consistent clustering method was that employing average linkage between clusters. Among the characteristics assessed, mean boll weight presented the highest contribution to genetic diversity, followed by elongation at rupture. Employing the method of average linkage between clusters, the cultivars with greater genetic divergence were BRS Acacia and LD Frego; those of greater similarity were BRS Itaúba and BRS Araripe.
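
One way to compare linkage methods on a Mahalanobis distance matrix, as the study does, is cophenetic correlation (an assumption here; the authors' consistency criterion may differ). A subset of the linkage methods named above is shown:

```python
# Mahalanobis distances between genotypes, clustered with several linkage
# methods and ranked by cophenetic correlation with the original distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)
traits = rng.normal(size=(11, 6))          # 11 cultivars x 6 traits (invented)
vi = np.linalg.inv(np.cov(traits, rowvar=False))
d = pdist(traits, metric='mahalanobis', VI=vi)

scores = {}
for method in ('single', 'complete', 'average', 'ward'):
    c, _ = cophenet(linkage(d, method=method), d)
    scores[method] = round(float(c), 3)
print(scores)
```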

  13. Semiparametric Bayesian analysis of accelerated failure time models with cluster structures.

    Science.gov (United States)

    Li, Zhaonan; Xu, Xinyi; Shen, Junshan

    2017-11-10

    In this paper, we develop a Bayesian semiparametric accelerated failure time model for survival data with cluster structures. Our model allows distributional heterogeneity across clusters and accommodates their relationships through a density ratio approach. Moreover, a nonparametric mixture of Dirichlet processes prior is placed on the baseline distribution to yield full distributional flexibility. We illustrate through simulations that our model can greatly improve estimation accuracy by effectively pooling information from multiple clusters, while taking into account the heterogeneity in their random error distributions. We also demonstrate the implementation of our method using an analysis of the Mayo Clinic trial in primary biliary cirrhosis. Copyright © 2017 John Wiley & Sons, Ltd.
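
Schematically, the model class described above can be written as follows; the exact parameterization of the density-ratio link and of the nonparametric prior is an assumption for illustration, not the paper's notation:

```latex
% Accelerated failure time model with cluster-specific error distributions:
% T_{ij} = survival time of subject i in cluster j, x_{ij} = covariates.
\log T_{ij} = x_{ij}^{\top}\beta + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim f_j ,
% Cluster densities tied to a shared baseline f_0 through a density-ratio
% (exponential tilt) relationship:
\qquad \frac{f_j(\varepsilon)}{f_0(\varepsilon)}
      = \exp\{\alpha_j + \gamma_j\,\varepsilon\},
% with a nonparametric mixture-of-Dirichlet-processes prior on f_0.
```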

  14. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Valentina Meuti

    2014-01-01

    Full Text Available Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of the MMPI-2 were used as predictors in the hierarchical cluster analysis that was carried out. Results. The analysis identified three clusters of personality profile: two "clinical" clusters (1 and 3) and an "apparently common" one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a "psychasthenic" depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop "dysphoric" depression; the second cluster (46.5%) shows a normal profile with a "defensive" attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  15. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Science.gov (United States)

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of the MMPI-2 were used as predictors in the hierarchical cluster analysis that was carried out. Results. The analysis identified three clusters of personality profile: two "clinical" clusters (1 and 3) and an "apparently common" one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a "psychasthenic" depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop "dysphoric" depression; the second cluster (46.5%) shows a normal profile with a "defensive" attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  16. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    Science.gov (United States)

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to revolve around developmental disorder groups. Our objective was to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data of 1175 patients from a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.

  17. Evaluation of Portland cement from X-ray diffraction associated with cluster analysis

    International Nuclear Information System (INIS)

    Gobbo, Luciano de Andrade; Montanheiro, Tarcisio Jose; Montanheiro, Filipe; Sant'Agostino, Lilia Mascarenhas

    2013-01-01

    The Brazilian cement industry produced 64 million tons of cement in 2012, with noteworthy contributions from CP-II (slag), CP-III (blast furnace) and CP-IV (pozzolanic) cements. The industrial pole comprises about 80 factories that utilize raw materials of different origins and chemical compositions, requiring enhanced analytical technologies to optimize production in order to gain space in the growing consumer market in Brazil. This paper assesses the sensitivity of mineralogical analysis by X-ray diffraction associated with cluster analysis for distinguishing different kinds of cement with different additions. This technique can be applied, for example, in the prospection of different types of limestone (calcitic, dolomitic and siliceous) as well as in the qualification of different clinkers. The cluster analysis does not require any specific knowledge of the mineralogical composition of the diffractograms to be clustered; rather, it is based on their similarity. The materials tested for addition have different origins: fly ashes from different power stations in South Brazil and slag from different steel plants in the Southeast. Cement with different additions of limestone and white Portland cement were also used. The Rietveld method of qualitative and quantitative analysis was used to verify the results generated by the cluster analysis technique. (author)

  18. Using the Cluster Analysis and the Principal Component Analysis in Evaluating the Quality of a Destination

    Directory of Open Access Journals (Sweden)

    Ida Vajčnerová

    2016-01-01

    Full Text Available The objective of the paper is to explore possibilities of evaluating the quality of a tourist destination by means of the principal component analysis (PCA) and the cluster analysis. In the paper both types of analysis are compared on the basis of the results they provide. The aim is to identify the advantages and limits of both methods and provide methodological suggestions for their further use in tourism research. The analysis is based on primary data from a customer satisfaction survey covering the key quality factors of a destination. The output of both statistical methods is the creation of groups or clusters of quality factors that are similar in terms of respondents' evaluations, which facilitates the evaluation of the quality of tourist destinations. The results show that both tested methods can be used. The paper is elaborated in the frame of a wider research project aimed at developing a methodology for the quality evaluation of tourist destinations, especially in the context of customer satisfaction and loyalty.

  19. Interactive visual exploration and refinement of cluster assignments.

    Science.gov (United States)

    Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R

    2017-09-12

    With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
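
Two building blocks for evaluating and comparing cluster assignments, as discussed above, can be sketched with standard metrics: the adjusted Rand index for agreement between two clusterings, and per-record silhouette values to flag ambiguous assignments worth manual refinement. The data and thresholds are illustrative:

```python
# Compare two clustering runs (adjusted Rand index) and flag records with
# low silhouette values as candidates for manual curation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_samples

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in ((0, 0), (3, 0))])

a = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
b = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
agreement = adjusted_rand_score(a, b)   # 1.0 means identical partitions

sil = silhouette_samples(X, a)
ambiguous = np.flatnonzero(sil < 0.1)   # candidates for manual refinement
print(round(agreement, 2), len(ambiguous))
```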

  20. Clustering Trajectories by Relevant Parts for Air Traffic Analysis.

    Science.gov (United States)

    Andrienko, Gennady; Andrienko, Natalia; Fuchs, Georg; Garcia, Jose Manuel Cordero

    2018-01-01

    Clustering of trajectories of moving objects by similarity is an important technique in movement analysis. Existing distance functions assess the similarity between trajectories based on properties of the trajectory points or segments. The properties may include the spatial positions, times, and thematic attributes. There may be a need to focus the analysis on certain parts of trajectories, i.e., points and segments that have particular properties. According to the analysis focus, the analyst may need to cluster trajectories by similarity of their relevant parts only. Throughout the analysis process, the focus may change, and different parts of trajectories may become relevant. We propose an analytical workflow in which interactive filtering tools are used to attach relevance flags to elements of trajectories, clustering is done using a distance function that ignores irrelevant elements, and the resulting clusters are summarized for further analysis. We demonstrate how this workflow can be useful for different analysis tasks in three case studies with real data from the domain of air traffic. We propose a suite of generic techniques and visualization guidelines to support movement data analysis by means of relevance-aware trajectory clustering.
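
The relevance-aware idea above can be reduced to a toy distance function that ignores points not flagged as relevant; the index-aligned pairing of points is a simplification of the paper's distance functions:

```python
# Toy relevance-aware trajectory distance: trajectories are lists of
# (x, y, relevant) tuples, and only relevant point pairs contribute.
import math

def relevant_distance(traj_a, traj_b):
    """Mean Euclidean distance over index-aligned relevant point pairs."""
    pairs = [(a, b) for a, b in zip(traj_a, traj_b) if a[2] and b[2]]
    if not pairs:
        return float('inf')      # nothing relevant to compare
    return sum(math.dist(a[:2], b[:2]) for a, b in pairs) / len(pairs)

# Only the last two points are flagged relevant (e.g., a flight phase of
# interest); the divergent first points are ignored by the distance.
climb = [(0, 0, False), (1, 1, True), (2, 2, True)]
other = [(9, 9, False), (1, 2, True), (2, 2, True)]

print(relevant_distance(climb, other))  # 0.5: irrelevant first points ignored
```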

  1. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and
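
The cluster-based multi-model mean can be illustrated at a single grid location; a plain k-means stands in for the DDC algorithm here, so this shows the principle (average only the most-populous cluster, discarding outlier members) rather than the paper's method:

```python
# At one grid location: cluster ensemble members' values and average only
# the most-populous cluster instead of all models. Values are invented.
import numpy as np
from sklearn.cluster import KMeans

# Seven hypothetical model values of tropospheric column ozone (DU) at one
# location: five in agreement, two outliers.
models = np.array([29.8, 30.1, 30.3, 30.0, 29.9, 41.5, 18.2]).reshape(-1, 1)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(models)
biggest = np.bincount(labels).argmax()
cluster_mmm = models[labels == biggest].mean()
print(round(float(models.mean()), 2), round(float(cluster_mmm), 2))
```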

  2. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    Science.gov (United States)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intracluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass) and instrumental (calibration) effects.

  3. A study of several CAD methods for classification of clustered microcalcifications

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei

    2005-04-01

    In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed at assisting radiologists in more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which include a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance of the different methods. In addition, we also investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.
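
    The classifiers above are compared with ROC analysis. As a minimal sketch of that evaluation step, the area under the ROC curve (AUC) can be computed directly from classifier scores via the rank-sum (Mann-Whitney) formulation; the scores and labels below are hypothetical.

```python
# Minimal ROC-AUC sketch via the rank-sum (Mann-Whitney) formulation: AUC is
# the probability that a randomly chosen malignant case scores higher than a
# randomly chosen benign case. Scores and labels are hypothetical.

def roc_auc(scores, labels):
    """AUC for binary labels (1 = malignant, 0 = benign); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
auc = roc_auc(scores, labels)  # 13 of 16 pairs ranked correctly: 0.8125
```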

  4. Statistical Analysis of the labor Market in Ukraine Using Multidimensional Classification Methods: the Regional Aspect

    Directory of Open Access Journals (Sweden)

    Korepanov Oleksiy S.

    2017-12-01

    Full Text Available The aim of the article is to study the labor market in Ukraine in the regional context using cluster analysis methods. The current state of the labor market in the regions of Ukraine is analyzed, and a system of statistical indicators that influence the state and development of this market is formed. The expediency of using cluster analysis for grouping regions according to the level of development of the labor market is substantiated. The essence of cluster analysis is revealed; its main goal and the key tasks that can be solved by such analysis are presented, and the basic stages of the analysis are considered. The main methods of clustering are described and, based on the results of the simulation, the advantages and disadvantages of each method are assessed. The clustering of regions of Ukraine by the level of labor market development is then carried out using different methods of cluster analysis, conclusions on the results of the calculations are presented, and the main directions for further research are outlined.

  5. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  6. Advanced cluster methods for correlated-electron systems

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, Andre

    2015-04-27

    In this thesis, quantum cluster methods are used to calculate electronic properties of correlated-electron systems. A special focus lies in the determination of the ground state properties of a 3/4 filled triangular lattice within the one-band Hubbard model. At this filling, the electronic density of states exhibits a so-called van Hove singularity and the Fermi surface becomes perfectly nested, causing an instability towards a variety of spin-density-wave (SDW) and superconducting states. While chiral d+id-wave superconductivity has been proposed as the ground state in the weak coupling limit, the situation towards strong interactions is unclear. Additionally, quantum cluster methods are used here to investigate the interplay of Coulomb interactions and symmetry-breaking mechanisms within the nematic phase of iron-pnictide superconductors. The transition from a tetragonal to an orthorhombic phase is accompanied by a significant change in electronic properties, while long-range magnetic order is not established yet. The driving force of this transition may not only be phonons but also magnetic or orbital fluctuations. The signatures of these scenarios are studied with quantum cluster methods to identify the most important effects. Here, cluster perturbation theory (CPT) and its variational extension, the variational cluster approach (VCA), are used to treat the respective systems on a level beyond mean-field theory. Short-range correlations are incorporated numerically exactly by exact diagonalization (ED). In the VCA, long-range interactions are included by variational optimization of a fictitious symmetry-breaking field based on a self-energy functional approach. Due to limitations of ED, cluster sizes are limited to a small number of degrees of freedom. For the 3/4 filled triangular lattice, the VCA is performed for different cluster symmetries. A strong symmetry dependence and finite-size effects make a comparison of the results from different clusters difficult.

  7. Applying clustering to statistical analysis of student reasoning about two-dimensional kinematics

    Directory of Open Access Journals (Sweden)

    R. Padraic Springuel

    2007-12-01

    Full Text Available We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and written elements. The primary goal of this paper is to describe the methodology itself; we include a brief overview of relevant results.

  8. Using Cluster Analysis to Group Countries for Cost-effectiveness Analysis: An Application to Sub-Saharan Africa.

    Science.gov (United States)

    Russell, Louise B; Bhanot, Gyan; Kim, Sun-Young; Sinha, Anushua

    2018-02-01

    To explore the use of cluster analysis to define groups of similar countries for the purpose of evaluating the cost-effectiveness of a public health intervention, maternal immunization, within the constraints of a project budget originally meant for an overall regional analysis. We used the most common cluster analysis algorithm, K-means, and the most common measure of distance, Euclidean distance, to group 37 low-income, sub-Saharan African countries on the basis of 24 measures of economic development, general health resources, and past success in public health programs. The groups were tested for robustness and reviewed by regional disease experts. We explored 2-, 3- and 4-group clustering. Public health performance was consistently important in determining the groups. For the 2-group clustering, for example, infant mortality in Group 1 was 81 per 1,000 live births compared with 51 per 1,000 in Group 2, and 67% of children in Group 1 received DPT immunization compared with 87% in Group 2. The experts preferred four groups to fewer, on the grounds that national decision makers would more readily recognize their country among four groups. Clusters defined by K-means clustering made sense to subject experts and allowed a more detailed evaluation of the cost-effectiveness of maternal immunization within the constraint of the project budget. The method may be useful for other evaluations that, without having the resources to conduct separate analyses for each unit, seek to inform decision makers in numerous countries or subdivisions within countries, such as states or counties.
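
    The grouping procedure described above, K-means with Euclidean distance, can be sketched as follows. This is a generic Lloyd's-algorithm implementation, not the authors' code, and the "country" indicator vectors are invented stand-ins for the study's 24 standardized measures.

```python
# Generic Lloyd's-algorithm K-means with Euclidean distance (math.dist).
# The six "countries" are hypothetical standardized 2-D indicator vectors,
# e.g. (infant mortality, DPT coverage), standing in for the study's data.
import math

def kmeans(points, centers, iters=20):
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:  # assignment step: nearest center wins
            i = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        centers = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]  # update step: new means
    return centers, groups

countries = [(1.0, -0.9), (0.8, -1.1), (1.2, -0.7),   # weaker public health
             (-0.9, 1.0), (-1.1, 0.8), (-0.8, 1.2)]   # stronger public health
centers, groups = kmeans(countries, centers=[countries[0], countries[3]])
```

    With two well-separated groups, the two-cluster solution converges after a single iteration and reproduces the public-health split the abstract describes.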

  9. Defining functioning levels in patients with schizophrenia: A combination of a novel clustering method and brain SPECT analysis.

    Science.gov (United States)

    Catherine, Faget-Agius; Aurélie, Vincenti; Eric, Guedj; Pierre, Michel; Raphaëlle, Richieri; Marine, Alessandrini; Pascal, Auquier; Christophe, Lançon; Laurent, Boyer

    2017-12-30

    This study aims to define functioning levels of patients with schizophrenia by using a method of interpretable clustering based on a specific functioning scale, the Functional Remission Of General Schizophrenia (FROGS) scale, and to test their validity regarding clinical and neuroimaging characterization. In this observational study, patients with schizophrenia have been classified using a hierarchical top-down method called clustering using unsupervised binary trees (CUBT). Socio-demographic, clinical, and neuroimaging SPECT perfusion data were compared between the different clusters to ensure their clinical relevance. A total of 242 patients were analyzed. A four-group functioning level structure has been identified: 54 are classified as "minimal", 81 as "low", 64 as "moderate", and 43 as "high". The clustering shows satisfactory statistical properties, including reproducibility and discriminative ability. The 4 clusters consistently differentiate patients. "High" functioning level patients reported the lowest scores on the PANSS and the CDSS and the highest scores on the GAF, the MARS, and the S-QoL 18 (all differences significant). Functioning levels were significantly associated with cerebral perfusion of two relevant areas: the left inferior parietal cortex and the anterior cingulate. Our study provides relevant functioning levels in schizophrenia, and may enhance the use of functioning scales.

  10. A robust automatic leukocyte recognition method based on island-clustering texture

    Directory of Open Access Journals (Sweden)

    Xiaoshun Li

    2016-01-01

    Full Text Available A leukocyte recognition method for human peripheral blood smears based on island-clustering texture (ICT) is proposed. By analyzing the features of the five typical classes of leukocyte images, a new ICT model is established. Firstly, feature points are extracted from a gray-scale leukocyte image by mean-shift clustering to serve as the centers of islands. Secondly, region growing is employed to create the island regions, with these feature points as the seeds. The distribution of these islands describes a new texture. Finally, a distinguishing parameter vector is created by combining the ICT features with the geometric features of the leukocyte. The five typical classes of leukocytes can then be recognized at a correct recognition rate of more than 92.3% on a total sample of 1310 leukocytes. Experimental results show the feasibility of the proposed method. Further analysis reveals that the method is robust and the results can provide important information for disease diagnosis.

  11. Self-consistent clustering analysis: an efficient multiscale scheme for inelastic heterogeneous materials

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Z.; Bessa, M. A.; Liu, W.K.

    2017-10-25

    A predictive computational theory is presented for modeling complex, hierarchical materials ranging from metal alloys to polymer nanocomposites. The theory can capture complex mechanisms such as plasticity and failure that span across multiple length scales. This general multiscale material modeling theory relies on sound principles of mathematics and mechanics, and a cutting-edge reduced order modeling method named self-consistent clustering analysis (SCA) [Zeliang Liu, M.A. Bessa, Wing Kam Liu, “Self-consistent clustering analysis: An efficient multi-scale scheme for inelastic heterogeneous materials,” Comput. Methods Appl. Mech. Engrg. 306 (2016) 319–341]. SCA reduces by several orders of magnitude the computational cost of micromechanical and concurrent multiscale simulations, while retaining the microstructure information. This remarkable increase in efficiency is achieved with a data-driven clustering method. Computationally expensive operations are performed in the so-called offline stage, where degrees of freedom (DOFs) are agglomerated into clusters and the interaction tensor of these clusters is computed. In the online or predictive stage, the Lippmann-Schwinger integral equation is solved cluster-wise using a self-consistent scheme to ensure solution accuracy and avoid path dependence. To construct a concurrent multiscale model, this scheme is applied at each material point in a macroscale structure, replacing a conventional constitutive model with the average response computed from the microscale model using just the SCA online stage. A regularized damage theory is incorporated in the microscale that avoids the mesh and RVE size dependence that commonly plagues microscale damage calculations. The SCA method is illustrated with two cases: a carbon fiber reinforced polymer (CFRP) structure with the concurrent multiscale model and an application to fatigue prediction for additively manufactured metals. For the CFRP problem, a speed up estimated to be about

  12. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    Science.gov (United States)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, especially when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
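
    One established model the grouped rupture data are typically fit to is the Bell-Evans relation, in which the mean rupture force grows linearly with the logarithm of the loading rate. A sketch of that fitting step is below; the data are noise-free and synthetic, so the known parameters are recovered, and the abstract's clustering step (how the groups are formed) is omitted.

```python
# Bell-Evans fit sketch: mean rupture force is linear in ln(loading rate),
# F = (kBT/x_beta) * ln(r * x_beta / (k_off * kBT)). Fitting the line
# F = a*ln(r) + b gives x_beta = kBT/a and k_off = exp(-b/a)/a.
import math

kBT = 4.114  # thermal energy at room temperature, pN*nm

def bell_evans_fit(loading_rates, mean_forces):
    xs = [math.log(r) for r in loading_rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(mean_forces) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, mean_forces))
         / sum((x - mx) ** 2 for x in xs))   # least-squares slope = kBT/x_beta
    b = my - a * mx                          # intercept
    return kBT / a, math.exp(-b / a) / a     # (x_beta [nm], k_off [1/s])

x_beta, k_off = 0.5, 1.0                     # ground-truth parameters
rates = [1e2, 1e3, 1e4, 1e5]                 # loading rates, pN/s
forces = [(kBT / x_beta) * math.log(r * x_beta / (k_off * kBT)) for r in rates]
est_xb, est_koff = bell_evans_fit(rates, forces)  # recovers 0.5 and 1.0
```

    In a real experiment the (force, loading-rate) pairs fed to this fit come from the grouped events, which is why the abstract's cluster-based grouping improves the recovered koff and xβ.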

  13. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. The etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. When compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.

  14. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  15. Cluster Analysis of the International Stellarator Confinement Database

    International Nuclear Information System (INIS)

    Kus, A.; Dinklage, A.; Preuss, R.; Ascasibar, E.; Harris, J. H.; Okamura, S.; Yamada, H.; Sano, F.; Stroth, U.; Talmadge, J.

    2008-01-01

    The heterogeneous structure of the collected data is one of the problems that occur during the derivation of scalings for energy confinement time, and its analysis turns out to be a wide and complicated matter. The International Stellarator Confinement Database [1], shortly ISCDB, comprises in its latest version 21 a total of 3647 observations from 8 experimental devices, 2067 of them being so far completed for upcoming analyses. For confinement scaling studies 1933 observations were chosen as the standard dataset. Here we describe a statistical method of cluster analysis for identification of possible cohesive substructures in ISCDB and present some preliminary results.

  16. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  17. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  18. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics has been lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
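
    Of the stand-alone quality metrics named above, modularity is the easiest to sketch: for each cluster, the fraction of edges falling inside it minus the fraction expected under random degree-preserving rewiring. A toy example on two triangles joined by a bridge (not one of the study's benchmark graphs):

```python
# Modularity sketch: Q = sum over clusters of (internal-edge fraction minus
# the squared fraction of total degree held by the cluster).

def modularity(edges, partition):
    """edges: list of undirected (u, v) pairs; partition: node -> cluster id."""
    m = len(edges)
    intra, degsum = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = partition[node]
            degsum[c] = degsum.get(c, 0) + 1  # total degree per cluster
        if partition[u] == partition[v]:
            intra[partition[u]] = intra.get(partition[u], 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degsum.items())

# Two triangles joined by one bridge edge, split into the obvious communities:
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, part)  # 5/14, about 0.357
```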

  19. On the blind use of statistical tools in the analysis of globular cluster stars

    Science.gov (United States)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco

    2018-04-01

    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  20. Mathematical classification and clustering

    CERN Document Server

    Mirkin, Boris

    1996-01-01

    I am very happy to have this opportunity to present the work of Boris Mirkin, a distinguished Russian scholar in the areas of data analysis and decision making methodologies. The monograph is devoted entirely to clustering, a discipline dispersed through many theoretical and application areas, from mathematical statistics and combinatorial optimization to biology, sociology and organizational structures. It compiles an immense amount of research done to date, including many original Russian developments never presented to the international community before (for instance, cluster-by-cluster versions of the K-Means method in Chapter 4 or uniform partitioning in Chapter 5). The author's approach, approximation clustering, allows him both to systematize a great part of the discipline and to develop many innovative methods in the framework of optimization problems. The optimization methods considered are proved to be meaningful in the contexts of data analysis and clustering. The material presented in ...

  1. Phenotype Clustering of Breast Epithelial Cells in Confocal Imagesbased on Nuclear Protein Distribution Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Lelievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Results: Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19 percent accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86 percent accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. Conclusion: This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.
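
    The probabilistic ensemble step described above can be illustrated with the common co-association recipe: count how often each pair of items is co-clustered across the base partitions, then group pairs that co-occur in a majority of them. This is a generic sketch, not the authors' exact procedure, and the partitions below are toy data.

```python
# Co-association consensus clustering sketch (a common ensemble recipe; the
# paper's exact probabilistic procedure may differ).

def consensus_groups(labelings, threshold=0.5):
    n = len(labelings[0])
    # fraction of base partitions in which items i and j share a cluster
    coassoc = [[sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
                for j in range(n)] for i in range(n)]
    groups, assigned = [], [False] * n
    for i in range(n):            # connected components of the thresholded
        if assigned[i]:           # co-association graph, by depth-first search
            continue
        stack, members = [i], []
        assigned[i] = True
        while stack:
            u = stack.pop()
            members.append(u)
            for v in range(n):
                if not assigned[v] and coassoc[u][v] > threshold:
                    assigned[v] = True
                    stack.append(v)
        groups.append(sorted(members))
    return groups

# Three toy base partitions of six items; 0-2 and 3-5 mostly co-cluster:
labelings = [[0, 0, 0, 1, 1, 1],
             [0, 0, 0, 1, 1, 1],
             [0, 0, 1, 1, 1, 1]]
groups = consensus_groups(labelings)  # [[0, 1, 2], [3, 4, 5]]
```

    Item 2, which one base partition misassigns, still lands in its majority group, which is the robustness the ensemble approach buys.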

  2. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    International Nuclear Information System (INIS)

    Anderson, Ryan B.; Bell, James F.; Wiens, Roger C.; Morris, Richard V.; Clegg, Samuel M.

    2012-01-01

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO2 at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼ 3 wt.%. The statistical significance of these improvements was ∼ 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically

  3. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.

    Science.gov (United States)

    Woodcock, S; Mornon, J P; Henrissat, B

    1992-10-01

    Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.

  4. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30,000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study with a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than would be the case for a completely random Poisson process. By changing the number of grid cells (the size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - the box counting and sandbox counting approaches. REFERENCES: Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M., 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
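The grid-based estimation described above can be sketched directly from the index definition. The function below is a hypothetical helper for points in the unit square, compared on a uniform (CSR-like) and a tightly clustered synthetic pattern:

```python
import numpy as np

def multipoint_morisita(points, cells_per_axis, m=2):
    # Cover the unit square with a grid and compare the chance that m
    # randomly drawn points share a grid cell against complete spatial
    # randomness (index ~1 for a Poisson-like pattern, >1 for clustering).
    q = cells_per_axis ** 2
    ix = np.clip((points * cells_per_axis).astype(int), 0, cells_per_axis - 1)
    _, counts = np.unique(ix[:, 0] * cells_per_axis + ix[:, 1],
                          return_counts=True)
    n = len(points)
    num = sum(np.prod([c - k for k in range(m)]) for c in counts)
    den = np.prod([n - k for k in range(m)])
    return q ** (m - 1) * num / den

rng = np.random.default_rng(1)
i_csr = multipoint_morisita(rng.random((2000, 2)), 10, m=3)          # uniform
i_clu = multipoint_morisita(0.05 * rng.random((2000, 2)) + 0.4, 10, m=3)  # one tight blob
print(i_csr, i_clu)
```

Repeating the computation over several grid resolutions gives the scaling behaviour from which the intrinsic dimension is read off.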

  5. Research on the method of information system risk state estimation based on clustering particle filter

    Directory of Open Access Journals (Sweden)

    Cui Jia

    2017-05-01

    With the purpose of reinforcing the correlation analysis of risk assessment threat factors, a dynamic assessment method of safety risks based on particle filtering is proposed, which takes threat analysis as the core. Based on risk assessment standards, the method selects threat indicators, applies a particle filtering algorithm to calculate the influence weights of the threat indicators, and determines information system risk levels by combining them with state estimation theory. In order to improve the computational efficiency of the particle filtering algorithm, the k-means clustering algorithm is introduced into it: by clustering all particles, each centroid is taken as the representative of its cluster, reducing the amount of computation. Empirical results indicate that the method can reasonably capture the mutual dependence and influence among risk elements. Under the circumstance of limited information, it provides a scientific basis for formulating a risk management control strategy.

  6. Research on the method of information system risk state estimation based on clustering particle filter

    Science.gov (United States)

    Cui, Jia; Hong, Bei; Jiang, Xuepeng; Chen, Qinghua

    2017-05-01

    With the purpose of reinforcing the correlation analysis of risk assessment threat factors, a dynamic assessment method of safety risks based on particle filtering is proposed, which takes threat analysis as the core. Based on risk assessment standards, the method selects threat indicators, applies a particle filtering algorithm to calculate the influence weights of the threat indicators, and determines information system risk levels by combining them with state estimation theory. In order to improve the computational efficiency of the particle filtering algorithm, the k-means clustering algorithm is introduced into it: by clustering all particles, each centroid is taken as the representative of its cluster, reducing the amount of computation. Empirical results indicate that the method can reasonably capture the mutual dependence and influence among risk elements. Under the circumstance of limited information, it provides a scientific basis for formulating a risk management control strategy.
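The particle-reduction step described in this record, clustering the particles and letting each centroid operate as the representative of its cluster, can be sketched with a generic weighted particle set. The state model and values below are hypothetical, not the record's threat indicators:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# A weighted particle set standing in for the filter's risk-state posterior
# (hypothetical values; the record's threat indicators are not modelled here).
particles = rng.normal(size=(500, 2))
weights = rng.random(500)
weights /= weights.sum()

# Speed-up described in the record: cluster the particles and let each
# centroid stand in for its cluster, carrying the cluster's total weight.
k = 20
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(particles)
centroid_weights = np.array([weights[km.labels_ == c].sum() for c in range(k)])

# The posterior mean is approximately preserved by the 25x smaller set.
full_mean = weights @ particles
reduced_mean = centroid_weights @ km.cluster_centers_
print(full_mean, reduced_mean)
```

Subsequent filter updates then propagate 20 weighted centroids instead of 500 particles.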

  7. Physicochemical properties of different corn varieties by principal components analysis and cluster analysis

    International Nuclear Information System (INIS)

    Zeng, J.; Li, G.; Sun, J.

    2013-01-01

    Principal components analysis and cluster analysis were used to investigate the properties of different corn varieties. The chemical compositions and some properties of corn flour processed by dry milling were determined. The results showed that the chemical compositions and physicochemical properties differed significantly among the twenty-six corn varieties. The quality of corn flour was associated with five principal components from the principal component analysis, among which starch pasting properties made an important contribution, accounting for 48.90%. The twenty-six corn varieties could be classified into four groups by cluster analysis. The consistency between principal components analysis and cluster analysis indicated that multivariate analyses were feasible in the study of corn variety properties. (author)
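The PCA-then-cluster workflow of this record can be sketched on a hypothetical varieties-by-properties matrix (random values; the record's measured compositions are not reproduced):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Hypothetical 26 varieties x 12 measured properties.
data = np.vstack([rng.normal(mu, 0.5, size=(13, 12)) for mu in (0.0, 3.0)])

# Standardize, extract principal components, then group the varieties.
z = StandardScaler().fit_transform(data)
pca = PCA(n_components=5)
scores = pca.fit_transform(z)
explained = pca.explained_variance_ratio_.sum()  # variance kept by 5 PCs

groups = fcluster(linkage(scores, method="ward"), t=4, criterion="maxclust")
print(explained, groups)
```

Clustering the PC scores rather than the raw measurements is what makes the two analyses directly comparable, as the record's consistency claim requires.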

  8. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2006-01-01

    We present a new general theoretical framework for modelling the cluster structure and apply it to the description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  9. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2008-01-01

    We present a new general theoretical framework for modelling the cluster structure and apply it to the description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  10. Taxonomical analysis of the Cancer cluster of galaxies

    International Nuclear Information System (INIS)

    Perea, J.; Olmo, A. del; Moles, M.

    1986-01-01

    A description is presented of the Cancer cluster of galaxies, based on a taxonomical analysis in (α, δ, V_r) space. Earlier results by previous authors on the lack of dynamical entity of the cluster are confirmed. The present analysis points out the existence of a binary structure in the most populated region of the complex. (author)

  11. Sensitivity evaluation of dynamic speckle activity measurements using clustering methods

    International Nuclear Information System (INIS)

    Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H.

    2010-01-01

    We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
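The feature construction described above, mean energy of wavelet coefficients per pixel history, followed by partitional clustering, can be sketched with a hand-rolled Haar decomposition and K-means (a simplification; the record evaluates several clustering techniques and does not specify the Haar basis):

```python
import numpy as np
from sklearn.cluster import KMeans

def haar_energies(signal, levels=4):
    # Mean energy of the Haar detail coefficients at each decomposition
    # level: a simple stand-in for the record's wavelet feature space.
    s = np.asarray(signal, dtype=float)
    feats = []
    for _ in range(levels):
        detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
        s = (s[0::2] + s[1::2]) / np.sqrt(2.0)
        feats.append(float(np.mean(detail ** 2)))
    return feats

rng = np.random.default_rng(4)
n = 256  # length of each pixel's simulated intensity history

# Two simulated activity regimes: slow (random-walk) vs fast fluctuations.
slow = [np.cumsum(rng.normal(size=n)) for _ in range(50)]
fast = [5.0 * rng.normal(size=n) for _ in range(50)]
features = np.array([haar_energies(s) for s in slow + fast])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```

Pixels with different fluctuation dynamics land in well-separated regions of the energy feature space, which is what makes the clustering step sensitive to activity.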

  12. An Extended Affinity Propagation Clustering Method Based on Different Data Density Types

    Directory of Open Access Journals (Sweden)

    XiuLi Zhao

    2015-01-01

    Affinity propagation (AP), as a novel clustering method, does not require the user to specify the initial cluster centers in advance: it regards all data points equally as potential exemplars (cluster centers) and groups the clusters entirely by the degree of similarity among the data points. In many cases, however, there exist areas of different point density within the same data set, i.e. the data are not homogeneously distributed, and in such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. There are two steps in our method: first, the data set is partitioned into several data density types according to the nearest-neighbour distances of the data points; then the AP clustering method is applied separately to group the data points into clusters within each density type. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The experimental results show that our algorithm obtains groups more accurately than OPTICS and the original AP clustering algorithm.
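The two-step scheme can be sketched as follows; the median nearest-neighbour threshold is a simplified stand-in for the record's density-type partitioning rule, and the data are synthetic:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)

# One data set containing two density regimes: a tight blob and a sparse one.
X = np.vstack([rng.normal(0.0, 0.05, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])

# Step 1: split the points into density types by nearest-neighbour distance
# (a simple median threshold stands in for the record's partitioning rule).
nn_dist = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)[0][:, 1]
dense_mask = nn_dist < np.median(nn_dist)

# Step 2: run plain affinity propagation separately inside each density type.
labels = np.empty(len(X), dtype=int)
offset = 0
for mask in (dense_mask, ~dense_mask):
    ap = AffinityPropagation(damping=0.9, max_iter=1000,
                             random_state=0).fit(X[mask])
    labels[mask] = ap.labels_ + offset
    offset += int(ap.labels_.max()) + 1
print(len(set(labels.tolist())))
```

Running AP inside each density type keeps the similarity scale (and hence the preference parameter) locally consistent, which is the point of the extension.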

  13. Hyperplane distance neighbor clustering based on local discriminant analysis for complex chemical processes monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Chunhong; Xiao, Shaoqing; Gu, Xiaofeng [Jiangnan University, Wuxi (China)

    2014-11-15

    The collected training data often include both normal and faulty samples for complex chemical processes. However, some monitoring methods, such as partial least squares (PLS), principal component analysis (PCA), independent component analysis (ICA) and Fisher discriminant analysis (FDA), require fault-free data to build the normal operation model. These techniques are applicable after the preliminary step of data clustering is applied. We here propose a novel hyperplane distance neighbor clustering (HDNC) based on the local discriminant analysis (LDA) for chemical process monitoring. First, faulty samples are separated from normal ones using the HDNC method. Then, the optimal subspace for fault detection and classification can be obtained using the LDA approach. The proposed method takes the multimodality within the faulty data into account, and thus improves the capability of process monitoring significantly. The HDNC-LDA monitoring approach is applied to two simulation processes and then compared with the conventional FDA based on the K-nearest neighbor (KNN-FDA) method. The results obtained in two different scenarios demonstrate the superiority of the HDNC-LDA approach in terms of fault detection and classification accuracy.

  14. Hyperplane distance neighbor clustering based on local discriminant analysis for complex chemical processes monitoring

    International Nuclear Information System (INIS)

    Lu, Chunhong; Xiao, Shaoqing; Gu, Xiaofeng

    2014-01-01

    The collected training data often include both normal and faulty samples for complex chemical processes. However, some monitoring methods, such as partial least squares (PLS), principal component analysis (PCA), independent component analysis (ICA) and Fisher discriminant analysis (FDA), require fault-free data to build the normal operation model. These techniques are applicable after the preliminary step of data clustering is applied. We here propose a novel hyperplane distance neighbor clustering (HDNC) based on the local discriminant analysis (LDA) for chemical process monitoring. First, faulty samples are separated from normal ones using the HDNC method. Then, the optimal subspace for fault detection and classification can be obtained using the LDA approach. The proposed method takes the multimodality within the faulty data into account, and thus improves the capability of process monitoring significantly. The HDNC-LDA monitoring approach is applied to two simulation processes and then compared with the conventional FDA based on the K-nearest neighbor (KNN-FDA) method. The results obtained in two different scenarios demonstrate the superiority of the HDNC-LDA approach in terms of fault detection and classification accuracy.

  15. Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

    Science.gov (United States)

    Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

    2017-09-01

    The fine management of cities is an important way to realize the smart city. Data mining using spatial clustering analysis of urban management cases can be used in the evaluation of urban public facilities deployment, supports policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that the density-based DBSCAN algorithm cannot determine its parameters adaptively, this paper proposes an optimization method for adaptive parameter determination based on spatial analysis. First, Ripley's K function is analysed for the data set to adaptively determine the global parameter Eps, i.e. the maximum aggregation scale is set as the range of data clustering. Then, using a K-D tree, the most frequent neighbour count of each point object within Eps is calculated and set as the clustering density, realizing the adaptive determination of the global parameter MinPts. Finally, the R language was used to optimize the above process to accomplish precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing show that the new DBSCAN clustering algorithm presented in this paper takes full account of the spatial and statistical characteristics of data with an obvious clustering feature, and has better applicability and high quality. The results of the study are helpful not only for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also for other cities and related fields.
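The adaptive Eps/MinPts idea can be sketched on synthetic point data. The k-distance percentile below is a simplified stand-in for the record's Ripley's K step, and the hot-spot coordinates are hypothetical:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)

# Synthetic "case" locations: two hot spots plus scattered background.
X = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(80, 2)),
               rng.normal([3.0, 3.0], 0.1, size=(80, 2)),
               rng.uniform(-2.0, 5.0, size=(40, 2))])

# Simplified stand-in for the record's adaptive rule: take Eps from the
# k-distance distribution, then MinPts from neighbour counts within Eps
# (the record derives the aggregation scale from Ripley's K instead).
k = 4
kdist = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)[0][:, k]
eps = float(np.percentile(kdist, 90))
neigh = NearestNeighbors().fit(X).radius_neighbors(X, radius=eps,
                                                   return_distance=False)
min_pts = int(np.median([len(c) for c in neigh]))

labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
print(eps, min_pts, n_clusters)
```

Both parameters are read off the data itself, so the same code applies unchanged to districts with different case densities.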

  16. STUDY ON ADAPTIVE PARAMETER DETERMINATION OF CLUSTER ANALYSIS IN URBAN MANAGEMENT CASES

    Directory of Open Access Journals (Sweden)

    J. Y. Fu

    2017-09-01

    The fine management of cities is an important way to realize the smart city. Data mining using spatial clustering analysis of urban management cases can be used in the evaluation of urban public facilities deployment, supports policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that the density-based DBSCAN algorithm cannot determine its parameters adaptively, this paper proposes an optimization method for adaptive parameter determination based on spatial analysis. First, Ripley's K function is analysed for the data set to adaptively determine the global parameter Eps, i.e. the maximum aggregation scale is set as the range of data clustering. Then, using a K-D tree, the most frequent neighbour count of each point object within Eps is calculated and set as the clustering density, realizing the adaptive determination of the global parameter MinPts. Finally, the R language was used to optimize the above process to accomplish precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing show that the new DBSCAN clustering algorithm presented in this paper takes full account of the spatial and statistical characteristics of data with an obvious clustering feature, and has better applicability and high quality. The results of the study are helpful not only for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also for other cities and related fields.

  17. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    This study was carried out to assess the physicochemical quality of the river Varuna in Varanasi, India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of the relationships between physicochemical parameters. Hierarchical cluster analysis was also performed to determine the sources of pollution in the river Varuna. The results showed quite high values of DO, nitrate, BOD, COD and total alkalinity, above the BIS permissible limits. The results of the correlation analysis identified the key water parameters as pH, electrical conductivity, total alkalinity and nitrate, which influence the concentrations of the other water parameters. Cluster analysis identified three major clusters of sampling sites out of the total of 10 sites, according to similarity in water quality. This study illustrates the usefulness of correlation and cluster analysis for obtaining better information about river water quality. International Journal of Environment Vol. 5 (1) 2016, pp. 32-44
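The correlation-plus-clustering workflow of this record can be sketched on a hypothetical sites-by-parameters matrix (random stand-in values, not the measured Varuna data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import pearsonr

rng = np.random.default_rng(8)

# Hypothetical matrix: 10 sampling sites x 6 parameters (stand-ins for pH,
# EC, DO, BOD, COD and total alkalinity; not the record's measured values).
sites = rng.normal(size=(10, 6))
sites[:4] += 4.0  # let the first four sites form a more polluted group

# Pearson correlation between two parameters across the sites.
r, p = pearsonr(sites[:, 1], sites[:, 3])

# Ward hierarchical clustering of the standardized site profiles.
z = (sites - sites.mean(axis=0)) / sites.std(axis=0)
groups = fcluster(linkage(z, method="ward"), t=3, criterion="maxclust")
print(round(r, 2), groups)
```

Standardizing before clustering keeps parameters with large numeric ranges (e.g. conductivity) from dominating the site similarity.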

  18. Swarm: robust and fast clustering method for amplicon-based studies

    Science.gov (United States)

    Rognes, Torbjørn; Quince, Christopher; de Vargas, Colomban; Dunthorn, Micah

    2014-01-01

    Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units. PMID:25276506

  19. Swarm: robust and fast clustering method for amplicon-based studies

    Directory of Open Access Journals (Sweden)

    Frédéric Mahé

    2014-09-01

    Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.

  20. Applications of modern statistical methods to analysis of data in physical science

    Science.gov (United States)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, these methods have several shortcomings, including strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance
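The information-based model evaluation mentioned above can be sketched for the cluster-number question: fit candidate models and keep the one with the lowest information score. A Gaussian mixture with BIC is used here as a generic stand-in, not the author's genetic algorithm:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)

# Three well-separated synthetic groups; the "right" model order is 3.
X = np.vstack([rng.normal(mu, 0.3, size=(60, 2)) for mu in (0.0, 3.0, 6.0)])

# Information-based model selection: fit mixture models for several k and
# keep the one with the lowest Bayesian information criterion.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)
```

Unlike a stepwise entry/exit threshold, the criterion scores every candidate model on the same penalized-likelihood scale.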

  1. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    Science.gov (United States)

    2014-01-01

    Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies a Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological

  2. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    In this first paper, I attempt to analyze the factors that affect student achievement, which naturally vary from school to school. Students are one of the key targets of a successful educational organization, and students' emotions and behaviors influence their learning performance. Fuzzy logic can be applied in many fields, as can clustering for grouping, including in the analysis of learning development. The analysis is performed on students based on the observed indicators. This research uses fuzzy logic and clustering: fuzzy logic handles uncertainty and is capable of linguistic reasoning, so its design does not require complicated mathematical equations, while the clustering method used is K-means, which partitions the data into k groups (k = 1, 2, ..., k) in order to find the optimal number of performance groups. Questionnaire data entered into MATLAB produce values used to generate graphs, making it easier for the school to assess student performance in the learning process against defined criteria, and providing a basis for the decision-making the school requires.
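The Sugeno-type inference named in the title can be sketched minimally. The rules, membership functions and 0-10 input scale below are hypothetical illustrations, not the paper's actual rule base (which is built in MATLAB from questionnaire data):

```python
def tri(x, a, b, c):
    # Triangular membership function on [a, c] peaking at b.
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def sugeno_performance(motivation, attendance):
    # Minimal zero-order Sugeno sketch with hypothetical rules: each rule's
    # firing strength weights a constant consequent, and the crisp output
    # is the weighted average. Inputs are on a 0-10 scale.
    rules = [
        (min(tri(motivation, -5, 0, 5), tri(attendance, -5, 0, 5)), 30.0),   # low & low
        (min(tri(motivation, 0, 5, 10), tri(attendance, 0, 5, 10)), 60.0),   # mid & mid
        (min(tri(motivation, 5, 10, 15), tri(attendance, 5, 10, 15)), 90.0), # high & high
    ]
    total = sum(w for w, _ in rules)  # > 0 for any input in [0, 10]
    return sum(w * z for w, z in rules) / total

score = sugeno_performance(8.0, 9.0)
print(score)
```

The constant consequents make this a zero-order Sugeno system; a first-order variant would replace each constant with a linear function of the inputs.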

  3. Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

    Science.gov (United States)

    Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

    2014-09-01

    In China, the local governments of many areas prefer to give priority to the development of heavy industrial clusters in pursuit of high gross domestic product (GDP) growth as a political achievement, which usually results in high costs from ecological degradation and environmental pollution. Therefore, effective methods and a reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. Emergy analysis links economic and ecological systems together and can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystems but seldom to industrial clusters. This study applied emergy analysis to assess the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of the Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters and inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than those of the other industrial clusters.

  4. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease or to provide expression profiles of gene sets for individuals. We developed an application called IGSA that provides powerful analytical capabilities for gene set enrichment and sample clustering. IGSA calculates a gene set expression score for each sample and takes an accumulating clustering strategy to let the samples gather into sets according to the progression of disease from mild to severe. We focus on gastric, pancreatic and ovarian cancer data sets to evaluate the performance of IGSA. We also compared the KEGG pathway enrichment results of IGSA with those of DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering under different similarity measures. Notably, IGSA proves to be more sensitive and specific in finding significant pathways, and can indicate related changes in pathways with the severity of disease. In addition, IGSA provides a significant gene set profile for each sample.

  5. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Innovation through clustering has become very important given the increased significance that interaction has for innovation and the learning process. This study aims to identify what a case analysis of the innovation process in a cluster reveals about the learning process. The study is developed in two stages. In the first, we used a preliminary case study examining a cluster's innovation analysis and its Innovation Index, and then explored a combined body of theory and practice. The second stage explores the learning process concept. Both stages allowed us to build a theoretical model for the development of the learning process in clusters. The main result of the model development is a mechanism for implementing improvements in clusters when case studies are applied.

  6. Phenotypic clustering: a novel method for microglial morphology analysis.

    Science.gov (United States)

    Verdonk, Franck; Roux, Pascal; Flamant, Patricia; Fiette, Laurence; Bozza, Fernando A; Simard, Sébastien; Lemaire, Marc; Plaud, Benoit; Shorte, Spencer L; Sharshar, Tarek; Chrétien, Fabrice; Danckaert, Anne

    2016-06-17

    Microglial cells are tissue-resident macrophages of the central nervous system. They are extremely dynamic, sensitive to their microenvironment, and present a characteristically complex and heterogeneous morphology and distribution within the brain tissue. Many experimental clues highlight a strong link between their morphology and their function in response to aggression. However, the complex "dendritic-like" cells that constitute the major pool of murine microglial cells, together with their dense network, make precise and powerful morphological studies difficult to realize and complicate correlation with molecular or clinical parameters. Using the knock-in mouse model CX3CR1(GFP/+), we developed a 3D automated confocal tissue imaging system coupled with morphological modelling of many thousands of microglial cells, providing precise and quantitative assessment of major cell features: cell density, cell body area, cytoplasm area and number of primary, secondary and tertiary processes. We determined two morphological criteria, the complexity index (CI) and the covered environment area (CEA), allowing an innovative approach consisting of (i) an accurate and objective study of morphological changes in healthy or pathological conditions, (ii) an in situ mapping of the microglial distribution in different neuroanatomical regions and (iii) a study of the clustering of numerous cells, allowing us to discriminate different sub-populations. Our results on more than 20,000 cells per condition confirm at baseline a regional heterogeneity of the microglial distribution and phenotype that persists after induction of neuroinflammation by systemic injection of lipopolysaccharide (LPS). Using clustering analysis, we highlight that, at resting state, microglial cells are distributed in four sub-populations defined by their CI and CEA, with a regional pattern and a specific behaviour after challenge. Our results counteract the classical view of a homogeneous regional resting…

  7. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
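
    The spherical-cluster tendency the authors study follows directly from the k-means objective, which minimizes within-cluster squared Euclidean distance. A minimal pure-Python sketch of Lloyd's algorithm (the deterministic farthest-point seeding is an illustrative assumption, not taken from the paper) shows the assignment/update loop on two well-separated spherical groups:

```python
def kmeans(points, k, iters=50):
    """Lloyd's algorithm on 2-D points with deterministic
    farthest-point seeding; returns (centroids, labels)."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    # Seed: the first point, then repeatedly the point farthest from all seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(d2(p, c) for c in centroids)))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: d2(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# Two compact, well-separated spherical groups: the easy case for k-means.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
centroids, labels = kmeans(points, k=2)
```

    Because the objective is isotropic, elongated or unequally sized populations can be split or merged incorrectly, which is exactly the recovery degradation the simulation study measures.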

  8. Crouch gait patterns defined using k-means cluster analysis are related to underlying clinical pathology.

    Science.gov (United States)

    Rozumalski, Adam; Schwartz, Michael H

    2009-08-01

    In this study a gait classification method was developed and applied to subjects with cerebral palsy who walk with excessive knee flexion at initial contact. Sagittal plane gait data, simplified using the gait features method, were used as input to a k-means cluster analysis to determine homogeneous groups. Several clinical domains were explored to determine whether the clusters are related to underlying pathology. These domains included age, joint range-of-motion, strength, selective motor control, and spasticity. Principal component analysis was used to determine one overall score for each of the multi-joint domains (strength, selective motor control, and spasticity). The current study shows that there are five clusters among children with excessive knee flexion at initial contact. These clusters were labeled, in order of increasing gait pathology: (1) mild crouch with mild equinus, (2) moderate crouch, (3) moderate crouch with anterior pelvic tilt, (4) moderate crouch with equinus, and (5) severe crouch. Further analysis showed that age, range-of-motion, strength, selective motor control, and spasticity were significantly different between the clusters (p<0.001). The general tendency was for the clinical domains to worsen as gait pathology increased. This new classification tool can be used to define homogeneous groups of subjects in crouch gait, which can help guide treatment decisions and outcomes assessment.

  9. Non-Hierarchical Clustering as a method to analyse an open-ended ...

    African Journals Online (AJOL)

    Apple

    Keywords: algebraic thinking; cluster analysis; mathematics education; quantitative analysis.

  10. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.

    2017-10-06

    Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom's predictive performance. Aims: This paper investigates whether clustering methods can be used to help find good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation (EM). Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than the original Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to automatically set the number of CC subsets while still maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer a significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom's CC subsets.

  11. Symmetrized partial-wave method for density-functional cluster calculations

    International Nuclear Information System (INIS)

    Averill, F.W.; Painter, G.S.

    1994-01-01

    The computational advantage and accuracy of the Harris method is linked to the simplicity and adequacy of the reference-density model. In an earlier paper, we investigated one way the Harris functional could be extended to systems outside the limits of weakly interacting atoms by making the charge density of the interacting atoms self-consistent within the constraints of overlapping spherical atomic densities. In the present study, a method is presented for augmenting the interacting-atom charge densities with symmetrized partial-wave expansions on each atomic site. The added variational freedom of the partial waves leads to a scheme capable of giving exact results within a given exchange-correlation approximation while maintaining many of the desirable convergence and stability properties of the original Harris method. Incorporation of the symmetry of the cluster in the partial-wave construction further reduces the level of computational effort. This partial-wave cluster method is illustrated by its application to the dimer C₂, the hypothetical atomic cluster Fe₆Al₈, and the benzene molecule.

  12. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes, based on some criterion of similarity, is therefore an important task. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes based on cluster validation indices as a measure of similarity for individual gene groups, and on a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where performance is measured by its ability to detect co-regulated sets relative to a full search. Additionally, we analyzed the quality of the best-ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
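
    Cluster validation indices of the kind used above can be illustrated with the silhouette width, a standard index (not necessarily the specific one the authors use) that scores a candidate group by comparing within-group and nearest-other-group distances. A small pure-Python sketch on made-up 2-D expression profiles:

```python
import math

def silhouette(points, labels):
    """Mean silhouette width: (b - a) / max(a, b) per point, averaged.
    a = mean distance to own cluster, b = mean distance to nearest other."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = [q for q in clusters[lab] if q is not p]
        if not own:                  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for q in qs) / len(qs)
                for m, qs in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical profiles: two tight groups far apart.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette(profiles, [0, 0, 1, 1])   # matches the true grouping
bad = silhouette(profiles, [0, 1, 0, 1])    # deliberately mixes the groups
```

    Ranking candidate groups by such an index lets a hierarchical-clustering sweep propose many groupings and keep only the well-supported ones, which is the strategy described in the abstract.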

  13. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    Science.gov (United States)

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    The usage of different sets of clustering variables resulted in substantially different numbers of clusters. The stability of the resulting clusters increased when categorical variables were prioritized over continuous ones. For every set of cluster variables, there was strong in-cluster variance of avoided versus non-avoided accidents for the specified Intersection AEB. The medoids did not predict the most common Intersection AEB behavior in each cluster. Despite thorough analysis using various clustering methods and variable sets, it was impossible to reduce the diversity of intersection accidents to a set of test scenarios without compromising the ability to predict the real-life performance of Intersection AEB. Although this does not imply that other methods cannot succeed, it was observed that small changes in the definition of a scenario resulted in a different avoidance outcome. Therefore, we suggest using limited physical testing to validate more extensive virtual simulations when evaluating vehicle safety. Copyright © 2018 Elsevier Ltd. All rights reserved.

  14. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    Science.gov (United States)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10¹⁴ M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9). We analyse here the subsample of clusters of the DAFT/FADA survey for which XMM-Newton data and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and, for 25 of them, a global X-ray temperature. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected by both methods are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a core radius of the X-ray gas density profile that decreases with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift, with values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures…

  15. Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Ryan B., E-mail: randerson@astro.cornell.edu [Cornell University Department of Astronomy, 406 Space Sciences Building, Ithaca, NY 14853 (United States); Bell, James F., E-mail: Jim.Bell@asu.edu [Arizona State University School of Earth and Space Exploration, Bldg.: INTDS-A, Room: 115B, Box 871404, Tempe, AZ 85287 (United States); Wiens, Roger C., E-mail: rwiens@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States); Morris, Richard V., E-mail: richard.v.morris@nasa.gov [NASA Johnson Space Center, 2101 NASA Parkway, Houston, TX 77058 (United States); Clegg, Samuel M., E-mail: sclegg@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO₂ at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of the Bayesian information criterion (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ~3 wt.%. The statistical significance of these improvements was ~85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and…
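
    The idea shared by methods (1)-(5) above is to group spectra and then train a local model per group. A toy pure-Python sketch of that pipeline, using hypothetical 2-channel "spectra" and compositions, with fixed centroids standing in for the k-means step and a cluster-mean model standing in for the per-cluster PLS2 model:

```python
import math

def nearest(centroids, x):
    """Index of the centroid closest to spectrum x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))

# Hypothetical 2-channel "spectra" paired with known compositions (wt.%).
train = [((0.1, 0.9), 10.0), ((0.2, 0.8), 12.0),
         ((0.9, 0.1), 55.0), ((0.8, 0.2), 50.0)]

# Step 1: group the training spectra (fixed centroids stand in for the
# k-means step; the paper derives these from the spectra themselves).
centroids = [(0.15, 0.85), (0.85, 0.15)]
groups = {0: [], 1: []}
for spectrum, composition in train:
    groups[nearest(centroids, spectrum)].append(composition)

# Step 2: predict a test spectrum from its own cluster's training samples
# only (a cluster-mean model stands in for the per-cluster PLS2 model).
test_spectrum = (0.82, 0.18)
cluster = nearest(centroids, test_spectrum)
prediction = sum(groups[cluster]) / len(groups[cluster])
```

    The point of the grouping is that the local model is trained only on spectra resembling the unknown, which is why no prior knowledge of the test-set composition is needed.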

  16. A Latent Variable Clustering Method for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Vasilev, Vladislav; Iliev, Georgi; Poulkov, Vladimir

    2016-01-01

    In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximize the performance of a wireless sensor network. Our novel approach to clustering lies in the application of an index-invariant graph that we defined in a previous work and…

  17. Cluster Physics with Merging Galaxy Clusters

    Directory of Open Access Journals (Sweden)

    Sandor M. Molnar

    2016-02-01

    Full Text Available Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard ΛCDM model, where the total density is dominated by the cosmological constant (Λ) and the matter density by cold dark matter (CDM), structure formation is hierarchical, and clusters grow mostly by merging. Mergers of two massive clusters are the most energetic events in the universe after the Big Bang, hence they provide a unique laboratory to study cluster physics. The two main mass components in clusters behave differently during collisions: the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulence are developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures at different wavelengths, thus our review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clusters is to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses. New high spatial and spectral resolution ground- and space-based telescopes will come online in the near future. Motivated by these new opportunities, we briefly discuss methods which will be feasible in the near future in studying merging clusters.

  18. The Views of Turkish Pre-Service Teachers about Effectiveness of Cluster Method as a Teaching Writing Method

    Science.gov (United States)

    Kitis, Emine; Türkel, Ali

    2017-01-01

    The aim of this study is to find out Turkish pre-service teachers' views on the effectiveness of the cluster method as a method of teaching writing. The cluster method can be defined as a connotative creative writing method, in which the person brainstorms on the connotations of a word or a concept in the absence of any kind of…

  19. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated with each other than with those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including the outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.

  20. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    Science.gov (United States)

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique, selected from four partitional clustering packages, namely: Fuzzy; k-means; k-median; and Model-Based clustering. Using cluster validation indices, k-means clustering was shown to produce clusters with the smallest size, furthest separation, and, importantly, the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal diameters and average temporal trends, showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according to their modal diameter, the location of the measurement site, and the time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred from the cluster characteristics and correlation with concurrently measured meteorological, gas phase, and particle phase measurements.

  1. Herd Clustering: A synergistic data clustering approach using collective intelligence

    KAUST Repository

    Wong, Kachun

    2014-10-01

    Traditional data mining methods emphasize analytical abilities to decipher data, assuming that data are static during a mining process. We challenge this assumption, arguing that we can improve the analysis by vitalizing data. In this paper, this principle is used to develop a new clustering algorithm. Inspired by herd behavior, the clustering method is a synergistic approach using collective intelligence called Herd Clustering (HC). The novelty lies in its first stage, where data instances are represented by moving particles. Particles attract each other locally and form clusters by themselves, as shown in the case studies reported. To demonstrate its effectiveness, the performance of HC is compared to other state-of-the-art clustering methods on more than thirty datasets using four performance metrics. An application to DNA motif discovery is also conducted. The results support the effectiveness of HC and thus the underlying philosophy. © 2014 Elsevier B.V.

  2. HDclassif : An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Laurent Berge

    2012-01-01

    Full Text Available This paper presents the R package HDclassif, which is devoted to the clustering and discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction with model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA). In a similar manner, the associated clustering method is called high dimensional data clustering (HDDC) and uses the expectation-maximization algorithm for inference. In order to correctly fit the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than in other model-based methods, which allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R code allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is free software distributed under the General Public License, as part of the R software project.

  3. Kernel method for clustering based on optimal target vector

    International Nuclear Information System (INIS)

    Angelini, Leonardo; Marinazzo, Daniele; Pellicoro, Mario; Stramaglia, Sebastiano

    2006-01-01

    We introduce Ising models, suitable for dichotomic clustering, with couplings that are (i) both ferro- and anti-ferromagnetic and (ii) dependent on the whole data set, not only on pairs of samples. Couplings are determined by exploiting the notion of the optimal target vector, introduced here as a link between kernel-based supervised and unsupervised learning. The effectiveness of the method is shown in the case of the well-known iris data set and in benchmarks of gene expression levels, where it works better than existing methods for dichotomic clustering.

  4. Comparative analysis on the selection of number of clusters in community detection

    Science.gov (United States)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis of various estimates of the number of clusters in community detection. An exhaustive comparison would require testing all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on the stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, the map equation, the Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendencies of the assessment criteria and algorithms to overfit and underfit become apparent. In addition, we propose the alluvial diagram as a suitable tool to visualize statistical inference results, which can be useful in determining the number of clusters.
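
    The overfitting tendency documented above is easy to reproduce in miniature: the within-cluster cost never increases as the number of clusters grows, so any criterion must penalize or threshold the improvement. A pure-Python 1-D sketch (the 50% elbow-style stopping rule is an illustrative assumption, not one of the paper's criteria):

```python
import itertools

def sse(groups):
    """Within-cluster sum of squared deviations for 1-D groups."""
    total = 0.0
    for g in groups:
        mu = sum(g) / len(g)
        total += sum((x - mu) ** 2 for x in g)
    return total

def best_partition(data, k):
    """Exact optimal k-clustering of 1-D data: for squared-error cost the
    optimal clusters are contiguous in sorted order, so brute-force the cuts."""
    data = sorted(data)
    n = len(data)
    best = None
    for cuts in itertools.combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        groups = [data[bounds[i]:bounds[i + 1]] for i in range(k)]
        cost = sse(groups)
        if best is None or cost < best[0]:
            best = (cost, groups)
    return best

data = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1, 9.0, 9.1, 9.2]   # three obvious groups
costs = {k: best_partition(data, k)[0] for k in range(1, 6)}

# The cost only ever drops as k grows, so stop at the first k whose next
# split improves the cost by less than 50% (an ad hoc elbow threshold).
chosen = next(k for k in range(1, 5)
              if (costs[k] - costs[k + 1]) / costs[k] < 0.5)
```

    A criterion that simply minimizes cost would keep splitting forever (overfit); an over-strong penalty would merge real groups (underfit), which is exactly the trade-off the comparative analysis maps out.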

  5. Developing cluster strategy of apples dodol SMEs by integration K-means clustering and analytical hierarchy process method

    Science.gov (United States)

    Mustaniroh, S. A.; Effendi, U.; Silalahi, R. L. R.; Sari, T.; Ala, M.

    2018-03-01

    The purposes of this research were to determine the grouping of apples dodol small and medium enterprises (SMEs) in Batu City and to determine an appropriate development strategy for each cluster. The method used for clustering the SMEs was k-means. The Analytical Hierarchy Process (AHP) approach was then applied to determine the priority development strategy for each cluster. The variables used in grouping included production capacity per month, length of operation, investment value, average sales revenue per month, amount of SME assets, and the number of workers. The factors considered in the AHP included the industry cluster, government, and related and supporting industries. Data were collected through questionnaires and interviews. Respondents were selected among apples dodol SMEs in Batu City using purposive sampling. The results showed that two clusters were formed from the five apples dodol SMEs. The 1st cluster, classified as small enterprises, included SME A, SME C, and SME D. The 2nd cluster, classified as medium enterprises, consisted of SME B and SME E. The AHP results indicated that the priority development strategy for the 1st cluster was improving quality and product standardisation, while for the 2nd cluster it was increasing marketing access.
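
    The AHP step above turns pairwise judgments of factor importance into priority weights. A minimal sketch using the row geometric-mean approximation of the principal eigenvector; the Saaty-scale judgments below are hypothetical, not taken from the study:

```python
import math

def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix via the
    row geometric-mean method, a standard AHP approximation of the
    principal eigenvector."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical Saaty-scale judgments for the three AHP factors named in
# the abstract: industry cluster (C), government (G), supporting industries (S).
# Row vs. column: C is judged 3x as important as G and 5x as important as S.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
w = ahp_weights(pairwise)
```

    The resulting weights sum to one and rank the factors, which is what lets the study order candidate development strategies per cluster.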

  6. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data.

    Science.gov (United States)

    Mo, Qianxing; Shen, Ronglai; Guo, Cui; Vannucci, Marina; Chan, Keith S; Hilsenbeck, Susan G

    2018-01-01

    Identification of clinically relevant tumor subtypes and omics signatures is an important task in cancer translational research for precision medicine. Large-scale genomic profiling studies such as The Cancer Genome Atlas (TCGA) Research Network have generated vast amounts of genomic, transcriptomic, epigenomic, and proteomic data. While these studies have provided great resources for researchers to discover clinically relevant tumor subtypes and driver molecular alterations, there are few computationally efficient methods and tools for integrative clustering analysis of these multi-type omics data. Therefore, the aim of this article is to develop a fully Bayesian latent variable method (called iClusterBayes) that can jointly model omics data of continuous and discrete data types for identification of tumor subtypes and relevant omics features. Specifically, the proposed method uses a few latent variables to capture the inherent structure of multiple omics data sets to achieve joint dimension reduction. As a result, the tumor samples can be clustered in the latent variable space, and relevant omics features that drive the sample clustering are identified through Bayesian variable selection. This method significantly improves on the existing integrative clustering method iClusterPlus in terms of statistical inference and computational speed. By analyzing TCGA and simulated data sets, we demonstrate the excellent performance of the proposed method in revealing clinically meaningful tumor subtypes and driver omics features. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important for survival amid the high level of competition in today's business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many facilities should be provided and where. This study examines a fast-food restaurant brand located in Greater Jakarta. This brand is included in the category of top 5 fast-food restaurant chains based on retail sales. There were three stages in this study: compiling spatial data, cluster analysis, and network analysis. The cluster analysis results are used to consider the location of an additional distribution center. The network analysis results show a more efficient process, referring to a shorter distance in the distribution process.

  8. Identification of Counterfeit Alcoholic Beverages Using Cluster Analysis in Principal-Component Space

    Science.gov (United States)

    Khodasevich, M. A.; Sinitsyn, G. V.; Gres'ko, M. A.; Dolya, V. M.; Rogovaya, M. V.; Kazberuk, A. V.

    2017-07-01

    A study of 153 brands of commercial vodka products showed that counterfeit samples could be identified by introducing a unified additive at the minimum concentration acceptable for instrumental detection and multivariate analysis of UV-Vis transmission spectra. Counterfeit products were detected with 100% probability by using hierarchical cluster analysis or the C-means method in two-dimensional principal-component space.

  9. Damage evolution analysis of coal samples under cyclic loading based on single-link cluster method

    Science.gov (United States)

    Zhang, Zhibo; Wang, Enyuan; Li, Nan; Li, Xuelong; Wang, Xiaoran; Li, Zhonghui

    2018-05-01

    In this paper, the acoustic emission (AE) response of coal samples under cyclic loading is measured. The results show that there is a good positive relation between AE parameters and stress. The AE signal of coal samples under cyclic loading exhibits an obvious Kaiser effect. The single-link cluster (SLC) method is applied to analyze the spatial evolution characteristics of AE events and the damage evolution process of coal samples. It is found that the subset scale of the SLC structure becomes smaller and smaller as the number of loading cycles increases, and there is a negative linear relationship between the subset scale and the degree of damage. The spatial correlation length ξ of the SLC structure is also calculated. The results show that ξ fluctuates around a certain value from the second through the fifth loading cycle, but increases clearly in the sixth loading cycle. Based on the criterion of microcrack density, the coal sample failure process is a transformation from small-scale damage to large-scale damage, which explains the changes in the spatial correlation length. This systematic analysis shows that the SLC method is an effective way to study the damage evolution process of coal samples under cyclic loading, and it provides important reference values for studying coal bursts.
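
    The single-link grouping idea behind the SLC analysis can be sketched in a few lines. This is not the authors' implementation; it is a generic single-linkage agglomeration over 2D event locations with an assumed distance cutoff, using a union-find structure so that any chain of close events ends up in one cluster.

```python
import math

def single_link_clusters(points, cutoff):
    """Group points by single linkage: any two points closer than
    `cutoff` (directly or through a chain of points) share a cluster."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= cutoff:
                union(i, j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Two well-separated groups of toy AE-event locations.
events = [(0, 0), (0.5, 0.2), (0.9, 0.1), (5, 5), (5.4, 5.1)]
print(sorted(map(len, single_link_clusters(events, 1.0))))  # → [2, 3]
```

The subset scale discussed in the abstract corresponds to the sizes of the groups this procedure returns.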

  10. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    Science.gov (United States)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables tests of the ΛCDM scenario, which can be used to put constraints on the self-interaction cross-section of the dark-matter particle. Aims. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure, together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  11. Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

    Science.gov (United States)

    Karaca, Yeliz; Cattani, Carlo

    Magnetic resonance imaging (MRI) is the most sensitive method to detect chronic nervous system diseases such as multiple sclerosis (MS). In this paper, multifractal methods based on 2D Brownian motion Hölder regularity functions (polynomial, periodic (sine), and exponential) were applied to MR brain images, aiming to easily identify distressed regions in MS patients. With these regions, we have proposed an MS classification based on the multifractal method by using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.

  12. Consensus of satellite cluster flight using an energy-matching optimal control method

    Science.gov (United States)

    Luo, Jianjun; Zhou, Liang; Zhang, Bo

    2017-11-01

    This paper presents an optimal control method for consensus of satellite cluster flight under a kind of energy-matching condition. Firstly, the relation between energy matching and periodically bounded relative motion of satellites is analyzed, and the satellite energy-matching principle is applied to configure the initial conditions. Then, period-delayed errors are adopted as state variables to establish the period-delayed error dynamics models of a single satellite and of the cluster. Next, a novel satellite cluster feedback control protocol with coupling gain is designed, so that the periodically bounded relative motion consensus problem of the satellite cluster (the period-delayed error state consensus problem) is transformed into the stability of a set of matrices with the same low dimension. Based on the consensus region theory from research on multi-agent consensus issues, the coupling gain can be chosen to satisfy the consensus region requirement and to decouple the satellite cluster information topology from the feedback control gain matrix, which can be determined by the linear quadratic regulator (LQR) optimal method. This method achieves consensus of the satellite cluster period-delayed errors, leading to consistency of the semi-major axes (SMA) and energy matching across the cluster, so that the satellites exhibit globally coordinated cluster behavior. Finally, the feasibility and effectiveness of the proposed energy-matching optimal consensus for satellite cluster flight are verified through numerical simulations.

  13. Orbit Clustering Based on Transfer Cost

    Science.gov (United States)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing the use of cluster analysis in this context is the lack of a usable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation-maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
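
    As a sketch of the idea, once a transfer-cost metric is available, a medoid-based clustering needs nothing beyond pairwise costs (unlike k-means, which averages coordinates). The cost function and orbit representation below are illustrative stand-ins, not the metrics studied in the paper:

```python
import itertools

def k_medoids(items, cost, k):
    """Brute-force k-medoids: choose the k medoids minimizing the
    total cost of assigning every item to its nearest medoid.
    Only feasible for small item sets; it illustrates that the
    clustering needs nothing more than a pairwise cost function."""
    best = None
    for medoids in itertools.combinations(range(len(items)), k):
        total = sum(min(cost(items[i], items[m]) for m in medoids)
                    for i in range(len(items)))
        if best is None or total < best[0]:
            best = (total, medoids)
    return best[1]

# Toy "orbits" as (semi-major axis, inclination); a hypothetical
# transfer-cost proxy that penalizes plane changes more than size changes.
orbits = [(1.0, 0.0), (1.1, 0.1), (1.05, 0.05), (3.0, 1.2), (3.2, 1.1)]
cost = lambda a, b: abs(a[0] - b[0]) + 5 * abs(a[1] - b[1])
print(k_medoids(orbits, cost, 2))
```

With these toy orbits, one medoid lands in each of the two groups, which is the screening behavior the abstract describes.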

  14. Kinetic methods for measuring the temperature of clusters and nanoparticles in molecular beams

    International Nuclear Information System (INIS)

    Makarov, Grigorii N

    2011-01-01

    The temperature (internal energy) of clusters and nanoparticles is an important physical parameter which affects many of their properties and the character of processes they are involved in. At the same time, determining the temperature of free clusters and nanoparticles in molecular beams is a rather complicated problem because the temperature of small particles depends on their size. In this paper, recently developed kinetic methods for measuring the temperature of clusters and nanoparticles in molecular beams are reviewed. The definition of temperature in the present context is given, and how the temperature affects the properties of and the processes involving the particles is discussed. The temperature behavior of clusters and nanoparticles near a phase transition point is analyzed. Early methods for measuring the temperature of large clusters are briefly described. It is shown that, compared to other methods, new kinetic methods are more universal and applicable for determining the temperature of clusters and nanoparticles of practically any size and composition. The future development and applications of these methods are outlined. (reviews of topical problems)

  15. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system operates over the Internet, the amount of information on the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results from those tools. By automatically clustering the results into different groups based on subjects, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate the snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster the snippets. Extensive experimental results have shown that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
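
    As an illustration of the first step, the longest common substring of two snippets can be computed with a simple dynamic program (the paper uses a generalised suffix tree, which scales to many snippets at once; this quadratic two-string sketch conveys the same notion):

```python
def longest_common_substring(a, b):
    """DP over suffix lengths: dp[i][j] is the length of the longest
    common suffix of a[:i] and b[:j]; the best cell gives the LCS."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

print(longest_common_substring("web search clustering",
                               "search result clustering"))
# → " clustering" (note the leading space)
```

Substrings like this, extracted across all snippets, are the raw features the method feeds into the HowNet similarity step.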

  16. Performance Analysis of Combined Methods of Genetic Algorithm and K-Means Clustering in Determining the Value of Centroid

    Science.gov (United States)

    Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna

    2017-12-01

    The determination of the centroid in the K-Means algorithm directly affects the quality of the clustering results, and determining centroids using random numbers has many weaknesses. The GenClust algorithm, which combines Genetic Algorithms and K-Means, uses a genetic algorithm to determine the centroid of each cluster; it obtains 50% of its chromosomes through deterministic calculation and 50% through random number generation. This study modifies the GenClust algorithm so that 100% of the chromosomes are obtained through deterministic calculation. The result is a performance comparison, expressed as Mean Square Error, of centroid determination in the K-Means method using the GenClust method, the modified GenClust method, and classic K-Means.
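
    A minimal sketch of the underlying issue: K-Means quality depends on initialization, and a deterministic seeding can replace random numbers. The farthest-point seeding below is a generic deterministic scheme, not the GenClust chromosome calculation itself:

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then keep
    adding the point farthest from its nearest chosen centroid."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points,
            key=lambda p: min(math.dist(p, c) for c in cents)))
    return cents

def k_means(points, k, iters=20):
    """Plain Lloyd iteration on top of the deterministic seeding;
    returns the centroids and the mean squared error (MSE)."""
    cents = farthest_point_init(points, k)
    for _ in range(iters):
        groups = [[] for _ in cents]
        for p in points:
            i = min(range(len(cents)), key=lambda c: math.dist(p, cents[c]))
            groups[i].append(p)
        cents = [tuple(sum(x) / len(g) for x in zip(*g))
                 for g in groups if g]
    mse = sum(min(math.dist(p, c) ** 2 for c in cents)
              for p in points) / len(points)
    return cents, mse

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, mse = k_means(pts, 2)
print(sorted(cents), round(mse, 3))
```

Because the seeding is deterministic, repeated runs return the same MSE, which is exactly the repeatability argument for avoiding random initialization.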

  17. A novel clustering and supervising users' profiles method

    Institute of Scientific and Technical Information of China (English)

    Zhu Mingfu; Zhang Hongbin; Song Fangyun

    2005-01-01

    To better understand different users' access intentions, a novel clustering and supervising method based on access paths is presented. This method divides the users' interest space to express the distribution of users' interests, and directly guides the construction of web page indexing for improved performance.

  18. Seismic clusters analysis in Northeastern Italy by the nearest-neighbor approach

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2018-01-01

    The main features of earthquake clusters in Northeastern Italy are explored, with the aim to gain new insights into local-scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters, which are identified by a statistical method based on nearest-neighbor distances of events in the space-time-energy domain. The method permits us to highlight and investigate the internal structure of earthquake sequences, and to differentiate the spatial properties of seismicity according to the different topological features of the cluster structure. To analyze the seismicity of Northeastern Italy, we use information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. A preliminary reappraisal of the earthquake bulletins is carried out and the area of sufficient completeness is outlined. Various techniques are considered to estimate the scaling parameters that characterize earthquake occurrence in the region, namely the b-value and the fractal dimension of the epicenter distribution, required for the application of the nearest-neighbor technique. Specifically, average robust estimates of the parameters of the Unified Scaling Law for Earthquakes (USLE) are assessed for the whole outlined region and are used to compute the nearest-neighbor distances. Cluster identification by the nearest-neighbor method turns out to be quite reliable and robust with respect to the minimum magnitude cutoff of the input catalog; the identified clusters are well consistent with those obtained from manual aftershock identification of selected sequences. We demonstrate that the earthquake clusters have distinct preferred geographic locations, and we identify two areas that differ substantially in the examined clustering properties. Specifically, burst-like sequences are associated with the north-western part and swarm-like sequences with the south-eastern part of the study area.
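
    A common form of the nearest-neighbor proximity used in this family of methods (after Zaliapin and co-workers) is η = t · r^df · 10^(−b·m), combining the inter-event time t, epicentral distance r, the fractal dimension df of epicenters, and the parent magnitude m. The sketch below uses an illustrative b-value and fractal dimension, not the USLE estimates derived in the paper:

```python
import math

# Assumed scaling parameters for illustration only; the paper
# estimates these from regional USLE fits.
B_VALUE = 1.0
FRACTAL_DIM = 1.6

def proximity(parent, child):
    """Space-time-energy proximity eta between an earlier (parent)
    and later (child) event, each given as (time_yr, x_km, y_km, mag);
    returns inf if the candidate parent is not earlier."""
    t = child[0] - parent[0]
    if t <= 0:
        return math.inf
    r = math.hypot(child[1] - parent[1], child[2] - parent[2])
    return t * r ** FRACTAL_DIM * 10 ** (-B_VALUE * parent[3])

def nearest_parent(events, i):
    """Index of the nearest-neighbor parent of event i."""
    eta, j = min((proximity(events[j], events[i]), j)
                 for j in range(len(events)) if j != i)
    return j if math.isfinite(eta) else None

# Toy catalog: a mainshock, a close aftershock, and a distant event.
cat = [(0.00, 0.0, 0.0, 5.0),    # mainshock
       (0.01, 1.0, 0.5, 3.0),    # nearby, soon after -> linked to it
       (0.50, 80.0, 90.0, 4.0)]  # far away -> weakly linked
print(nearest_parent(cat, 1))  # → 0
```

Linking every event to its nearest parent and thresholding the η values is what splits the catalog into clustered and background seismicity.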

  19. Automated analysis of organic particles using cluster SIMS

    Energy Technology Data Exchange (ETDEWEB)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-15

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF{sub 5}{sup +} or C{sub 8}{sup -}) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  20. Agent-based method for distributed clustering of textual information

    Science.gov (United States)

    Potok, Thomas E [Oak Ridge, TN; Reed, Joel W [Knoxville, TN; Elmore, Mark T [Oak Ridge, TN; Treadwell, Jim N [Louisville, TN

    2010-09-28

    A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
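
    The similarity evaluation at the heart of such a system can be sketched with term-frequency document vectors and cosine similarity. The vectors, threshold, and routing rule below are illustrative assumptions, not the patented agents' actual scoring:

```python
import math
from collections import Counter

def doc_vector(text):
    """Bag-of-words term-frequency vector, a stand-in for the
    document vectors the agents exchange."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v))

def route(new_doc, cluster_centroids, threshold=0.3):
    """Return the index of the most similar cluster, or None to
    signal that a new cluster agent should be created."""
    scores = [cosine(doc_vector(new_doc), c) for c in cluster_centroids]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] >= threshold else None

clusters = [doc_vector("earthquake seismic cluster aftershock"),
            doc_vector("genome omics tumor subtype clustering")]
print(route("seismic aftershock sequence analysis", clusters))  # → 0
print(route("medieval poetry anthology", clusters))             # → None
```

The `None` branch mirrors the patent's behavior of spawning a new cluster agent when no existing cluster is similar enough.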

  1. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    Science.gov (United States)

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  2. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on Social Learning Networks is still not widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  3. a Probabilistic Embedding Clustering Method for Urban Structure Detection

    Science.gov (United States)

    Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.

    2017-09-01

    Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting the patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information like human behaviour and human social activity suffer from complexity in high dimension and high noise. Unfortunately, the state-of-the-art clustering methods do not handle problems with high dimension and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features from high-dimensional urban sensing data by "learning" via a probabilistic model. Through latent features, we can capture essential features hidden in high-dimensional data, known as patterns; with the probabilistic model, we can also reduce the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, which correspond to communities with intensive interaction and communities in the same roles in urban structure, respectively. We evaluated the performance of our model by conducting experiments on real-world data; experiments with real data from Shanghai (China) confirmed that our method can discover both kinds of urban structure.

  4. A PROBABILISTIC EMBEDDING CLUSTERING METHOD FOR URBAN STRUCTURE DETECTION

    Directory of Open Access Journals (Sweden)

    X. Lin

    2017-09-01

    Full Text Available Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting the patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information like human behaviour and human social activity suffer from complexity in high dimension and high noise. Unfortunately, the state-of-the-art clustering methods do not handle problems with high dimension and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we propose a Probabilistic Embedding Model (PEM) to find latent features from high-dimensional urban sensing data by “learning” via a probabilistic model. Through latent features, we can capture essential features hidden in high-dimensional data, known as patterns; with the probabilistic model, we can also reduce the uncertainty caused by high noise. Secondly, by tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, which correspond to communities with intensive interaction and communities in the same roles in urban structure, respectively. We evaluated the performance of our model by conducting experiments on real-world data; experiments with real data from Shanghai (China) confirmed that our method can discover both kinds of urban structure.

  5. Communication: Time-dependent optimized coupled-cluster method for multielectron dynamics

    Science.gov (United States)

    Sato, Takeshi; Pathak, Himadri; Orimo, Yuki; Ishikawa, Kenichi L.

    2018-02-01

    A time-dependent coupled-cluster method with time-varying orbital functions, called the time-dependent optimized coupled-cluster (TD-OCC) method, is formulated for multielectron dynamics in an intense laser field. We have successfully derived the equations of motion for the CC amplitudes and the orthonormal orbital functions based on the real action functional, and implemented the method including double excitations (TD-OCCD) and double and triple excitations (TD-OCCDT) within the optimized active orbitals. The present method is size extensive and gauge invariant, a polynomial-cost-scaling alternative to the time-dependent multiconfiguration self-consistent-field method. The first application of the TD-OCC method, to intense-laser-driven correlated electron dynamics in the Ar atom, is reported.

  6. A method to determine the number of nanoparticles in a cluster using conventional optical microscopes

    International Nuclear Information System (INIS)

    Kang, Hyeonggon; Attota, Ravikiran; Tondare, Vipin; Vladár, András E.; Kavuri, Premsagar

    2015-01-01

    We present a method that uses conventional optical microscopes to determine the number of nanoparticles in a cluster, which is typically not possible using traditional image-based optical methods due to the diffraction limit. The method, called through-focus scanning optical microscopy (TSOM), uses a series of optical images taken at varying focus levels to achieve this. The optical images cannot directly resolve the individual nanoparticles, but they contain information related to the number of particles. The TSOM method makes use of this information to determine the number of nanoparticles in a cluster. Initial good agreement between the simulations and the measurements is also presented. The TSOM method can be applied to fluorescent and non-fluorescent as well as metallic and non-metallic nano-scale materials, including soft materials, making it attractive for tag-less, high-speed, optical analysis of nanoparticles down to 45 nm in diameter.

  7. Radionuclide identification using subtractive clustering method

    International Nuclear Information System (INIS)

    Farias, Marcos Santana; Mourelle, Luiza de Macedo

    2011-01-01

    Radionuclide identification is crucial to planning protective measures in emergency situations. This paper presents the application of a method for a classification system of radioactive elements with a fast and efficient response. To achieve this goal, the application of the subtractive clustering algorithm is proposed. The proposed application can be implemented in reconfigurable hardware, a flexible medium for implementing digital hardware circuits. (author)
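
    For reference, subtractive clustering (in the style of Chiu's algorithm) assigns each point a potential equal to the sum of Gaussian contributions from all other points, repeatedly selects the highest-potential point as a cluster center, and subtracts that center's influence from the remaining potentials. A minimal sketch with illustrative radii and toy data, not the paper's hardware implementation or spectra:

```python
import math

def subtractive_clustering(points, ra=1.0, rb=1.5, stop_ratio=0.15):
    """Chiu-style subtractive clustering: every data point is a
    candidate center; centers are picked greedily by potential."""
    alpha, beta = 4 / ra ** 2, 4 / rb ** 2
    # Initial potential of each point from all points (self included).
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points)
           for p in points]
    centers = []
    first_peak = max(pot)
    while True:
        i = max(range(len(points)), key=pot.__getitem__)
        if pot[i] < stop_ratio * first_peak:
            break
        centers.append(points[i])
        peak = pot[i]
        # Subtract the new center's influence (wider radius rb).
        pot = [pot[j] - peak *
               math.exp(-beta * math.dist(points[j], points[i]) ** 2)
               for j in range(len(points))]
    return centers

# Two toy groups of 2D feature points (hypothetical data).
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (4.0, 4.0), (4.1, 4.1)]
print(len(subtractive_clustering(data)))  # → 2
```

Because centers are always existing data points and the update is a fixed arithmetic sweep, the algorithm maps naturally onto reconfigurable hardware, which is the motivation given in the abstract.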

  8. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    Science.gov (United States)

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  9. Profitability and efficiency of Italian utilities: cluster analysis of financial statement ratios

    International Nuclear Information System (INIS)

    Linares, E.

    2008-01-01

    The last ten years have witnessed conspicuous changes in European and Italian regulation of public utility services and in the strategies of the major players in these fields. In response to these changes Italian utilities have made a variety of choices regarding size, presence in more or less capital-intensive stages of different value chains, and diversification. These choices have been implemented both through internal growth and by means of mergers and acquisitions. In this context it is interesting to try to establish whether there is a nexus between these choices and the performance of Italian utilities in terms of profitability and efficiency. Therefore statistical multivariate analysis techniques (cluster analysis and factor analysis) have been applied to several ratios obtained from the 2005 financial statement of 34 utilities. First, a hierarchical cluster analysis method has been applied to financial statement data in order to identify homogeneous groups based on several indicators of the incidence of costs (external costs, personnel costs, depreciation and amortization), profitability (return on sales, return on assets, return on equity) and efficiency (in the utilization of personnel, of total assets, of property, plant and equipment). Five clusters have been found. Then the clusters have been characterized in terms of the aforementioned indicators, the presence in different stages of the energy value chains (electricity and gas) and other descriptive variables (such as turnover, number of employees, assets, percentage of property, plant and equipment on total assets, sales revenues from electricity, gas, water supply and sanitation, waste collection and treatment and other services). In a second round cluster analysis has been preceded by factor analysis, in order to find a smaller set of variables. 
This procedure has revealed three not directly observable factors that can be interpreted as follows: i) efficiency in ordinary and financial management

  10. Network Analysis Tools: from biological networks to clusters and pathways.

    Science.gov (United States)

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  11. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis, and in such investigations cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each with a different cluster analysis approach, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Since the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments were conducted on five microarray datasets with seven validity indices. The finding of the grading approach, that one clustering technique is significantly better, is also established by the Nemenyi post-hoc hypothesis test.
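
    One of the validity indices such a grading approach can rank techniques by is the mean silhouette width, which rewards compact, well-separated clusters. A minimal sketch (the paper uses seven indices and a Nemenyi test, none of which are reproduced here):

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a labeled point set: for each point,
    (b - a) / max(a, b), where a is the mean distance to its own
    cluster and b the mean distance to the nearest other cluster."""
    def mean_dist(p, idxs):
        return sum(math.dist(p, points[j]) for j in idxs) / len(idxs)

    by_label = {}
    for i, l in enumerate(labels):
        by_label.setdefault(l, []).append(i)

    scores = []
    for i, l in enumerate(labels):
        own = [j for j in by_label[l] if j != i]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = mean_dist(points[i], own)
        b = min(mean_dist(points[i], idxs)
                for k, idxs in by_label.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(pts, [0, 0, 1, 1])  # compact, well-separated
bad = silhouette(pts, [0, 1, 0, 1])   # labels scrambled
print(good > 0.9 > 0.0 > bad)  # → True
```

Averaging several such indices across datasets, and then ranking the techniques, is the essence of the two-stage grading described above.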

  12. Local wavelet correlation: application to timing analysis of multi-satellite CLUSTER data

    Directory of Open Access Journals (Sweden)

    J. Soucek

    2004-12-01

    Full Text Available Multi-spacecraft space observations, such as those of CLUSTER, can be used to infer information about local plasma structures by exploiting the timing differences between subsequent encounters of these structures by the individual satellites. We introduce a novel wavelet-based technique, the Local Wavelet Correlation (LWC), which allows one to match the corresponding signatures of large-scale structures in the data from multiple spacecraft and to determine the relative time shifts between the crossings. The LWC is especially suitable for the analysis of strongly non-stationary time series, where it enables one to estimate the time lags in a more robust and systematic way than ordinary cross-correlation techniques. The technique, together with its properties and some examples of its application to timing analysis of bow shock and magnetopause crossings observed by CLUSTER, is presented. We also compare the performance and reliability of the technique with classical discontinuity analysis methods. Key words. Radio science (signal processing) – Space plasma physics (discontinuities; instruments and techniques)
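
    The classical baseline that the LWC improves upon can be sketched as a discrete cross-correlation lag estimate; the toy signals below stand in for a boundary-crossing signature seen by two spacecraft:

```python
def best_lag(x, y, max_lag):
    """Return the lag (in samples) by which y trails x, chosen to
    maximize the discrete cross-correlation of the two signals."""
    def corr_at(lag):
        pairs = [(x[i], y[i + lag]) for i in range(len(x))
                 if 0 <= i + lag < len(y)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr_at)

# A toy crossing signature, seen 3 samples later by the second craft.
sig = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
print(best_lag(sig, delayed, 5))  # → 3
```

Such a global correlation gives a single lag for the whole record; the LWC's contribution is to localize this estimate in time and scale so that strongly non-stationary signals can still be aligned.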

  13. Recent advances in coupled-cluster methods

    CERN Document Server

    Bartlett, Rodney J

    1997-01-01

    Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory. In order for CC methods to have achieved this distinction, it has been necessary to formulate new theoretical approaches for the treatment of a variety of essential quantities.

  14. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to objectively define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems, including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate division (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.

  15. Multishell method: Exact treatment of a cluster in an effective medium

    International Nuclear Information System (INIS)

    Gonis, A.; Garland, J.W.

    1977-01-01

    A method is presented for the exact determination of the Green's function of a cluster embedded in a given effective medium. This method, the multishell method, is applicable even to systems with off-diagonal disorder, extended-range hopping, multiple bands, and/or hybridization, and is computationally practicable for any system described by a tight-binding or interpolation-scheme Hamiltonian. It allows one to examine the effects of local environment on the densities of states and site spectral weight functions of disordered systems. For any given analytic effective medium characterized by a non-negative density of states the method yields analytic cluster Green's functions and non-negative site spectral weight functions. Previous methods used for the calculation of the Green's function of a cluster embedded in a given effective medium have not been exact. The results of numerical calculations for model systems show that even the best of these previous methods can lead to substantial errors, at least for small clusters in two- and three-dimensional lattices. These results also show that fluctuations in local environment have large effects on site spectral weight functions, even in cases in which the single-site coherent-potential approximation yields an accurate overall density of states

  16. Novel Clustering Method Based on K-Medoids and Mobility Metric

    Directory of Open Access Journals (Sweden)

    Y. Hamzaoui

    2018-06-01

    Full Text Available The structure and constraints of MANETs negatively influence QoS performance; moreover, the main routing protocols proposed generally operate with flat routing. Hence, this structure gives poor QoS results when the network becomes larger and denser. To solve this problem we use one of the most popular methods, clustering. The present paper falls within the framework of research to improve QoS in MANETs. In this paper we propose a new clustering algorithm based on a new mobility metric and K-Medoids to distribute the nodes into several clusters. Intuitively, our algorithm can give good results in terms of cluster stability, and can also extend the lifetime of the cluster head.
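The assignment/update loop that K-Medoids performs can be sketched in a few lines. This is a minimal, generic illustration with supplied initial medoids and a plain Euclidean distance; the paper's mobility metric would replace `euclid`, and the function names and toy data are ours, not the authors'.

```python
def k_medoids(points, init_medoids, dist, iters=50):
    """Minimal K-medoids sketch: alternate assignment and medoid update."""
    medoids = list(init_medoids)
    for _ in range(iters):
        # assign every point to its nearest medoid
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
        # update: each medoid becomes the member minimising total in-cluster distance
        new_medoids = [min(members, key=lambda c: sum(dist(c, q) for q in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break  # converged
        medoids = new_medoids
    return medoids

def euclid(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
```

Unlike K-means, the cluster representatives are always actual data points, which is what makes the scheme usable with an arbitrary dissimilarity such as a mobility metric.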

  17. Graph analysis of cell clusters forming vascular networks

    Science.gov (United States)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca, endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth was thresholded and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessel density, the number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics, such as the degree distribution, average clustering coefficient, average shortest path length and assortativity, among others. This analysis allows one to monitor how the largest connected cluster of the vascular network evolves in time and provides quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
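The network metrics mentioned (degree, clustering coefficient) can be computed directly from an adjacency structure. A minimal sketch on a plain dict-of-sets graph (in practice a library such as networkx would be used; the toy graphs below are ours):

```python
def degree(adj, v):
    """Number of neighbours of vertex v in a dict-of-sets adjacency map."""
    return len(adj[v])

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all vertices."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
```

A fully connected triangle of capillary junctions has average clustering 1.0, while a simple chain has 0.0, so the metric discriminates tree-like from densely looped vasculature.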

  18. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shock occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time window methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in the space-time-energy domain is considered. Results from cluster identification by the nearest-neighbor method turn out to be quite robust with respect to the time span of the input catalogue, as well as to the minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershock identification. 
The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic
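Nearest-neighbour cluster identification of this kind is commonly based on a space-time-energy proximity of the form η = t · r^d · 10^(−b·m), linking each event to the earlier event that minimises η (the Zaliapin-style metric). The abstract does not give the exact formulation or parameters used in the study, so the constants and toy catalogue below are purely illustrative:

```python
import math

def nn_distance(parent, child, b=1.0, d=1.6):
    """Space-time-energy nearest-neighbour proximity (illustrative form).
    parent/child: (time, x, y, magnitude); smaller eta = stronger link."""
    t = child[0] - parent[0]
    if t <= 0:
        return math.inf          # only earlier events can be parents
    r = math.hypot(child[1] - parent[1], child[2] - parent[2])
    r = max(r, 1e-6)             # avoid zero spatial distance
    return t * (r ** d) * 10 ** (-b * parent[3])

def nearest_parent(events, j):
    """Among all other events, find the one with minimal eta to event j."""
    cands = [(nn_distance(events[i], events[j]), i)
             for i in range(len(events)) if i != j]
    eta, i = min(cands)
    return i, eta
```

Thresholding the resulting η values separates clustered (aftershock-like) pairs from background seismicity, which is the basis of the declustering discussed above.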

  19. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithms for real document clustering.

  20. Electricity Consumption Clustering Using Smart Meter Data

    Directory of Open Access Journals (Sweden)

    Alexander Tureczek

    2018-04-01

    Full Text Available Electricity smart meter consumption data is enabling utilities to analyze consumption information at unprecedented granularity. Much focus has been directed towards consumption clustering for diversifying tariffs, and cluster analyses have been performed using modern clustering methods. However, the clusters developed exhibit a large variation with resulting shadow clusters, making it impossible to truly identify the individual clusters. Using clearly defined dwelling types, this paper will present methods to improve clustering by harvesting inherent structure from the smart meter data. This paper clusters domestic electricity consumption using smart meter data from the Danish city of Esbjerg. Methods from time series analysis and wavelets are applied to enable the K-Means clustering method to account for autocorrelation in data and thereby improve the clustering performance. The results show the importance of data knowledge, and we identify sub-clusters of consumption within the dwelling types and enable K-Means to produce satisfactory clustering by accounting for a temporal component. Furthermore, our study shows that careful preprocessing of the data to account for intrinsic structure enables better clustering performance by the K-Means method.
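Accounting for a temporal component before K-Means can be as simple as augmenting each load profile with an autocorrelation feature; the paper uses wavelets and time-series methods, so the feature choice below is only our simplified illustration:

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation of a consumption series."""
    n = len(x)
    mu = sum(x) / n
    num = sum((x[i] - mu) * (x[i + 1] - mu) for i in range(n - 1))
    den = sum((v - mu) ** 2 for v in x)
    return num / den if den else 0.0

def kmeans(points, centers, iters=20):
    """Tiny K-means on tuples of features, with supplied initial centres."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[i].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups
```

A smoothly ramping profile has positive lag-1 autocorrelation while an alternating one is negative, so meters with different temporal behaviour separate even when their mean consumption is similar.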

  1. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. However, these systems face several inherent issues, such as the data sparsity and cold start problems, caused by the small number of ratings relative to the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and then a sparsest-subgraph-finding algorithm is applied to the graph to find the initial cluster centers. Next, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  2. Classification of attempted suicide by cluster analysis: A study of 888 suicide attempters presenting to the emergency department.

    Science.gov (United States)

    Kim, Hyeyoung; Kim, Bora; Kim, Se Hyun; Park, C Hyung Keun; Kim, Eun Young; Ahn, Yong Min

    2018-08-01

    It is essential to understand the latent structure of the population of suicide attempters for effective suicide prevention. The aim of this study was to identify subgroups among Korean suicide attempters in terms of the details of the suicide attempt. A total of 888 people who attempted suicide and were subsequently treated in the emergency rooms of 17 medical centers between May and November of 2013 were included in the analysis. The variables assessed included demographic characteristics, clinical information, and details of the suicide attempt assessed by the Suicide Intent Scale (SIS) and the Columbia-Suicide Severity Rating Scale (C-SSRS). Cluster analysis was performed using the Ward method. Of the participants, 85.4% (n = 758) fell into a cluster characterized by less planning, low-lethality methods, and ambivalence towards death ("impulsive"). The other cluster (n = 130) involved a more severe and well-planned attempt, used highly lethal methods, and took more precautions to avoid being interrupted ("planned"). The first cluster was dominated by women, while the second cluster was associated more with men, older age, and physical illness. We only included participants who visited the emergency department after their suicide attempt and had no missing values for the SIS or C-SSRS. Cluster analysis extracted two distinct subgroups of Korean suicide attempters showing different patterns of suicidal behaviors. Understanding that a significant portion of suicide attempts occur impulsively calls for new prevention strategies tailored to differing subgroup profiles. Copyright © 2018 Elsevier B.V. All rights reserved.
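The Ward method used here merges, at each step, the pair of clusters whose fusion least increases the within-cluster sum of squares. A greedy pure-Python sketch of that criterion (O(n³), for illustration only; real analyses use optimised library implementations):

```python
def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares from merging clusters a
    and b (each a list of numeric feature tuples): n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]
    cb = [sum(col) / nb for col in zip(*b)]
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2

def ward_cluster(points, k):
    """Agglomerate singletons until k clusters remain, always merging the
    cheapest pair under Ward's criterion."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Stopping at k = 2 mirrors the two-cluster ("impulsive"/"planned") solution reported in the study.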

  3. A dynamic lattice searching method with rotation operation for optimization of large clusters

    International Nuclear Information System (INIS)

    Wu Xia; Cai Wensheng; Shao Xueguang

    2009-01-01

    Global optimization of large clusters has been a difficult task, though much effort has been paid and many efficient methods have been proposed. During our works, a rotation operation (RO) is designed to realize the structural transformation from decahedra to icosahedra for the optimization of large clusters, by rotating the atoms below the center atom with a definite degree around the fivefold axis. Based on the RO, a development of the previous dynamic lattice searching with constructed core (DLSc), named as DLSc-RO, is presented. With an investigation of the method for the optimization of Lennard-Jones (LJ) clusters, i.e., LJ 500 , LJ 561 , LJ 600 , LJ 665-667 , LJ 670 , LJ 685 , and LJ 923 , Morse clusters, silver clusters by Gupta potential, and aluminum clusters by NP-B potential, it was found that both the global minima with icosahedral and decahedral motifs can be obtained, and the method is proved to be efficient and universal.

  4. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
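The Monte Carlo idea, comparing the observed clustering strength against null datasets drawn from a single-cluster model, can be sketched in one dimension. This is a simplified analogue of our own devising, not the authors' sequential testing procedure:

```python
import random
import statistics

def split_index(x):
    """Best two-group split of 1-D data: minimised within-group sum of
    squares divided by total sum of squares (lower = stronger clustering)."""
    x = sorted(x)
    tot = sum((v - statistics.fmean(x)) ** 2 for v in x)
    best = 1.0
    for cut in range(1, len(x)):
        a, b = x[:cut], x[cut:]
        w = (sum((v - statistics.fmean(a)) ** 2 for v in a)
             + sum((v - statistics.fmean(b)) ** 2 for v in b))
        best = min(best, w / tot)
    return best

def mc_pvalue(x, sims=200, seed=1):
    """Monte Carlo p-value: how often does a single Gaussian (null) sample
    of the same size look at least as clustered as the data?"""
    rng = random.Random(seed)
    obs = split_index(x)
    mu, sd = statistics.fmean(x), statistics.stdev(x)
    hits = sum(split_index([rng.gauss(mu, sd) for _ in x]) <= obs
               for _ in range(sims))
    return (hits + 1) / (sims + 1)
```

Well-separated data yield a tiny split index that null samples almost never reach, hence a small p-value; diffuse data do not.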

  5. Clustering method to process signals from a CdZnTe detector

    International Nuclear Information System (INIS)

    Zhang, Lan; Takahashi, Hiroyuki; Fukuda, Daiji; Nakazawa, Masaharu

    2001-01-01

    The poor mobility of holes in a compound semiconductor detector results in the imperfect collection of the primary charge deposited in the detector. Furthermore, the fluctuation of the charge loss efficiency due to the change in the hole collection path length seriously degrades the energy resolution of the detector. Since the charge collection efficiency varies with the signal waveform, we can expect an improvement of the energy resolution through a proper waveform signal processing method. We developed a new digital signal processing technique, a clustering method which derives typical patterns containing the information on the real situation inside a detector from measured signals. The obtained typical patterns for the detector are then used for the pattern matching method. Measured signals are classified through analyzing the practical waveform variation due to charge trapping, the electric field, crystal defects, etc. Signals with similar shape are placed into the same cluster. For each cluster we calculate an average waveform as a reference pattern. Using these reference patterns obtained from all the clusters, we can classify other measured signal waveforms from the same detector. Then signals are independently processed according to the classified category and form corresponding spectra. Finally these spectra are merged into one spectrum by multiplying normalization coefficients. The effectiveness of this method was verified with a 2-mm-thick CdZnTe detector and a 137Cs gamma-ray source. The obtained energy resolution was improved to about 8 keV (FWHM). Because the clustering method is only related to the measured waveforms, it can be applied to any type and size of detector and is compatible with any type of filtering method. (author)

  6. AN EFFICIENT INITIALIZATION METHOD FOR K-MEANS CLUSTERING OF HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    A. Alizade Naeini

    2014-10-01

    Full Text Available K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. This problem is exacerbated for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman–Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., the Bradley & Fayyad (BF) and random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods’ performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.
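Seeding K-means with extracted endmember signatures amounts to passing explicit initial centres instead of random ones. A minimal sketch with toy "pixels" and plain squared-Euclidean distance (the data and names are illustrative, not from the paper):

```python
def kmeans_seeded(pixels, init_centers, iters=10):
    """K-means where the initial centres are supplied (e.g. endmember
    spectra) rather than chosen at random."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # assignment step: nearest centre by squared Euclidean distance
        for n, p in enumerate(pixels):
            labels[n] = min(range(len(centers)),
                            key=lambda i: sum((a - b) ** 2
                                              for a, b in zip(p, centers[i])))
        # update step: each centre moves to the mean of its members
        for i in range(len(centers)):
            members = [p for n, p in enumerate(pixels) if labels[n] == i]
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

With informative seeds the assignment is already near-correct at iteration one, which is precisely why endmember-based initialization helps on high-dimensional hyperspectral data.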

  7. A semi-supervised method to detect seismic random noise with fuzzy GK clustering

    International Nuclear Information System (INIS)

    Hashemi, Hosein; Javaherian, Abdolrahim; Babuska, Robert

    2008-01-01

    We present a new method to detect random noise in seismic data using fuzzy Gustafson–Kessel (GK) clustering. First, using an adaptive distance norm, a matrix is constructed from the observed seismic amplitudes. The next step is to find the centres of ellipsoidal clusters and construct a partition matrix which determines the soft decision boundaries between seismic events and random noise. The GK algorithm updates the cluster centres in order to iteratively minimize the cluster variance. Multiplication of the fuzzy membership function with the values of each sample yields new sections; we name them 'clustered sections'. The seismic amplitude values of the clustered sections are given in a way that decreases the level of noise in the original noisy seismic input. In pre-stack data, it is essential to study the clustered sections in the f–k domain; finding the quantitative index for weighting the post-stack data needs a similar approach. Using the knowledge of a human specialist together with the fuzzy unsupervised clustering, the method is a semi-supervised random noise detection method. The efficiency of this method is investigated on synthetic and real seismic data for both pre- and post-stack data. The results show a significant improvement of the input noisy sections without harming the important amplitude and phase information of the original data. The procedure for finding the final weights of each clustered section should be carefully done in order to keep almost all the evident seismic amplitudes in the output section. The method interactively uses the knowledge of the seismic specialist in detecting the noise

  8. Conceptual clustering and its relation to numerical taxonomy

    International Nuclear Information System (INIS)

    Fisher, D.; Langley, P.

    1986-01-01

    Artificial Intelligence (AI) methods for machine learning can be viewed as forms of exploratory data analysis, even though they differ markedly from the statistical methods generally connoted by the term. The distinction between methods of machine learning and statistical data analysis is primarily due to differences in the way techniques of each type represent data and structure within data. That is, methods of machine learning are strongly biased toward symbolic (as opposed to numeric) data representations. The authors explore this difference within a limited context, devoting the bulk of their chapter to the explication of conceptual clustering, an extension of the statistically based methods of numerical taxonomy. In conceptual clustering, the formation of object clusters is dependent on the quality of 'higher level' characterizations, termed concepts, of the clusters. The form of concepts used by existing conceptual clustering systems (sets of necessary and sufficient conditions) is described in some detail. This is followed by descriptions of several conceptual clustering techniques, along with sample output. They conclude with a discussion of how alternative concept representations might enhance the effectiveness of future conceptual clustering systems

  9. Method for Determining Appropriate Clustering Criteria of Location-Sensing Data

    Directory of Open Access Journals (Sweden)

    Youngmin Lee

    2016-08-01

    Full Text Available Large quantities of location-sensing data are generated from location-based social network services. These data are provided as point properties with location coordinates acquired from a global positioning system or Wi-Fi signal. To show the point data on multi-scale map services, the data should be represented by clusters following a grid-based clustering method, in which an appropriate grid size must be determined. Currently, there are no criteria for determining the proper grid size, an issue formalized as the modifiable areal unit problem. The method proposed in this paper applies a hexagonal grid to geotagged Twitter point data, considering the grid size in terms of both quantity and quality to minimize the limitations associated with the modifiable areal unit problem. Quantitatively, we reduced the original Twitter point data by an appropriate amount using Töpfer’s radical law. Qualitatively, we maintained the original distribution characteristics using Moran’s I. Finally, we determined the appropriate sizes of clusters at zoom levels 9–13 by analyzing the distribution of data on the graphs. Based on the visualized clustering results, we confirm that the original distribution pattern is effectively maintained using the proposed method.
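The two criteria mentioned can be written compactly: Töpfer's radical law fixes how many objects should survive generalisation to a smaller scale, and Moran's I measures whether the retained points keep the original spatial autocorrelation. A sketch (the weight matrix and toy values are illustrative, not the paper's data):

```python
import math

def topfer_count(n_source, scale_source, scale_target):
    """Töpfer's radical law: objects to retain at the target scale.
    Scales are denominators, e.g. 25_000 for a 1:25,000 map."""
    return round(n_source * math.sqrt(scale_source / scale_target))

def morans_i(values, weights):
    """Moran's I for values with a symmetric spatial weight matrix
    (list of lists); positive I = similar values cluster in space."""
    n = len(values)
    mu = sum(values) / n
    num = sum(weights[i][j] * (values[i] - mu) * (values[j] - mu)
              for i in range(n) for j in range(n))
    den = sum((v - mu) ** 2 for v in values)
    W = sum(sum(row) for row in weights)
    return n / W * num / den
```

Halving the scale (1:25,000 to 1:100,000) keeps only half the points, and a generalised dataset whose Moran's I stays close to the original's is judged to have preserved the distribution pattern.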

  10. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processors involved is a multicore processor with four cores, giving the cluster eight cores in total. The cluster runs in an Ubuntu 14.04 Linux environment with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem, using a simple MPI Hello program written in C. Additionally, a performance test was done to prove that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run using a single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decreases; the time required for the calculation is shortened to half when the number of processors is doubled. To conclude, we successfully developed a small scale cluster computer using common hardware, capable of higher computing power when compared to a single-CPU machine, and this can be beneficial for research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
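The reported halving of runtime when the processor count doubles corresponds to a parallel fraction near 1 in Amdahl's law. A quick sketch of the expected speedup (our illustration, not a calculation from the paper):

```python
def speedup(p_parallel, n_procs):
    """Amdahl's law: overall speedup when a fraction p_parallel of the
    work can be spread over n_procs processors."""
    return 1.0 / ((1 - p_parallel) + p_parallel / n_procs)
```

With p_parallel = 1.0 the speedup on 2 processors is exactly 2, matching the observed halving; even a 5% serial fraction caps the 8-processor speedup near 5.9 rather than 8, which is why such near-linear scaling indicates an almost fully parallel workload.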

  11. A cluster approximation for the transfer-matrix method

    International Nuclear Information System (INIS)

    Surda, A.

    1990-08-01

    A cluster approximation for the transfer-matrix method is formulated. The calculation of the partition function of lattice models is transformed into a nonlinear mapping problem. The method yields the free energy, correlation functions and phase diagrams for a large class of lattice models. The high accuracy of the method is exemplified by the calculation of the critical temperature of the Ising model. (author). 14 refs, 2 figs, 1 tab

  12. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  13. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang; Cheng, James; Xiao, Xiaokui; Fujimaki, Ryohei; Muraoka, Yusuke

    2017-01-01

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many existing algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  14. The analysis for energy distribution and biological effects of the clusters from electrons in the tissue equivalent material

    International Nuclear Information System (INIS)

    Zhang Wenzhong; Guo Yong; Luo Yisheng; Wang Yong

    2004-01-01

    Objective: To study the energy distribution of the clusters produced by electrons in tissue equivalent material, and to discuss the important aspects of these clusters in inducing biological effects. Methods: Based on the physical mechanism of electrons interacting with tissue equivalent material, the Monte Carlo (MC) method was used. The electron tracks were simulated on an event-by-event (ionization, excitation, elastic scattering, Auger electron emission) basis in the material. The relevant conclusions were drawn from the statistical analysis of these events. Results: The electrons deposit part of their energy (30%) in the form of clusters in passing through tissue equivalent material, and most clusters (80%) have an energy of more than 50 eV. The cluster density depends on its diameter and the energy of the electrons, and the deposited energy in a cluster depends on the type and energy of the radiation. Conclusion: The deposited energy in a cluster is the most important factor in inducing all sorts of lesions on DNA molecules in tissue cells

  15. An improved K-means clustering method for cDNA microarray image segmentation.

    Science.gov (United States)

    Wang, T N; Li, T J; Shao, G F; Wu, S X

    2015-07-14

    Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been carried out to complete the image segmentation step. However, most of the previous studies classify the image into two clusters. In this paper, we propose a novel K-means algorithm, which first classifies the image into three clusters; one of the three clusters is then taken as the background region and the other two as the foreground region. The proposed method was evaluated on six different data sets. The analyses of accuracy, efficiency, expression values, special gene spots, and noise images demonstrate the effectiveness of our method in improving the segmentation quality.
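The three-cluster idea can be illustrated on pixel intensities in one dimension: cluster into three groups, take the darkest as background, and merge the two brighter clusters into the foreground. This is a simplified sketch of the approach described; real microarray segmentation operates on full two-dimensional spot images:

```python
def kmeans1d(values, centers, iters=20):
    """Tiny 1-D K-means with supplied initial centres."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[min(range(len(centers)),
                       key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def segment(pixels):
    """Three-cluster split: darkest cluster = background, the two
    brighter clusters together = foreground."""
    lo, hi = min(pixels), max(pixels)
    centers, groups = kmeans1d(pixels, [lo, (lo + hi) / 2, hi])
    order = sorted(range(3), key=lambda i: centers[i])
    background = groups[order[0]]
    foreground = groups[order[1]] + groups[order[2]]
    return background, foreground
```

Allowing an intermediate third cluster gives dim spot edges somewhere to go, instead of forcing them into either background or foreground as a two-cluster split would.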

  16. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    Full Text Available The high-dimension, low-sample-size nature of gene expression data makes the classification task more challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for the classifier not only decreases the computational time and cost, but also improves the classification performance. However, among the different approaches to feature selection, most suffer from several problems such as lack of robustness, validation issues, etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions depending on thresholds and percentages; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with a more stringent higher positive and lower negative threshold condition, the number was reduced to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.
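A cluster-level selection rule of the kind described, scoring each gene cluster by summed expression and keeping it only if threshold conditions are met, might be sketched as follows. The exact criterion, thresholds, and fraction are our assumptions; the abstract does not specify them:

```python
def select_gene_clusters(clusters, expr, pos_thr, neg_thr, frac=0.5):
    """Keep a gene cluster if at least `frac` of its genes have summed
    expression above pos_thr or below neg_thr (hypothetical criterion
    in the spirit of the abstract, not the authors' exact rule)."""
    kept = []
    for genes in clusters:
        sums = [sum(expr[g]) for g in genes]          # summed expression per gene
        strong = sum(1 for s in sums if s >= pos_thr or s <= neg_thr)
        if strong / len(genes) >= frac:
            kept.append(genes)
    return kept
```

Tightening `pos_thr`/`neg_thr` retains fewer clusters, mirroring the 7120 → 1117 → 58 gene reduction reported above.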

  17. Cluster Analysis of Time-Dependent Crystallographic Data: Direct Identification of Time-Independent Structural Intermediates

    Science.gov (United States)

    Kostov, Konstantin S.; Moffat, Keith

    2011-01-01

    The initial output of a time-resolved macromolecular crystallography experiment is a time-dependent series of difference electron density maps that displays the time-dependent changes in underlying structure as a reaction progresses. The goal is to interpret such data in terms of a small number of crystallographically refinable, time-independent structures, each associated with a reaction intermediate; to establish the pathways and rate coefficients by which these intermediates interconvert; and thereby to elucidate a chemical kinetic mechanism. One strategy toward achieving this goal is to use cluster analysis, a statistical method that groups objects based on their similarity. If the difference electron density at a particular voxel in the time-dependent difference electron density (TDED) maps is sensitive to the presence of one and only one intermediate, then its temporal evolution will exactly parallel the concentration profile of that intermediate with time. The rationale is therefore to cluster voxels with respect to the shapes of their TDEDs, so that each group or cluster of voxels corresponds to one structural intermediate. Clusters of voxels whose TDEDs reflect the presence of two or more specific intermediates can also be identified. From such groupings one can then infer the number of intermediates, obtain their time-independent difference density characteristics, and refine the structure of each intermediate. We review the principles of cluster analysis and clustering algorithms in a crystallographic context, and describe the application of the method to simulated and experimental time-resolved crystallographic data for the photocycle of photoactive yellow protein. PMID:21244840
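
    The core idea, grouping voxels by the shape of their temporal difference-density profiles, can be sketched on synthetic data (a hypothetical two-intermediate toy example with a plain 2-means clusterer, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical intermediate concentration profiles over 40 time points.
t = np.linspace(0, 1, 40)
profile_a = np.exp(-5 * t)              # fast-decaying intermediate
profile_b = 5 * t * np.exp(-3 * t)      # rise-then-decay intermediate

# 60 voxels, each tracking one profile with a random scale plus noise.
membership = np.array([0] * 30 + [1] * 30)
profiles = np.where(membership[:, None] == 0, profile_a, profile_b)
scales = rng.uniform(0.5, 2.0, size=60)
X = scales[:, None] * profiles + rng.normal(0, 0.02, size=(60, 40))

# Z-normalise each voxel's time course so clustering sees only its *shape*.
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def two_means(Z, iters=20):
    """Plain 2-means with deterministic farthest-point initialisation."""
    c0 = Z[0]
    c1 = Z[np.argmax(((Z - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        centers = np.stack([Z[assign == j].mean(axis=0) for j in range(2)])
    return assign

assign = two_means(Z)
# All voxels sharing a temporal shape should land in the same cluster.
print(len(set(assign[:30].tolist())), len(set(assign[30:].tolist())))
```

    The cluster-mean profiles then play the role of the time-independent difference-density signatures from which each intermediate's structure would be refined.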

  18. Measuring Group Synchrony: A Cluster-Phase Method for Analyzing Multivariate Movement Time-Series

    Directory of Open Access Journals (Sweden)

    Michael Richardson

    2012-10-01

    Full Text Available A new method for assessing group synchrony is introduced as being potentially useful for objectively determining the degree of group cohesiveness or entitativity. The cluster-phase method of Frank and Richardson (2010) was used to analyze movement data from the rocking chair movements of six-member groups who rocked their chairs while seated in a circle facing the center. In some trials group members had no information about others' movements (their eyes were shut); in others they had their eyes open and gazed at a marker in the center of the group. As predicted, the group-level synchrony measure was able to distinguish between situations where synchrony was possible and situations where it was impossible. Moreover, other aspects of the analysis illustrated how the cluster-phase measures can be used to determine the type of patterning of group synchrony and, when integrated with multi-level modeling, can be used to examine individual-level differences in synchrony and dyadic-level synchrony as well.
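
    A minimal sketch of the underlying idea: measure group synchrony as the time-averaged magnitude of the mean phasor of the members' instantaneous phases (a Kuramoto-style order parameter in the spirit of, but simpler than, the full cluster-phase method, on synthetic phase data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 6, 500              # six rockers, 500 samples
t = np.arange(T) * 0.01

def group_sync(theta):
    """Mean Kuramoto-style order parameter over time.

    theta: (n_members, n_samples) array of instantaneous phases.
    At each sample, the cluster phase is the argument of the mean phasor;
    its magnitude measures momentary group synchrony.
    """
    phasor = np.exp(1j * theta).mean(axis=0)   # mean phasor across members
    return np.abs(phasor).mean()               # average over time

# Synchronised group: common 2 Hz rhythm, small individual phase jitter.
base = 2 * np.pi * 2.0 * t
theta_sync = base + rng.normal(0, 0.2, size=(n, T))

# Unsynchronised group: each member rocks at their own frequency and offset.
freqs = rng.uniform(1.0, 3.0, size=n)
theta_async = 2 * np.pi * freqs[:, None] * t + rng.uniform(0, 2 * np.pi, n)[:, None]

print(round(group_sync(theta_sync), 2))    # close to 1
print(round(group_sync(theta_async), 2))   # much smaller
```

    In practice the phases would be extracted from the recorded movement time series (e.g. via a Hilbert transform) rather than generated, and the full method additionally tracks each member's mean relative phase.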

  19. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    Science.gov (United States)

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.

  20. Expanding Comparative Literature into Comparative Sciences Clusters with Neutrosophy and Quad-stage Method

    Directory of Open Access Journals (Sweden)

    Fu Yuhua

    2016-08-01

    Full Text Available By using Neutrosophy and Quad-stage Method, the expansions of comparative literature include: comparative social sciences clusters, comparative natural sciences clusters, comparative interdisciplinary sciences clusters, and so on. Among them, comparative social sciences clusters include: comparative literature, comparative history, comparative philosophy, and so on; comparative natural sciences clusters include: comparative mathematics, comparative physics, comparative chemistry, comparative medicine, comparative biology, and so on.

  1. Analytical Energy Gradients for Excited-State Coupled-Cluster Methods

    Science.gov (United States)

    Wladyslawski, Mark; Nooijen, Marcel

    The equation-of-motion coupled-cluster (EOM-CC) and similarity transformed equation-of-motion coupled-cluster (STEOM-CC) methods have been firmly established as accurate and routinely applicable extensions of single-reference coupled-cluster theory to describe electronically excited states. An overview of these methods is provided, with emphasis on the many-body similarity transform concept that is the key to a rationalization of their accuracy. The main topic of the paper is the derivation of analytical energy gradients for such non-variational electronic structure approaches, with an ultimate focus on obtaining their detailed algebraic working equations. A general theoretical framework using Lagrange's method of undetermined multipliers is presented, and the method is applied to formulate the EOM-CC and STEOM-CC gradients in abstract operator terms, following the previous work in [P.G. Szalay, Int. J. Quantum Chem. 55 (1995) 151] and [S.R. Gwaltney, R.J. Bartlett, M. Nooijen, J. Chem. Phys. 111 (1999) 58]. Moreover, the systematics of the Lagrange multiplier approach is suitable for automation by computer, enabling the derivation of the detailed derivative equations through a standardized and direct procedure. To this end, we have developed the SMART (Symbolic Manipulation and Regrouping of Tensors) package of automated symbolic algebra routines, written in the Mathematica programming language. The SMART toolkit provides the means to expand, differentiate, and simplify equations by manipulation of the detailed algebraic tensor expressions directly. The Lagrangian multiplier formulation establishes a uniform strategy to perform the automated derivation in a standardized manner: A Lagrange multiplier functional is constructed from the explicit algebraic equations that define the energy in the electronic method; the energy functional is then made fully variational with respect to all of its parameters, and the symbolic differentiations directly yield the explicit
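
    In schematic form (the generic Lagrange-multiplier pattern sketched here for orientation, not the paper's detailed working equations), a non-variational energy E depends on parameters p fixed by subsidiary equations f_k(p) = 0, and one builds

```latex
\mathcal{L}(\mathbf{p},\boldsymbol{\lambda};\chi)
  \;=\; E(\mathbf{p};\chi) \;+\; \sum_k \lambda_k\, f_k(\mathbf{p};\chi),
\qquad
\frac{\partial \mathcal{L}}{\partial \lambda_k} = f_k = 0,
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{p}} = 0 .
```

    Once the functional is stationary in all parameters, the gradient with respect to an external perturbation \(\chi\) (e.g. a nuclear coordinate) reduces to the partial derivative, \(dE/d\chi = \partial\mathcal{L}/\partial\chi\), so no parameter-response equations need to be solved per perturbation; it is this structure that the SMART routines differentiate symbolically.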

  2. Data clustering theory, algorithms, and applications

    CERN Document Server

    Gan, Guojun; Wu, Jianhong

    2007-01-01

    Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.

  3. Subspace K-means clustering.

    Science.gov (United States)

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  4. Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

    Science.gov (United States)

    Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

    2017-12-01

    The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.
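
    The combination of linkage algorithms, dissimilarity measures, and metric MDS used here can be sketched with SciPy on toy data (illustrative station-by-year values, not the Bangladesh records):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)

# Toy stand-in for station-by-year rainfall: 3 groups of 5 "stations",
# each group sharing a distinct mean seasonal level over 54 years.
X = np.vstack([rng.normal(m, 1.0, size=(5, 54)) for m in (0.0, 6.0, 12.0)])

# Try several dissimilarity measures with several linkage algorithms,
# as the study does, and cut each tree into 3 clusters.
for metric in ("euclidean", "cityblock", "minkowski"):
    for method in ("single", "average", "complete", "ward"):
        if method == "ward" and metric != "euclidean":
            continue  # Ward's linkage is defined for Euclidean distance
        Z = linkage(X, method=method, metric=metric)
        labels = fcluster(Z, t=3, criterion="maxclust")
        assert len(np.unique(labels)) == 3

# Classical (metric) multidimensional scaling: double-centre the squared
# distance matrix and take the leading eigenvectors as coordinates.
D2 = np.square(pdist(X))
n = X.shape[0]
Dm = np.zeros((n, n))
Dm[np.triu_indices(n, 1)] = D2
Dm = Dm + Dm.T
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ Dm @ J
w, V = np.linalg.eigh(B)
coords = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0))  # 2-D configuration
print(coords.shape)  # (15, 2)
```

    Comparing the dendrograms across metric/linkage pairs, and inspecting which stations sit far from the rest in the MDS configuration, mirrors the study's workflow.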

  5. Study on distinguishing of Chinese ancient porcelains by neutron activation and fuzzy cluster analysis

    International Nuclear Information System (INIS)

    Wang An

    1992-01-01

    By means of neutron activation analysis, the contents of trace elements in samples of Chinese ancient porcelains from different places of production were determined. The data were analysed by fuzzy cluster analysis. On the basis of this work, a method for distinguishing and attributing Chinese ancient porcelains was suggested.

  6. Clustering analysis of line indices for LAMOST spectra with AstroStat

    Science.gov (United States)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering can help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering on the line indices of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features from low-resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars is analyzed by calculating the intra- and inter-class distances between pairs of stars. For intra-class distance, we use the definition of Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
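
    The Mahalanobis-distance part of such an analysis can be sketched as follows (a toy example with planted outliers and an invented flagging threshold; AstroStat's actual implementation may differ):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "line index" vectors: a tight class of A-type spectra plus 2 outliers.
normal = rng.multivariate_normal([1.0, 0.5, 2.0],
                                 np.diag([0.05, 0.02, 0.1]), size=200)
outliers = np.array([[3.0, 2.0, 6.0], [-2.0, -1.0, -3.0]])
X = np.vstack([normal, outliers])

# Squared Mahalanobis distance of every spectrum from the class centre,
# using the covariance estimated from the data.
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
inv = np.linalg.inv(cov)
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)

# Flag the spectra whose distance is extreme relative to the class
# (here a simple percentile cut stands in for a local outlier factor).
threshold = np.percentile(d2, 99)
flagged = np.where(d2 > threshold)[0]
print(flagged)   # the two planted outliers should appear here
```

    In the survey setting, flagged spectra would then be inspected by eye to separate problematic data from genuinely rare objects.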

  7. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  8. [Priority analysis of small dam construction using the cluster analysis, AHP and weighted average methods (case study: small dams in Semarang District)]

    Directory of Open Access Journals (Sweden)

    Bima Anjasmoro

    2016-06-01

    Full Text Available The feasibility study of potential small dams in Semarang District identified 8 (eight) urgent potential small dams. These dams are to be constructed within 5 (five) years in order to overcome the problem of water shortage in the district. However, the government has a limited funding source, so it is necessary to select the most urgent small dams to be constructed within the limited budget. The purpose of this research is to determine the priority of small dam construction in Semarang District. The methods used to determine the priority are cluster analysis, AHP and the weighted average method. The criteria used to determine the priority consist of: vegetation in the inundated area, volume of embankment, land acquisition area, useful storage, reservoir lifetime, water cost per m³, access road to the dam site, land status at the abutment and inundated area, construction cost, operation and maintenance cost, irrigation service area and raw water benefit. Based on the results of the cluster analysis, AHP and weighted average methods, the priority order of small dam construction is: (1) Mluweh Small Dam (0.165), (2) Pakis Small Dam (0.142), (3) Lebak Small Dam (0.134), (4) Dadapayam Small Dam (0.128), (5) Gogodalem Small Dam (0.119), (6) Kandangan Small Dam (0.114), (7) Ngrawan Small Dam (0.102) and (8) Jatikurung Small Dam (0.096). Comparison of the priority orders produced by the three methods showed that the AHP method is more detailed than the cluster analysis and weighted average methods, because its results are closer to the field conditions of each dam.
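
    The AHP weighting and weighted-average ranking steps can be sketched as follows (an illustrative pairwise-comparison matrix over three of the criteria and invented dam scores, not the paper's data):

```python
import numpy as np

# Hypothetical pairwise-comparison matrix for 3 criteria (Saaty scale):
# construction cost vs. useful storage vs. irrigation service area.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# AHP criterion weights: principal eigenvector of the comparison matrix.
w, V = np.linalg.eig(A)
i = np.argmax(w.real)
weights = np.abs(V[:, i].real)
weights /= weights.sum()

# Consistency ratio (random index RI = 0.58 for n = 3).
n = len(A)
ci = (w.real[i] - n) / (n - 1)
cr = ci / 0.58   # should be below 0.1 for an acceptable matrix

# Weighted-average scoring of dams on normalised criteria
# (illustrative numbers only).
scores = np.array([
    [0.9, 0.7, 0.8],   # dam 1
    [0.6, 0.9, 0.5],   # dam 2
    [0.4, 0.5, 0.9],   # dam 3
])
priority = scores @ weights
ranking = np.argsort(priority)[::-1]
print(np.round(weights, 3), round(cr, 3), ranking)
```

    The full study applies this kind of weighting across its twelve criteria and cross-checks the resulting order against cluster analysis and a plain weighted average.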

  9. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome as they were thought even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.

  10. A comparison of confidence interval methods for the intraclass correlation coefficient in community-based cluster randomization trials with a binary outcome.

    Science.gov (United States)

    Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan

    2016-04-01

    Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. However, confidence intervals constructed using the new approach combined with Smith
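
    A sketch of the one-way ANOVA estimator with a Wald interval based on Smith's large-sample variance approximation (equal cluster sizes assumed; the simulated data and parameter values are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated community trial data: k large clusters of m subjects each,
# binary outcome with cluster-level variation (beta-binomial, true ICC = 0.1).
k, m = 10, 100
p_j = rng.beta(4.5, 4.5, size=k)           # ICC = 1 / (a + b + 1) = 0.1
y = rng.binomial(1, p_j[:, None], size=(k, m)).astype(float)

# One-way ANOVA estimator of the ICC.
grand = y.mean()
cluster_means = y.mean(axis=1)
msb = m * np.sum((cluster_means - grand) ** 2) / (k - 1)   # between clusters
msw = np.sum((y - cluster_means[:, None]) ** 2) / (k * (m - 1))  # within
icc = (msb - msw) / (msb + (m - 1) * msw)

# Wald interval using Smith's large-sample variance approximation
# for equal cluster sizes.
var = 2 * (1 - icc) ** 2 * (1 + (m - 1) * icc) ** 2 / (m * (m - 1) * (k - 1))
half = 1.96 * np.sqrt(var)
print(round(icc, 3), (round(icc - half, 3), round(icc + half, 3)))
```

    The study's point is precisely that, with few large clusters and a binary outcome, intervals of this kind tend to undercover, motivating the sub-cluster splitting approach.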

  11. Robustness of serial clustering of extratropical cyclones to the choice of tracking method

    Directory of Open Access Journals (Sweden)

    Joaquim G. Pinto

    2016-07-01

    Full Text Available Cyclone clusters are a frequent synoptic feature in the Euro-Atlantic area. Recent studies have shown that serial clustering of cyclones generally occurs on both flanks and downstream regions of the North Atlantic storm track, while cyclones tend to occur more regularly on the western side of the North Atlantic basin near Newfoundland. This study explores the sensitivity of serial clustering to the choice of cyclone tracking method using cyclone track data from 15 methods derived from ERA-Interim data (1979–2010. Clustering is estimated by the dispersion (ratio of variance to mean of winter [December–February (DJF)] cyclone passages near each grid point over the Euro-Atlantic area. The mean number of cyclone counts and their variance are compared between methods, revealing considerable differences, particularly for the latter. Results show that all the different tracking methods qualitatively capture similar large-scale spatial patterns of underdispersion and overdispersion over the study region. The quantitative differences can primarily be attributed to the differences in the variance of cyclone counts between the methods. Nevertheless, overdispersion is statistically significant for almost all methods over parts of the eastern North Atlantic and Western Europe, and is therefore considered a robust feature. The influence of the North Atlantic Oscillation (NAO on cyclone clustering displays a similar pattern for all tracking methods, with one maximum near Iceland and another between the Azores and Iberia. The differences in variance between methods are not related to different sensitivities to the NAO, which can account for over 50% of the clustering in some regions. We conclude that the general features of underdispersion and overdispersion of extratropical cyclones over the North Atlantic and Western Europe are robust to the choice of tracking method. The same is true for the influence of the NAO on cyclone dispersion.
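
    The dispersion statistic at the heart of this analysis is simple to compute (toy seasonal counts, not the ERA-Interim tracks):

```python
import numpy as np

rng = np.random.default_rng(6)

def dispersion(counts):
    """Ratio of variance to mean of seasonal cyclone counts at one grid point.

    ~1 for a serially random (Poisson) process, >1 for overdispersion
    (clustering), <1 for underdispersion (more regular than random).
    """
    return counts.var(ddof=1) / counts.mean()

years = 32   # e.g. DJF seasons 1979-2010

# Serially random counts vs. counts whose underlying rate varies from
# season to season (a gamma-Poisson mixture, which induces clustering).
poisson_counts = rng.poisson(5.0, size=years)
clustered_counts = rng.poisson(rng.gamma(2.0, 2.5, size=years))

print(round(dispersion(poisson_counts), 2))
print(round(dispersion(clustered_counts), 2))
```

    In the study this ratio is mapped per grid point and per tracking method; robust overdispersion is what survives across all 15 methods.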

  12. Comparison Analysis and Evaluation of Urban Competitiveness in Chinese Urban Clusters

    Directory of Open Access Journals (Sweden)

    Haixiang Guo

    2015-04-01

    Full Text Available With accelerating urbanization, urban competitiveness has become a worldwide academic focus. Previous studies always focused on economic factors but ignored social elements when measuring urban competitiveness. In this paper, a city was considered as a whole containing different units such as departments, individuals and economic activities, which interact with each other and affect its economic operation. Moreover, a city’s development was compared to an object’s movement, and the components were compared to different forces acting upon the object. With the analysis of the principle of object movement, this study has established a more scientific evaluation index system that involves 4 subsystems, 12 elements and 58 indexes. By using the TOPSIS method, the study has worked out the urban competitiveness of 141 cities from 28 Chinese urban clusters in 2009. According to the calculation results, these cities were divided into four levels: A, B, C, D. Furthermore, in order to analyze the competitiveness of cities and urban clusters, cities and urban clusters have been divided into four groups according to their distributive characteristics: the southeast, the northeast and Bohai Rim, the central region and the west. Suggestions and recommendations for each group are provided based on careful analysis.
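
    The TOPSIS ranking step can be sketched as follows (an illustrative decision matrix with invented index values, not the study's 58-index data):

```python
import numpy as np

# Illustrative decision matrix: 4 cities x 3 indexes (larger = better for
# the first two, smaller = better for the third, e.g. a pollution index).
X = np.array([
    [250.0, 16.0, 12.0],
    [200.0, 16.0,  8.0],
    [300.0, 32.0, 16.0],
    [275.0, 32.0,  8.0],
])
weights = np.array([0.4, 0.3, 0.3])
benefit = np.array([True, True, False])

# 1. Vector-normalise each column, then apply the index weights.
V = weights * X / np.linalg.norm(X, axis=0)

# 2. Ideal and anti-ideal solutions per column.
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
anti = np.where(benefit, V.min(axis=0), V.max(axis=0))

# 3. Closeness coefficient: distance to the anti-ideal over total distance.
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti, axis=1)
closeness = d_neg / (d_pos + d_neg)

ranking = np.argsort(closeness)[::-1]
print(np.round(closeness, 3), ranking)
```

    Sorting the 141 cities by their closeness coefficients and banding them into levels A-D corresponds to the final classification step of the study.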

  13. An approach based on genetic algorithms and DFT for studying clusters: (H2O) n (2 ≤ n ≤ 13) cluster analysis

    International Nuclear Information System (INIS)

    Sabato de Abreu e Silva, Elcio; Anderson Duarte, Helio; Belchior, Jadson Claudio

    2006-01-01

    The present work proposes the application of a genetic algorithm (GA) for determining global minima to be used as seeds for a higher-level ab initio analysis such as density functional theory (DFT). Water clusters ((H2O)n (2 ≤ n ≤ 13)) are used as a test case, and for the initial guesses four empirical potentials (TIP3P, TIP4P, TIP5P and ST2) were considered in the GA calculations. Two types of analysis were performed, namely rigid (DFT-RM) and non-rigid (DFT-NRM) molecules, for the corresponding structures and energies. For the DFT analysis, the PBE exchange-correlation functional and the large basis set A-PVTZ have been used. All structures and their respective energies calculated through the GA method, DFT-RM and DFT-NRM are compared and discussed. The proposed methodology proved very efficient at producing near-accurate global minima at the level of ab initio calculations, and the data are discussed in the light of previously published results, with particular attention to (H2O)n (2 ≤ n ≤ 13) clusters. The results suggest that the stabilization energy error for the empirical potentials used is additive with respect to the cluster size, roughly 0.5 kcal mol⁻¹ per water molecule after ZPE correction. Finally, the approach of using GA/empirical-potential structures as starting points for ab initio optimization methods proved to be a computationally manageable strategy to explore the potential energy surface of large systems at the quantum level. In conclusion, this work proposes an alternative approach to accurately study properties of larger systems in a very efficient manner.


  15. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in the Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach, called the Automotive Component Cluster. The objective is to study the value chain, correlation and Data Envelopment Analysis by determining the technical efficiency, peer weights, and input and output slacks of 100 auto component industries in the three estates. The methodology adopted is Data Envelopment Analysis with the output-oriented Banker-Charnes-Cooper model, taking net worth, fixed assets and employment as inputs and gross output as the output. Non-zero peer weights identify the efficient reference industries. Higher slacks reveal excess net worth, fixed assets or employment and shortages in gross output. To conclude, the variables are highly correlated, and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export markets.
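
    The output-oriented BCC model solves one linear program per industry; a sketch with scipy.optimize.linprog on invented data (five toy units, not the cluster's figures):

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: 5 industries, inputs = (net worth, employment), output = gross output.
inputs = np.array([
    [100.0, 20.0],
    [120.0, 30.0],
    [ 80.0, 15.0],
    [150.0, 40.0],
    [ 90.0, 25.0],
])
outputs = np.array([[50.0], [55.0], [45.0], [60.0], [30.0]])

def bcc_output_efficiency(j0):
    """Output-oriented BCC model: maximise phi such that a convex combination
    of peers uses no more of each input and produces at least phi times the
    observed output of unit j0."""
    n, n_in = inputs.shape
    n_out = outputs.shape[1]
    # Decision variables: [phi, lambda_1..lambda_n]; linprog minimises -phi.
    c = np.r_[-1.0, np.zeros(n)]
    A_ub, b_ub = [], []
    for i in range(n_in):          # sum_j lambda_j * x_ij <= x_i,j0
        A_ub.append(np.r_[0.0, inputs[:, i]])
        b_ub.append(inputs[j0, i])
    for r in range(n_out):         # phi * y_r,j0 - sum_j lambda_j * y_rj <= 0
        A_ub.append(np.r_[outputs[j0, r], -outputs[:, r]])
        b_ub.append(0.0)
    A_eq = [np.r_[0.0, np.ones(n)]]   # convexity: sum lambda = 1 (BCC)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]   # phi >= 1; phi == 1 means output-efficient

phis = [bcc_output_efficiency(j) for j in range(len(inputs))]
print(np.round(phis, 3))
```

    A unit with phi above 1 could, according to its efficient peers, expand gross output by that factor with no extra input; the optimal lambdas are the peer weights discussed in the abstract.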

  16. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious

  17. Multiple-Features-Based Semisupervised Clustering DDoS Detection Method

    Directory of Open Access Journals (Sweden)

    Yonghao Gu

    2017-01-01

    Full Text Available The DDoS attack streams from different agent hosts converging at the victim host become very large, which can lead to system halt or network congestion. Therefore, it is necessary to propose an effective method to detect DDoS attack behavior in a massive data stream. To address the problems that large amounts of labeled data are not available for supervised learning, and that the unsupervised k-means algorithm has relatively low detection accuracy and convergence speed, this paper presents a semisupervised clustering detection method using multiple features. In this detection method, we first select three features according to the characteristics of DDoS attacks to form the detection feature vector. Then, the Multiple-Features-Based Constrained-K-Means (MF-CKM) algorithm is proposed based on semisupervised clustering. Finally, using the MIT Laboratory Scenario (DDoS) 1.0 data set, we verify that the proposed method improves the convergence speed and accuracy of the algorithm when only a small amount of labeled data is used.
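
    The paper's MF-CKM algorithm is not reproduced here, but the general idea of semisupervised (seeded) k-means — letting a small amount of labeled data fix the initial centroids and pin the labeled points to their clusters — can be sketched as follows; the feature vectors and labels are purely illustrative.

```python
import numpy as np

def seeded_kmeans(X, seed_labels, k, iters=100):
    """Semisupervised k-means sketch: points with seed_labels >= 0 are
    labeled; they initialise the centroids and never change cluster.
    (A simplified stand-in for the constrained k-means family, not the
    exact MF-CKM algorithm.)"""
    labels = np.asarray(seed_labels)
    assign = labels.copy()
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new = d.argmin(axis=1)
        new[labels >= 0] = labels[labels >= 0]   # constraint: seeds stay put
        if np.array_equal(new, assign):
            break
        assign = new
        centroids = np.array([X[assign == c].mean(axis=0) for c in range(k)])
    return assign, centroids

# Toy 2-D "feature vectors": two traffic patterns, one labeled point each.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [10., 10.], [10., 11.], [11., 10.]])
seeds = np.array([0, -1, -1, 1, -1, -1])         # -1 marks unlabeled points
assign, centroids = seeded_kmeans(X, seeds, k=2)
```

    The labeled seeds both speed up convergence (good initial centroids) and anchor the meaning of each cluster, which is the advantage the abstract claims over plain k-means.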

  18. Clustering of attitudes towards obesity: a mixed methods study of Australian parents and children.

    Science.gov (United States)

    Olds, Tim; Thomas, Samantha; Lewis, Sophie; Petkov, John

    2013-10-12

    Current population-based anti-obesity campaigns often target individuals based on either weight or socio-demographic characteristics, and give a 'mass' message about personal responsibility. There is a recognition that attempts to influence attitudes and opinions may be more effective if they resonate with the beliefs that different groups have about the causes of, and solutions for, obesity. Limited research has explored how attitudinal factors may inform the development of both upstream and downstream social marketing initiatives. Computer-assisted face-to-face interviews were conducted with 159 parents and 184 of their children (aged 9-18 years old) in two Australian states. A mixed methods approach was used to assess attitudes towards obesity, and elucidate why different groups held various attitudes towards obesity. Participants were quantitatively assessed on eight dimensions relating to the severity and extent, causes and responsibility, possible remedies, and messaging strategies. Cluster analysis was used to determine attitudinal clusters. Participants were also able to qualify each answer. Qualitative responses were analysed both within and across attitudinal clusters using a constant comparative method. Three clusters were identified. Concerned Internalisers (27% of the sample) judged that obesity was a serious health problem, that Australia had among the highest levels of obesity in the world and that prevalence was rapidly increasing. They situated the causes and remedies for the obesity crisis in individual choices. Concerned Externalisers (38% of the sample) held similar views about the severity and extent of the obesity crisis. However, they saw responsibility and remedies as a societal rather than an individual issue. 
The final cluster, the Moderates, which contained significantly more children and males, believed that obesity was not such an important public health issue, and judged the extent of obesity to be less extreme than the other clusters

  19. A New Soft Computing Method for K-Harmonic Means Clustering.

    Science.gov (United States)

    Yeh, Wei-Chang; Jiang, Yunzhi; Chen, Yee-Fen; Chen, Zhe

    2016-01-01

    The K-harmonic means clustering algorithm (KHM) is a new clustering method used to group data such that the sum of the harmonic averages of the distances between each entity and all cluster centroids is minimized. Because it is less sensitive to initialization than K-means (KM), many researchers have recently been attracted to studying KHM. In this study, the proposed iSSO-KHM is based on an improved simplified swarm optimization (iSSO) and integrates a variable neighborhood search (VNS) for KHM clustering. As evidence of the utility of the proposed iSSO-KHM, we present extensive computational results on eight benchmark problems. From the computational results, the comparison appears to support the superiority of the proposed iSSO-KHM over previously developed algorithms for all experiments in the literature.
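
    A minimal sketch of the KHM objective and one centroid update (following Zhang's generic KHM_p formulation with p = 3.5; this is the plain KHM step, not the iSSO-KHM hybrid proposed in the record):

```python
import numpy as np

def khm_objective(X, C, p=3.5):
    """KHM performance function: sum over points of
    k / sum_j ||x_i - c_j||^(-p), a harmonic-average-style score."""
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1), 1e-12)
    return (len(C) / (d ** -p).sum(axis=1)).sum()

def khm_step(X, C, p=3.5):
    """One KHM centroid update: every point contributes to every centroid
    with a soft membership times a point weight; the product is q below."""
    d = np.maximum(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1), 1e-12)
    q = d ** (-p - 2) / (d ** -p).sum(axis=1, keepdims=True) ** 2
    w = q / q.sum(axis=0, keepdims=True)     # normalise contributions per centroid
    return w.T @ X                           # new centroids

# Two well-separated 1-D groups; centroids start poorly placed.
X = np.array([[0.], [1.], [9.], [10.]])
C = np.array([[3.], [6.]])
before = khm_objective(X, C)
for _ in range(30):
    C = khm_step(X, C)
after = khm_objective(X, C)
```

    Because every point pulls on every centroid (with rapidly decaying weight), KHM is much less sensitive to the initial centroid placement than hard-assignment k-means, which is the property the abstract highlights.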

  20. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    Science.gov (United States)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra combined with multivariate analysis were used to predict the flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), interval PLS (iPLS) and full-spectrum PLS models were investigated and the results compared. The results showed that CL-PLS gave the best flavonoid prediction using synchronous fluorescence spectra.

  1. Locating irregularly shaped clusters of infection intensity

    DEFF Research Database (Denmark)

    Yiannakoulias, Niko; Wilson, Shona; Kariuki, H. Curtis

    2010-01-01

    of infection intensity identifies two small areas within the study region in which infection intensity is elevated, possibly due to local features of the physical or social environment. Collectively, our results show that the "greedy growth scan" is a suitable method for exploratory geographical analysis...... for cluster detection. Real data are based on samples of hookworm and S. mansoni from Kitengei, Makueni district, Kenya. Our analysis of simulated data shows how methods able to find irregular shapes are more likely to identify clusters along rivers than methods constrained to fixed geometries. Our analysis...

  2. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi*, Rahul Patel

    2016-01-01

    In a forensic analysis, large numbers of files are examined. Much of the information is in unstructured format, so such analysis is quite a difficult task for the computer forensic examiner. Performing forensic analysis of documents within a limited period of time therefore requires a special approach such as document clustering. This paper reviews different document clustering algorithms and methodologies, for example K-means, K-medoid, single link, complete link and average link, in accordance...

  3. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With a widening edible oil deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya bean, rapeseed/mustard, sesame, groundnut) can be grown in Kenya, but oilseed rape is preferred because it is very high yielding (1.5-4.0 tons/ha) with an oil content of 42-46%. Other uses include fitting into various cropping systems as relay/intercrops, rotational crops, trap crops and fodder. It is soft-seeded, hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight-combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material; hence, it is essential to understand the adaptation of introduced genotypes and the similarities, if any, among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine of Canadian origin and eight of European origin) grown at four locations, namely Endebess, Njoro, Timau and Mau Narok, in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields, and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields; based on this analysis, only one major group exists within the material. In 1992, varieties 2, 3, 8 and 9 did not fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2, 3 and 9) were, however, not classified with the others. In 1994, varieties 10 and 6 did not fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than in 1992 due to favorable weather. It is evident that genotypes from different geographical

  4. Smoothed analysis of the k-means method

    NARCIS (Netherlands)

    Arthur, David; Manthey, Bodo; Röglin, Heiko

    2011-01-01

    The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means

  5. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    Science.gov (United States)

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...

  6. A quasiparticle-based multi-reference coupled-cluster method.

    Science.gov (United States)

    Rolik, Zoltán; Kállay, Mihály

    2014-10-07

    The purpose of this paper is to introduce a quasiparticle-based multi-reference coupled-cluster (MRCC) approach. The quasiparticles are introduced via a unitary transformation which allows us to represent a complete active space reference function and other elements of an orthonormal multi-reference (MR) basis in a determinant-like form. The quasiparticle creation and annihilation operators satisfy the fermion anti-commutation relations. On the basis of these quasiparticles, a generalization of the normal-ordered operator products for the MR case can be introduced as an alternative to the approach of Mukherjee and Kutzelnigg [Recent Prog. Many-Body Theor. 4, 127 (1995); Mukherjee and Kutzelnigg, J. Chem. Phys. 107, 432 (1997)]. Based on the new normal ordering any quasiparticle-based theory can be formulated using the well-known diagram techniques. Beyond the general quasiparticle framework we also present a possible realization of the unitary transformation. The suggested transformation has an exponential form where the parameters, holding exclusively active indices, are defined in a form similar to the wave operator of the unitary coupled-cluster approach. The definition of our quasiparticle-based MRCC approach strictly follows the form of the single-reference coupled-cluster method and retains several of its beneficial properties. Test results for small systems are presented using a pilot implementation of the new approach and compared to those obtained by other MR methods.

  7. An evaluation of centrality measures used in cluster analysis

    Science.gov (United States)

    Engström, Christopher; Silvestrov, Sergei

    2014-12-01

    Clustering of data into groups of similar objects plays an important part in analysing many types of data, especially when the datasets are large, as they often are in, for example, bioinformatics, social networks and computational linguistics. Many clustering algorithms, such as K-means and some types of hierarchical clustering, need a number of centroids representing the 'centers' of the clusters. The choice of centroids for the initial clusters often plays an important role in the quality of the resulting clusters. Since a data point with high centrality supposedly lies close to the 'center' of some cluster, centrality can be used to assign centroids rather than some other method such as picking them at random. Some work has been done to evaluate the use of centrality measures such as degree, betweenness and eigenvector centrality in clustering algorithms. The aim of this article is to compare and evaluate the usefulness of a number of common centrality measures, such as those mentioned above as well as PageRank and related measures.
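
    A small sketch of the seeding idea the record evaluates: rank nodes by a centrality measure and take the top k as initial centroids. Degree centrality and a power-iteration eigenvector centrality are shown; the adjacency matrix is illustrative.

```python
import numpy as np

def centrality_seeds(A, k):
    """Pick k initial centroids as the k nodes of highest degree
    centrality (betweenness, PageRank, etc. could be swapped in)."""
    deg = A.sum(axis=1)
    return np.argsort(-deg)[:k]

def eigenvector_centrality(A, iters=200):
    """Eigenvector centrality via power iteration on A + I (the +I shift
    avoids oscillation on bipartite graphs; the ranking is unchanged)."""
    x = np.ones(len(A))
    for _ in range(iters):
        x = A @ x + x
        x /= np.linalg.norm(x)
    return x

# Toy undirected graph: node 0 is a hub, node 3 bridges to node 4.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
seeds = centrality_seeds(A, 2)
cent = eigenvector_centrality(A)
```

    On this graph, the two highest-degree nodes (the hub 0 and the bridge 3) would seed the two clusters, rather than two randomly chosen nodes that might fall in the same region.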

  8. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India due to the presence of an automotive industry producing over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems in infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai automotive industry cluster before (2001-2002) and after (2008-2009) the CDA. The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT) and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models constructed establish a strong relationship between the dependent variable and a host of independent variables; the models proposed here reveal the approximate relationship in a closer form. The KWT proves that there is no significant difference between the three location clusters with respect to net profit, production cost, marketing costs, procurement costs and gross output, supporting the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT proves that there is no significant difference between industrial units in respect of costs such as production, infrastructure, technology, marketing and net profit. To conclude, the automotive industries have fully utilized the physical infrastructure and centralised facilities by adopting the CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of underdeveloped and developing countries for cost reduction and productivity

  9. A method of clustering observers with different visual characteristics

    Energy Technology Data Exchange (ETDEWEB)

    Niimi, Takanaga [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Imai, Kuniharu [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Ikeda, Mitsuru [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan); Maeda, Hisatoshi [Nagoya University School of Health Sciences, Department of Radiological Technology, 1-1-20 Daiko-minami, Higashi-ku, Nagoya 461-8673 (Japan)

    2006-01-15

    Evaluation of observers' image perception of medical images is important, and yet has rarely been performed because it is difficult to quantify visual characteristics. In the present study, we investigated observers' image perception by clustering a group of 20 observers. Images of a contrast-detail (C-D) phantom, which had cylinders in 10 rows and 10 columns with different diameters and lengths, were acquired with an X-ray screen-film system under fixed exposure conditions. A group of 10 films was prepared for visual evaluation. Sixteen radiological technicians, three radiologists and one medical physicist participated in the observation test. All observers read the phantom radiographs on a transillumination image viewer with the room lights off. Detectability was defined as the shortest cylinder length whose border the observer could distinguish from the background, and was recorded as the number of columns. Detectability was calculated as the average of 10 readings for each observer and plotted against phantom diameter. The unweighted pair-group method using arithmetic averages (UPGMA) was adopted for clustering. The observers fell into two groups: one group selected objects with a demarcation from the vicinity, and the other group searched for the objects with their eyes constrained. This study showed the usefulness of the clustering method for selecting personnel with similar perceptual predispositions when a C-D phantom is used in image quality control.
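
    UPGMA is the "average" linkage method in standard hierarchical-clustering libraries. A minimal sketch with made-up detectability profiles (each row one observer, columns are readings at different phantom diameters; the values are illustrative, not the study's data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in: mean detectability per observer at 4 phantom diameters.
obs = np.array([
    [3.1, 3.0, 2.9, 3.2],   # observers 0 and 1: one perceptual style
    [3.0, 3.1, 3.0, 3.1],
    [5.0, 5.2, 4.9, 5.1],   # observers 2 and 3: another style
    [4.9, 5.0, 5.1, 5.0],
])
Z = linkage(obs, method="average")               # "average" linkage = UPGMA
groups = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 groups
```

    Cutting the UPGMA dendrogram at two groups reproduces the kind of observer split described in the abstract.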

  10. A method of clustering observers with different visual characteristics

    International Nuclear Information System (INIS)

    Niimi, Takanaga; Imai, Kuniharu; Ikeda, Mitsuru; Maeda, Hisatoshi

    2006-01-01

    Evaluation of observers' image perception of medical images is important, and yet has rarely been performed because it is difficult to quantify visual characteristics. In the present study, we investigated observers' image perception by clustering a group of 20 observers. Images of a contrast-detail (C-D) phantom, which had cylinders in 10 rows and 10 columns with different diameters and lengths, were acquired with an X-ray screen-film system under fixed exposure conditions. A group of 10 films was prepared for visual evaluation. Sixteen radiological technicians, three radiologists and one medical physicist participated in the observation test. All observers read the phantom radiographs on a transillumination image viewer with the room lights off. Detectability was defined as the shortest cylinder length whose border the observer could distinguish from the background, and was recorded as the number of columns. Detectability was calculated as the average of 10 readings for each observer and plotted against phantom diameter. The unweighted pair-group method using arithmetic averages (UPGMA) was adopted for clustering. The observers fell into two groups: one group selected objects with a demarcation from the vicinity, and the other group searched for the objects with their eyes constrained. This study showed the usefulness of the clustering method for selecting personnel with similar perceptual predispositions when a C-D phantom is used in image quality control

  11. Locating irregularly shaped clusters of infection intensity

    Directory of Open Access Journals (Sweden)

    Niko Yiannakoulias

    2010-05-01

    Full Text Available Patterns of disease may take on irregular geographic shapes, especially when features of the physical environment influence risk. Identifying these patterns can be important for planning, and also identifying new environmental or social factors associated with high or low risk of illness. Until recently, cluster detection methods were limited in their ability to detect irregular spatial patterns, and limited to finding clusters that were roughly circular in shape. This approach has less power to detect irregularly-shaped, yet important spatial anomalies, particularly at high spatial resolutions. We employ a new method of finding irregularly-shaped spatial clusters at micro-geographical scales using both simulated and real data on Schistosoma mansoni and hookworm infection intensities. This method, which we refer to as the “greedy growth scan”, is a modification of the spatial scan method for cluster detection. Real data are based on samples of hookworm and S. mansoni from Kitengei, Makueni district, Kenya. Our analysis of simulated data shows how methods able to find irregular shapes are more likely to identify clusters along rivers than methods constrained to fixed geometries. Our analysis of infection intensity identifies two small areas within the study region in which infection intensity is elevated, possibly due to local features of the physical or social environment. Collectively, our results show that the “greedy growth scan” is a suitable method for exploratory geographical analysis of infection intensity data when irregular shapes are suspected, especially at micro-geographical scales.
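
    The paper's scan statistic is likelihood-based; as a simplified sketch of the greedy-growth idea only, the following grows an irregular cluster from a seed cell, repeatedly adding the best neighbouring cell while doing so does not dilute the cluster (toy grid and scoring rule, assumed for illustration):

```python
import numpy as np

def greedy_growth(rates, start, max_size=25):
    """Grow a cluster of grid cells from a seed, adding at each step the
    adjacent cell of highest intensity, and stopping when the best
    neighbour would lower the cluster mean. (A simplified stand-in for
    the likelihood-ratio statistic used by the actual greedy growth scan.)"""
    h, w = rates.shape
    cluster = {start}
    while len(cluster) < max_size:
        cur = np.mean([rates[c] for c in cluster])
        frontier = set()
        for r, c in cluster:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= r + dr < h and 0 <= c + dc < w and (r + dr, c + dc) not in cluster:
                    frontier.add((r + dr, c + dc))
        best_val, best = max((rates[c], c) for c in frontier)
        if best_val < cur:         # no neighbour keeps the mean up: stop
            break
        cluster.add(best)
    return cluster

rates = np.ones((5, 5))
river = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]   # irregular high-intensity path
for cell in river:
    rates[cell] = 10.0
found = greedy_growth(rates, start=(0, 0))
```

    Because growth follows whichever neighbour scores best, the cluster can trace an elongated, river-like shape, exactly the kind of irregular pattern that fixed circular scan windows miss.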

  12. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    Science.gov (United States)

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative EEG (q-EEG) and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: insomnia with normal objective sleep duration (I-NSD: n = 53) and insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between the I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Insomnia clusters derived from cluster analysis differ in sleep-onset HRV. Preliminary data suggest evidence for three clusters in insomnia, with differences in sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  13. Assessment of genetic divergence in tomato through agglomerative hierarchical clustering and principal component analysis

    International Nuclear Information System (INIS)

    Iqbal, Q.; Saleem, M.Y.; Hameed, A.; Asghar, M.

    2014-01-01

    For the improvement of qualitative and quantitative traits, the existence of variability is of prime importance in plant breeding. Data on different morphological and reproductive traits of 47 tomato genotypes were analyzed for correlation, agglomerative hierarchical clustering and principal component analysis (PCA) to select genotypes and traits for a future breeding program. Correlation analysis revealed significant positive association between yield and yield components such as fruit diameter, single fruit weight and number of fruits per plant. Principal component (PC) analysis depicted the first three PCs, each with an eigenvalue higher than 1, contributing 81.72% of the total variability for the different traits. PC-I showed positive factor loadings for all the traits except number of fruits per plant; the contributions of single fruit weight and fruit diameter were highest in PC-I. Cluster analysis grouped all genotypes into five divergent clusters. The genotypes in cluster-II and cluster-V exhibited uniform maturity and higher yield. The D2 statistic confirmed the greatest distance between cluster-III and cluster-V, while maximum similarity was observed between cluster-II and cluster-III. It is therefore suggested that crosses between genotypes of cluster-II and cluster-V with those of cluster-I and cluster-III may exhibit heterosis in F1 for hybrid breeding and allow selection of superior genotypes in succeeding generations for a cross-breeding programme. (author)
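
    The pipeline this record describes — PCA on a trait matrix, then agglomerative clustering of the genotypes — can be sketched as follows, with a synthetic trait matrix standing in for the 47 genotypes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Illustrative stand-in: rows are genotypes, columns standardised traits;
# two underlying groups of 10 genotypes each.
traits = np.vstack([rng.normal(0, 1, (10, 6)), rng.normal(4, 1, (10, 6))])

# PCA via SVD of the centred matrix.
Xc = traits - traits.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T                          # first three PC scores
explained = (S[:3] ** 2).sum() / (S ** 2).sum() # variance captured by 3 PCs

# Agglomerative (Ward) clustering on the PC scores.
Z = linkage(scores, method="ward")
clusters = fcluster(Z, t=2, criterion="maxclust")
```

    Clustering on the leading PC scores rather than the raw traits discards noise dimensions, which is why the two steps are commonly paired in divergence studies like this one.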

  14. Point Cluster Analysis Using a 3D Voronoi Diagram with Applications in Point Cloud Segmentation

    Directory of Open Access Journals (Sweden)

    Shen Ying

    2015-08-01

    Full Text Available Three-dimensional (3D) point analysis and visualization is one of the most effective methods of point cluster detection and segmentation in geospatial datasets. However, serious scattering and clotting characteristics interfere with the visual detection of 3D point clusters. To overcome this problem, this study proposes the use of 3D Voronoi diagrams to analyze and visualize 3D points instead of the original data items. The proposed algorithm computes clusters of 3D points by applying a set of 3D Voronoi cells to describe and quantify the points. The decomposition of the point clouds of 3D models is guided by the 3D Voronoi cell parameters. The parameter values are mapped from the Voronoi cells to the 3D points to show spatial patterns and relationships; thus, a 3D point cluster pattern can be highlighted and easily recognized. To capture different cluster patterns, continuous progressive clusters and segmentations are tested. The 3D spatial relationship is shown to facilitate cluster detection. Furthermore, the generated segmentations of real 3D data cases are exploited to demonstrate the feasibility of our approach in detecting different spatial clusters for continuous point cloud segmentation.
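
    A minimal illustration of describing 3D points by their Voronoi cells: compute the diagram and attach a cell parameter (here, cell volume) to each point. Boundary cells are unbounded and are reported as infinite; dense regions produce small finite cells. The point set is random, for illustration only.

```python
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

rng = np.random.default_rng(1)
pts = rng.random((40, 3))          # toy 3D point cloud in the unit cube
vor = Voronoi(pts)

def cell_volume(vor, i):
    """Volume of the 3D Voronoi cell of input point i; inf for the
    unbounded cells of points on the convex hull boundary."""
    region = vor.regions[vor.point_region[i]]
    if -1 in region or len(region) == 0:
        return np.inf
    return ConvexHull(vor.vertices[region]).volume

vols = [cell_volume(vor, i) for i in range(len(pts))]
```

    Mapping a quantity like `vols` back onto the points (small volume = locally dense) is the kind of cell-parameter-to-point mapping the abstract uses to make cluster patterns visible.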

  15. A method for improved clustering and classification of microscopy images using quantitative co-localization coefficients

    LENUS (Irish Health Repository)

    Singan, Vasanth R

    2012-06-08

    Abstract. Background: The localization of proteins to specific subcellular structures in eukaryotic cells provides important information with respect to their function. Fluorescence microscopy approaches to determine localization distribution have proved to be an essential tool in the characterization of unknown proteins, and are now particularly pertinent as a result of the wide availability of fluorescently-tagged constructs and antibodies. However, there are currently very few image analysis options able to effectively discriminate proteins with apparently similar distributions in cells, despite this information being important for protein characterization. Findings: We have developed a novel method for combining two existing image analysis approaches, which results in highly efficient and accurate discrimination of proteins with seemingly similar distributions. We have combined image texture-based analysis with quantitative co-localization coefficients, a method that has traditionally only been used to study the spatial overlap between two populations of molecules. Here we describe and present a novel application for quantitative co-localization, as applied to the study of Rab family small GTP binding proteins localizing to the endomembrane system of cultured cells. Conclusions: We show how quantitative co-localization can be used alongside texture feature analysis, resulting in improved clustering of microscopy images. The use of co-localization as an additional clustering parameter is non-biased and highly applicable to high-throughput image data sets.
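
    Quantitative co-localization coefficients of the kind used as clustering features here include Pearson's correlation and the Manders split coefficients; a minimal sketch on two toy intensity channels:

```python
import numpy as np

def pearson_colocalization(red, green):
    """Pearson correlation of two channel intensity images."""
    return np.corrcoef(red.ravel(), green.ravel())[0, 1]

def manders_coefficients(red, green, thr=0.0):
    """Manders M1/M2: the fraction of each channel's total intensity
    found in pixels where the other channel is above threshold."""
    r = red.ravel().astype(float)
    g = green.ravel().astype(float)
    m1 = r[g > thr].sum() / r.sum()
    m2 = g[r > thr].sum() / g.sum()
    return m1, m2

# Toy 2x2 "images": the two channels are perfectly co-localized.
red = np.array([[0, 5], [10, 0]])
green = np.array([[0, 5], [10, 0]])
p = pearson_colocalization(red, green)
m1, m2 = manders_coefficients(red, green)
```

    Each image then yields a small feature vector (p, m1, m2) that can be appended to texture features before clustering, which is the combination the findings describe.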

  16. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed with the amount of information streams generated every day, and there is a lack of comprehensive tools that can analyze information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to analyzing static document collections because they normally require a large amount of computational resources and a long time to get accurate results. It is very difficult to cluster a dynamically changing text information stream on an individual computer. Our early research resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  17. Android Malware Clustering through Malicious Payload Mining

    OpenAIRE

    Li, Yuping; Jang, Jiyong; Hu, Xin; Ou, Xinming

    2017-01-01

    Clustering has been well studied for desktop malware analysis as an effective triage method. Conventional similarity-based clustering techniques, however, cannot be immediately applied to Android malware analysis due to the excessive use of third-party libraries in Android application development and the widespread use of repackaging in malware development. We design and implement an Android malware clustering system through iterative mining of malicious payload and checking whether malware s...

  18. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Full Text Available Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p < 0.0001 and F = 44.8, p < 0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20), but were similar with regard to measures of lung function (FEV1% and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals

  19. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Directory of Open Access Journals (Sweden)

    Landfors Mattias

    2010-10-01

    Full Text Available Abstract Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that

  20. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    Science.gov (United States)

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is
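    The adjusted Rand index used as the evaluation criterion above compares a computed partition against the known classes while correcting for chance agreement. A minimal sketch with scikit-learn, using well-separated synthetic data as a stand-in for expression profiles (not the study's data or pipeline):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Synthetic data with 3 known classes, standing in for expression profiles
X, y_true = make_blobs(n_samples=60, n_features=20, centers=3, random_state=0)

# Two of the clustering methods compared in the study
ward = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agreement with the known classes, chance-corrected
print("Ward ARI:   ", adjusted_rand_score(y_true, ward))
print("k-means ARI:", adjusted_rand_score(y_true, km))
```

    On real microarray data the index falls well below 1, which is why the choices of normalization and gene selection matter.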

  1. Application of a Light-Front Coupled Cluster Method

    International Nuclear Information System (INIS)

    Chabysheva, S.S.; Hiller, J.R.

    2012-01-01

    As a test of the new light-front coupled-cluster method in a gauge theory, we apply it to the nonperturbative construction of the dressed-electron state in QED, for an arbitrary covariant gauge, and compute the electron's anomalous magnetic moment. The construction illustrates the spectator and Fock-sector independence of vertex and self-energy contributions and indicates resolution of the difficulties with uncanceled divergences that plague methods based on Fock-space truncation. (author)

  2. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    Directory of Open Access Journals (Sweden)

    Ranjana Jaiswara

    Full Text Available Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both methods that lead to the most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species
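    The supervised/unsupervised trade-off discussed above can be sketched with scikit-learn. LinearDiscriminantAnalysis plays the role of DFA, k-means stands in for the cluster analysis (the paper's exact clustering procedure is not reproduced here), and the Iris data is a hypothetical stand-in for per-call acoustic features:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)  # stand-in for call features of 3 "species"

# DFA: supervised, requires a priori species labels
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()

# Cluster analysis: unsupervised; the labels are used only for evaluation
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ari = adjusted_rand_score(y, labels)

print(f"DFA 5-fold accuracy: {acc:.2f}; k-means ARI: {ari:.2f}")
```

    Echoing the study's finding, the supervised method scores higher, at the cost of requiring a priori classification of the songs.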

  3. Ethical and policy issues in cluster randomized trials: rationale and design of a mixed methods research study

    Directory of Open Access Journals (Sweden)

    Chaudhry Shazia H

    2009-07-01

    Full Text Available Abstract Background Cluster randomized trials are an increasingly important methodological tool in health research. In cluster randomized trials, intact social units or groups of individuals, such as medical practices, schools, or entire communities – rather than individuals themselves – are randomly allocated to intervention or control conditions, while outcomes are then observed on individual cluster members. The substantial methodological differences between cluster randomized trials and conventional randomized trials pose serious challenges to the current conceptual framework for research ethics. The ethical implications of randomizing groups rather than individuals are not addressed in current research ethics guidelines, nor have they even been thoroughly explored. The main objectives of this research are to: (1) identify ethical issues arising in cluster trials and learn how they are currently being addressed; (2) understand how ethics reviews of cluster trials are carried out in different countries (Canada, the USA and the UK); (3) elicit the views and experiences of trial participants and cluster representatives; (4) develop well-grounded guidelines for the ethical conduct and review of cluster trials by conducting an extensive ethical analysis and organizing a consensus process; (5) disseminate the guidelines to researchers, research ethics boards (REBs), journal editors, and research funders. Methods We will use a mixed-methods (qualitative and quantitative) approach incorporating both empirical and conceptual work. Empirical work will include a systematic review of a random sample of published trials, a survey and in-depth interviews with trialists, a survey of REBs, and in-depth interviews and focus group discussions with trial participants and gatekeepers. The empirical work will inform the concurrent ethical analysis which will lead to a guidance document laying out principles, policy options, and rationale for proposed guidelines.

  4. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  5. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    Science.gov (United States)

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Grouping of Cities In Terms Of Primary Health Indicators in Turkey: An Application of Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Bilgehan TEKİN

    2015-12-01

    Full Text Available Determining the differences between cities in Turkey is considered important in the context of primary health care indicators. The subject of this study is the classification of cities in Turkey in terms of health indicators. Cluster analysis, one of the data mining and multivariate statistical methods, is used as the classification method. The main objective of the study is to examine the outcomes of the health transformation program in terms of basic health indicators on the basis of cities. In this context, the 81 cities in Turkey are grouped using sixteen health indicators, assumed to demonstrate the effectiveness of health care services, for the year 2013, and the groupings are compared with the health and socio-economic development rankings of previous studies. The provinces are gathered into 21, 13, 11, 7 and 5 clusters; the 11-, 7- and 5-cluster solutions are determined to be the most significant. As a result of the study, a development gap between the eastern and western provinces emerges in terms of the health variables.
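    The multiple groupings above (21, 13, 11, 7 and 5 clusters) can all be read off a single hierarchical-clustering dendrogram by cutting it at different levels. A sketch with SciPy on synthetic standardized indicators (the study's actual sixteen indicators are not reproduced here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(81, 16))      # 81 cities x 16 standardized indicators (synthetic)

Z = linkage(X, method="ward")      # build the dendrogram once...
for k in (21, 13, 11, 7, 5):       # ...then cut it at each desired cluster count
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, "->", len(np.unique(labels)), "clusters")
```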

  7. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    Directory of Open Access Journals (Sweden)

    Iliev Iliycho

    2018-01-01

    Full Text Available The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free-electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences which influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. It is shown, using different cluster methods, that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined into 2 clusters. From the regression tree models built using the Classification and Regression Trees (CART) technique, dependences are obtained to predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.
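    As an illustration of the CART technique named above, here is a minimal regression-tree sketch with scikit-learn; the seven predictors and the efficiency response are synthetic stand-ins, not the laser data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
# Seven synthetic operational characteristics (tube diameter, length,
# supplied electrical power, ...) and a made-up efficiency response
X = rng.uniform(size=(200, 7))
y = 2.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.05, size=200)

# Fit a shallow tree; each leaf is a predicted efficiency value
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
print("training R^2:", round(tree.score(X, y), 3))
```

    A depth-limited tree like this yields interpretable split rules on the operational characteristics, which is the appeal of CART for the kind of dependence extraction the paper describes.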

  8. Multivariate Statistical Methods as a Tool of Financial Analysis of Farm Business

    Czech Academy of Sciences Publication Activity Database

    Novák, J.; Sůvová, H.; Vondráček, Jiří

    2002-01-01

    Roč. 48, č. 1 (2002), s. 9-12 ISSN 0139-570X Institutional research plan: AV0Z1030915 Keywords : financial analysis * financial ratios * multivariate statistical methods * correlation analysis * discriminant analysis * cluster analysis Subject RIV: BB - Applied Statistics, Operational Research

  9. Dancoff factors with partial absorption in cluster geometry by the direct method

    International Nuclear Information System (INIS)

    Rodrigues, Leticia Jenisch; Leite, Sergio de Queiroz Bogado; Vilhena, Marco Tullio de; Bodmann, Bardo Ernest Josef

    2007-01-01

    Accurate analysis of resonance absorption in heterogeneous systems is essential in problems like criticality, breeding ratios and fuel depletion calculations. In compact arrays of fuel rods, resonance absorption is strongly affected by the Dancoff factor, defined in this study as the probability that a neutron emitted from the surface of a fuel element, enters another fuel element without any collision in the moderator or cladding. In the original WIMS code, Black Dancoff factors were computed in cluster geometry by the collision probability method, for each one of the symmetrically distinct fuel pin positions in the cell. Recent improvements to the code include a new routine (PIJM) that was created to incorporate a more efficient scheme for computing the collision matrices. In that routine, each system region is considered individually, minimizing convergence problems and reducing the number of neutron track lines required in the in-plane integrations of the Bickley functions for any given accuracy. In the present work, PIJM is extended to compute Grey Dancoff factors for two-dimensional cylindrical cells in cluster geometry. The effectiveness of the method is assessed by comparing Grey Dancoff factors as calculated by PIJM, with those available in the literature by the Monte Carlo method, for the irregular geometry of the Canadian CANDU37 assembly. Dancoff factors at five symmetrically distinct fuel pin positions are found in very good agreement with the literature results (author)

  10. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew [The Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101 (United States)

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  11. Voting-based consensus clustering for combining multiple clusterings of chemical structures

    Directory of Open Access Journals (Sweden)

    Saeed Faisal

    2012-12-01

    Full Text Available Abstract Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and the Quality Partition Index (QPI) were used to evaluate the clusterings and the results were compared to Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The performance of the voting-based consensus clustering method outperformed Ward's method using the F-measure and QPI for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, which gave the highest values for both criteria. Conclusions The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structure clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among the consensus clustering methods.

  12. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods, which convert DNA data to numerical values, have recently been proposed; these offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most widely used methods for exploring data is cluster analysis, which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
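    A compact sketch of the general recipe (numeric mapping, then spectral features, then K-means). The Voss binary mapping used here is one common GSP encoding, not necessarily the paper's; the toy sequences are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def voss_spectrum(seq, n=64):
    """Voss indicator mapping per base, followed by FFT magnitude features."""
    seq = seq.upper()
    feats = []
    for base in "ACGT":
        x = np.array([1.0 if c == base else 0.0 for c in seq])
        x = np.resize(x, n)  # crude length normalization (tiles/truncates) for this sketch
        feats.append(np.abs(np.fft.rfft(x)))
    return np.concatenate(feats)

# Toy sequences: two AT-rich and two GC-rich (synthetic, for illustration only)
seqs = ["ATATATATATATATAT", "TATATATATATATATA",
        "GCGCGCGCGCGCGCGC", "CGCGCGCGCGCGCGCG"]
X = np.array([voss_spectrum(s) for s in seqs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # the AT-rich pair and the GC-rich pair fall into different clusters
```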

  13. Degradation Assessment and Fault Diagnosis for Roller Bearing Based on AR Model and Fuzzy Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Lingli Jiang

    2011-01-01

    Full Text Available This paper proposes a new approach combining the autoregressive (AR) model and fuzzy cluster analysis for bearing fault diagnosis and degradation assessment. The AR model is an effective approach to extract fault features, and is generally applied to stationary signals. However, the fault vibration signals of a roller bearing are non-stationary and non-Gaussian. To address this problem, the set of parameters of the AR model is estimated based on higher-order cumulants. Consequently, the AR parameters are taken as the feature vectors, and fuzzy cluster analysis is applied to perform classification and pattern recognition. Experimental results show that the proposed method can be used to identify various types and severities of bearing faults. This study is significant for non-stationary and non-Gaussian signal analysis, fault diagnosis and degradation assessment.
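    A rough sketch of the pipeline described above, with two simplifications loudly noted: the AR parameters are estimated by ordinary least squares rather than the paper's higher-order cumulants, and the fuzzy c-means is a minimal hand-rolled version:

```python
import numpy as np

def ar_coeffs(x, p=4):
    """AR(p) parameters by ordinary least squares (the paper instead uses
    higher-order cumulants to cope with non-Gaussian noise)."""
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return a

def fuzzy_cmeans(X, c=2, m=2.0, iters=100):
    """Minimal fuzzy c-means; returns the membership matrix U (n x c)."""
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)]  # simple deterministic init
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        w = d ** (-2.0 / (m - 1.0))        # standard FCM membership update
        U = w / w.sum(axis=1, keepdims=True)
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U

# Two families of synthetic AR(2) "vibration" signals with different dynamics
rng = np.random.default_rng(0)
def simulate(a1, n=500):
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()
    return x

feats = np.array([ar_coeffs(simulate(a1)) for a1 in [0.9] * 10 + [-0.9] * 10])
U = fuzzy_cmeans(feats)
print(U.argmax(axis=1))  # the two signal families separate into different clusters
```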

  14. Fast EEG spike detection via eigenvalue analysis and clustering of spatial amplitude distribution

    Science.gov (United States)

    Fukami, Tadanori; Shimada, Takamasa; Ishikawa, Bunnoshin

    2018-06-01

    Objective. In the current study, we tested a proposed method for fast spike detection in electroencephalography (EEG). Approach. We performed eigenvalue analysis in two-dimensional space spanned by gradients calculated from two neighboring samples to detect high-amplitude negative peaks. We extracted the spike candidates by imposing restrictions on parameters regarding spike shape and eigenvalues reflecting detection characteristics of individual medical doctors. We subsequently performed clustering, classifying detected peaks by considering the amplitude distribution at 19 scalp electrodes. Clusters with a small number of candidates were excluded. We then defined a score for eliminating spike candidates for which the pattern of detected electrodes differed from the overall pattern in a cluster. Spikes were detected by setting the score threshold. Main results. Based on visual inspection by a psychiatrist experienced in EEG, we evaluated the proposed method using two statistical measures of precision and recall with respect to detection performance. We found that precision and recall exhibited a trade-off relationship. The average recall value was 0.708 in eight subjects with the score threshold that maximized the F-measure, with 58.6  ±  36.2 spikes per subject. Under this condition, the average precision was 0.390, corresponding to a false positive rate 2.09 times higher than the true positive rate. Analysis of the required processing time revealed that, using a general-purpose computer, our method could be used to perform spike detection in 12.1% of the recording time. The process of narrowing down spike candidates based on shape occupied most of the processing time. Significance. Although the average recall value was comparable with that of other studies, the proposed method significantly shortened the processing time.

  15. Supplier Risk Assessment Based on Best-Worst Method and K-Means Clustering: A Case Study

    Directory of Open Access Journals (Sweden)

    Merve Er Kara

    2018-04-01

    Full Text Available Supplier evaluation and selection is one of the most critical strategic decisions for developing a competitive and sustainable organization. Companies have to consider supplier-related risks and threats in their purchasing decisions. In today's competitive and risky business environment, it is very important to work with reliable suppliers. This study proposes a clustering-based approach to group suppliers based on their risk profiles. Suppliers of a company in the heavy-machinery sector are assessed based on 17 qualitative and quantitative risk types. The weights of the criteria are determined by using the Best-Worst method. Four factors are extracted by applying factor analysis to the supplier risk data. Then the k-means clustering algorithm is applied to group the core suppliers of the company based on the four risk factors. Three clusters are created with different risk exposure levels. The interpretation of the results provides insights for risk management actions and supplier development programs to mitigate supplier risk.
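    A sketch of the final clustering step, assuming the criterion weights have already been obtained: a random vector stands in for the Best-Worst-method weights, the factor-analysis step is omitted, and the supplier ratings are synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_suppliers, n_risks = 40, 17

scores = rng.uniform(1, 5, size=(n_suppliers, n_risks))  # synthetic 1-5 risk ratings
w = rng.dirichlet(np.ones(n_risks))                      # stand-in for BWM-derived weights

weighted = scores * w                                    # criterion-weighted risk matrix
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(weighted)

# Rank the clusters by average overall risk exposure
overall = weighted.sum(axis=1)
for k in range(3):
    print(f"cluster {k}: n={np.sum(labels == k)}, mean risk={overall[labels == k].mean():.2f}")
```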

  16. Comparative analysis of methods for classification in predicting the quality of bread

    OpenAIRE

    E. A. Balashova; V. K. Bitjukov; E. A. Savvina

    2013-01-01

    A comparative analysis of classification methods, namely two-stage cluster analysis, discriminant analysis and neural networks, was performed. A system of informative features which classifies with a minimum of errors has been proposed.

  17. Analysis of mixed nitric oxide - Water clusters by complementary ionization methods

    Czech Academy of Sciences Publication Activity Database

    Šmídová, Daniela; Lengyel, Jozef; Kočišek, Jaroslav; Pysanenko, Andriy; Fárník, Michal

    2017-01-01

    Roč. 421, OCT 2017 (2017), s. 144-149 ISSN 1387-3806 R&D Projects: GA ČR(CZ) GA17-04068S Institutional support: RVO:61388955 Keywords : Cluster mass spectrometry * Atmospheric aerosols * Electron attachment Subject RIV: CF - Physical ; Theoretical Chemistry OBOR OECD: Physical chemistry Impact factor: 1.702, year: 2016

  18. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and development of Grid middleware, it has become a realistic and attractive methodology to connect cluster machines on a wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have, however, been designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA), based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Through these experiments, by comparing the performances and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  19. Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis.

    Science.gov (United States)

    Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen

    2016-03-02

    Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within baseline (12-months pre-HD) and follow-up periods (12-months post-HD) to identify clusters. Demographic, clinical, and cost information was extracted from both periods, and then examined by cluster. A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either flexible beta or Ward's methods. Based on cluster sample sizes and change of cost patterns, the K-means CA method and 4 clusters were selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); or Cluster 4: Increasing Costs, High at Both Points (n = 1554). Median cost changes in the 12-month pre-HD and post-HD periods increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), were relatively stable and remained low from $15,168 to $13,026 for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). Relatively stable costs after starting HD were associated with more stable scores

  20. Modeling, Stability Analysis and Active Stabilization of Multiple DC-Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

Stability of dc microgrids (MGs) can be degraded by constant power loads (CPLs), especially during interconnection with other MGs in dc MG clusters. This paper develops a small-signal model of dc MGs from the control point of view, in order to study stability and to investigate the effects of CPLs and of the line impedances between MGs on the stability of these systems. The model can also be used to synthesize and study the dynamics of control loops in dc MGs and dc MG clusters. An active stabilization method, implemented as a dc active power filter (APF) inside the MGs, is proposed both to increase the damping of dc MGs in the presence of CPLs and to improve their stability while connecting to other MGs. Simulation results are provided to evaluate the developed models and to demonstrate the effectiveness of the proposed active stabilization technique.

  1. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  2. Study of methods to increase cluster/dislocation loop densities in electrodes

    Science.gov (United States)

    Yang, Xiaoling; Miley, George H.

    2009-03-01

Recent research has developed a technique for imbedding ultra-high density deuterium "clusters" (50 to 100 atoms per cluster) in various metals such as palladium (Pd), beryllium (Be) and lithium (Li). It was found that the thermally dehydrogenated PdHx retained the clusters and exhibited up to 12 percent lower resistance compared to the virgin Pd samples [A. G. Lipson, et al., Phys. Solid State 39 (1997) 1891]. SQUID measurements showed that in Pd these condensed-matter clusters approach metallic conditions, exhibiting superconducting properties [A. Lipson, et al., Phys. Rev. B 72, 212507 (2005); A. G. Lipson, et al., Phys. Lett. A 339 (2005) 414-423]. If the fabrication methods under study are successful, a large packing fraction of nuclear-reactive clusters can be developed in the electrodes by electrolyte or high-pressure gas loading. This will provide a much higher low-energy-nuclear-reaction (LENR) rate than achieved with earlier electrodes [Castano, C. H., et al., Proc. ICCF-9, Beijing, China, 19-24 May 2002].

  3. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    Full Text Available During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  4. Cluster analysis for portfolio optimization

    OpenAIRE

    Vincenzo Tola; Fabrizio Lillo; Mauro Gallegati; Rosario N. Mantegna

    2005-01-01

We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio composition...

  5. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included 95 eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. Other parameters were then compared between clusters. Hierarchical cluster analysis yielded two clusters. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in the GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in the GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included significantly more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma and the number of antiglaucoma medications administered were greater than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
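As an illustrative aside, the hierarchical clustering step used in studies like this one can be sketched generically. The following is a toy average-linkage agglomerative pass in pure Python, not the authors' pipeline (which clustered patients on five VF-derived variables); the data points and parameter values are invented:

```python
def agglomerative(points, k):
    """Naive average-linkage agglomerative clustering: repeatedly merge
    the closest pair of clusters until only k remain. O(n^3), so only
    suitable for small data sets."""
    clusters = [[p] for p in points]

    def avg_link(ca, cb):
        # mean pairwise Euclidean distance between two clusters
        return sum(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
                   for p in ca for q in cb) / (len(ca) * len(cb))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Cutting the merge sequence at k = 2 mirrors the two-cluster (slow/fast progression) solution reported above; in practice the variables would first be standardized so that MD, VFI and their slopes are on comparable scales.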

  6. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors; and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ2 analysis, and candidate gene association analysis was performed on cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Stability of maximum-likelihood-based clustering methods: exploring the backbone of classifications

    International Nuclear Information System (INIS)

    Mungan, Muhittin; Ramasco, José J

    2010-01-01

    Components of complex systems are often classified according to the way they interact with each other. In graph theory such groups are known as clusters or communities. Many different techniques have been recently proposed to detect them, some of which involve inference methods using either Bayesian or maximum likelihood approaches. In this paper, we study a statistical model designed for detecting clusters based on connection similarity. The basic assumption of the model is that the graph was generated by a certain grouping of the nodes and an expectation maximization algorithm is employed to infer that grouping. We show that the method admits further development to yield a stability analysis of the groupings that quantifies the extent to which each node influences its neighbors' group membership. Our approach naturally allows for the identification of the key elements responsible for the grouping and their resilience to changes in the network. Given the generality of the assumptions underlying the statistical model, such nodes are likely to play special roles in the original system. We illustrate this point by analyzing several empirical networks for which further information about the properties of the nodes is available. The search and identification of stabilizing nodes constitutes thus a novel technique to characterize the relevance of nodes in complex networks

  8. tclust: An R Package for a Trimming Approach to Cluster Analysis

    Directory of Open Access Journals (Sweden)

    2012-04-01

    Full Text Available Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
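The trimming idea described in this abstract can be illustrated with a few lines of pure Python. This is a toy re-implementation of trimmed k-means, not the tclust package itself (which additionally handles cluster scatter constraints); initialization, data and the value of α below are invented for the example:

```python
def trimmed_kmeans(points, k, alpha, iters=20):
    """Sketch of trimmed k-means: at every iteration the int(alpha*n)
    points farthest from their nearest center are set aside ("trimmed")
    and the centers are re-estimated from the rest. Initialization from
    the first k points, purely for reproducibility of this toy example."""
    centers = [tuple(p) for p in points[:k]]
    n_trim = int(alpha * len(points))
    trimmed = set()
    for _ in range(iters):
        # squared distance of each point to its nearest current center
        dist = [min(sum((a - b) ** 2 for a, b in zip(p, ctr)) for ctr in centers)
                for p in points]
        order = sorted(range(len(points)), key=dist.__getitem__)
        kept = order[:len(points) - n_trim]
        trimmed = set(order[len(points) - n_trim:])
        # reassign kept points and recompute centers from them only
        groups = [[] for _ in range(k)]
        for i in kept:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(points[i], centers[c])))
            groups[j].append(points[i])
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, trimmed
```

Because the α·n most outlying observations never enter the center update, a single gross outlier cannot drag a center toward itself, which is the robustness property the package is built around.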

  9. Don't spin the pen: two alternative methods for second-stage sampling in urban cluster surveys

    Directory of Open Access Journals (Sweden)

    Rose Angela MC

    2007-06-01

Full Text Available Abstract In two-stage cluster surveys, the traditional method used in second-stage sampling (in which the first household in a cluster is selected) is time-consuming and may result in biased estimates of the indicator of interest. First, a random direction from the center of the cluster is selected, usually by spinning a pen. The houses along that direction are then counted out to the boundary of the cluster, and one is selected at random to be the first household surveyed. This process favors households towards the center of the cluster, but it could easily be improved. During a recent meningitis vaccination coverage survey in Maradi, Niger, we compared this method of first-household selection to two alternatives in urban zones: (1) using a grid superimposed on the map of the cluster area and randomly selecting an intersection; and (2) drawing the perimeter of the cluster area using a Global Positioning System (GPS) and randomly selecting one point within the perimeter. Although we only compared a limited number of clusters using each method, we found the sampling-grid method to be the fastest and easiest for field survey teams, although it does require a map of the area. Selecting a random GPS point was also found to be a good method, provided adequate training is given. Spinning the pen and counting households to the boundary was the most complicated and time-consuming. The two methods tested here represent simpler, quicker and potentially more robust alternatives to spinning the pen for cluster surveys in urban areas. In rural areas, however, these alternatives would favor initial household selection from lower-density (or even potentially empty) areas. Bearing in mind these limitations, as well as available resources and feasibility, investigators should choose the most appropriate method for their particular survey context.
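The second alternative, selecting a random point within a GPS-traced perimeter, amounts to uniform sampling inside a polygon. A minimal sketch (the polygon coordinates are invented; real use would work on projected GPS coordinates):

```python
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in order."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_perimeter(poly, rng):
    """Rejection sampling: draw uniform points in the bounding box
    until one falls inside the cluster perimeter."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            return x, y
```

The survey team would then start from the household nearest the drawn point; rejection sampling keeps the selection uniform over the area, which is exactly what spinning the pen fails to achieve.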

  10. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    International Nuclear Information System (INIS)

    Andersen, Erlend K. F.; Kristensen, Gunnar B.; Lyng, Heidi; Malinen, Eirik

    2011-01-01

Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold: first, to determine if tumor regions with similar vascularization can be labeled by clustering methods; second, to determine if the identified regions are associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, Ktrans and ue, were estimated voxel by voxel from the DCEMR images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fractions of the clusters were 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with a high volume fraction of such voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters, and the volume fraction of one cluster region was associated with primary tumor control.

  11. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    Energy Technology Data Exchange (ETDEWEB)

    Andersen, Erlend K. F. (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)), e-mail: eirik.malinen@fys.uio.no; Kristensen, Gunnar B. (Section for Gynaecological Oncology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Lyng, Heidi (Dept. of Radiation Biology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Malinen, Eirik (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway); Dept. of Physics, Univ. of Oslo, Oslo (Norway))

    2011-08-15

Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold: first, to determine if tumor regions with similar vascularization can be labeled by clustering methods; second, to determine if the identified regions are associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, Ktrans and ue, were estimated voxel by voxel from the DCEMR images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fractions of the clusters were 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with a high volume fraction of such voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters, and the volume fraction of one cluster region was associated with primary tumor control.

  12. A critical cluster analysis of 44 indicators of author-level performance

    DEFF Research Database (Denmark)

    Wildgaard, Lorna Elizabeth

    2016-01-01

This paper explores a 7-stage cluster methodology as a process to identify appropriate indicators for evaluation of individual researchers at a disciplinary and seniority level. Publication and citation data for 741 researchers from 4 disciplines were collected in Web of Science. Forty-four indicators of individual researcher performance were computed using the data. The clustering solution was supported by continued reference to the researchers' curricula vitae, an effect analysis and a risk analysis. Disciplinary appropriate indicators were identified and used to divide the researchers into groups, illustrating the use of statistics in research evaluation. The strength of the 7-stage cluster methodology is that it makes clear that in the evaluation of individual researchers, statistics cannot stand alone: the methodology is reliant on contextual information to verify the bibliometric values and cluster solution.

  13. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    Science.gov (United States)

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.
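LCA fits a finite mixture over categorical responses by expectation-maximization. The sketch below is a deliberately minimal EM for a latent class model with binary items, not any particular LCA package; the deterministic asymmetric start and the toy data are invented for illustration:

```python
def lca_em(data, k, iters=100):
    """Toy EM for a latent class model with binary (0/1) items.
    Returns class weights pi[c] and item probabilities theta[c][j]
    = P(item j = 1 | class c)."""
    n, m = len(data), len(data[0])
    pi = [1.0 / k] * k
    # deterministic, asymmetric start so the classes can separate
    theta = [[(c + 1) / (k + 1)] * m for c in range(k)]
    eps = 1e-9
    for _ in range(iters):
        # E-step: responsibilities r[i][c] proportional to
        # pi[c] * prod_j P(x_ij | class c), under local independence
        resp = []
        for x in data:
            lik = []
            for c in range(k):
                p = pi[c]
                for j in range(m):
                    p *= theta[c][j] if x[j] else (1 - theta[c][j])
                lik.append(p + eps)
            s = sum(lik)
            resp.append([l / s for l in lik])
        # M-step: update pi and theta from the soft assignments
        for c in range(k):
            rc = sum(r[c] for r in resp)
            pi[c] = rc / n
            theta[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / rc
                        for j in range(m)]
    return pi, theta
```

Each row's final responsibilities give its (probabilistic) symptom-cluster membership; biomarker or genetic covariates would then be compared across the inferred classes, as the article proposes.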

  14. The composite sequential clustering technique for analysis of multispectral scanner data

    Science.gov (United States)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
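The two-part scheme above, an inexpensive first pass whose output clusters seed a generalized k-means refinement, can be sketched as follows. A simple leader-style sequential pass stands in for the sequential variance analysis, which the abstract does not specify in detail; the data and radius are invented:

```python
def sequential_seed(points, radius):
    """Step (1) stand-in: a sequential (leader-style) pass that opens a
    new cluster whenever a point lies farther than `radius` from every
    existing cluster centroid, otherwise folds it into the nearest one."""
    centers, counts = [], []
    for p in points:
        best, best_d = -1, None
        for j, ctr in enumerate(centers):
            d = sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = j, d
        if best_d is not None and best_d <= radius:
            n = counts[best]
            centers[best] = tuple((cj * n + x) / (n + 1)
                                  for cj, x in zip(centers[best], p))
            counts[best] += 1
        else:
            centers.append(tuple(p))
            counts.append(1)
    return centers


def kmeans_refine(points, centers, iters=10):
    """Step (2): generalized k-means (Lloyd iterations) started from
    the seed centers produced by the sequential pass."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            groups[j].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers
```

The sequential pass fixes both the number of clusters and their rough locations from a single sweep over the data, so the iterative refinement typically converges in very few passes.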

  15. Clustering methods and visualization algorithms to aid nuclear reactor operative diagnostics

    International Nuclear Information System (INIS)

    Pepelyshev, Yu.N.; Dzwinel, W.

    1990-01-01

The software system developed plays the role of an aid to the operator in nuclear reactor diagnostics. Noise analysis of reactor parameters such as power, temperature and coolant flow rate constitutes the basis of the system. Its implementation combines data acquisition, data preprocessing, clustering and cluster-visualization algorithms with heuristic techniques for analysing the results. Two regimes are available: the first, extended, is recommended for long-term investigations; the second, suppressed, serves as an aid to reactor operation monitoring. The system has been tested and developed at the JINR IBR-2 pulsed reactor. 13 refs.; 4 figs.; 2 tabs

  16. Fuzzy Clustering

    DEFF Research Database (Denmark)

    Berks, G.; Keyserlingk, Diedrich Graf von; Jantzen, Jan

    2000-01-01

A symptom is a condition indicating the presence of a disease, especially when regarded as an aid in diagnosis. Symptoms are the smallest units indicating the existence of a disease. A syndrome, on the other hand, is an aggregate, set or cluster of concurrent symptoms which together indicate the presence of a disease. Classification and clustering are therefore basic concerns in medicine. Classification depends on definitions of the classes and on the required degree of participation of the elements in the classes. In medicine imprecise conditions are the rule, and therefore fuzzy methods are much more suitable than crisp ones. Fuzzy c-means clustering is an easy and well-improved tool which has been applied in many medical fields. We used fuzzy c-means clustering after feature extraction from an aphasia database: factor analysis was applied to a correlation matrix of 26 symptoms of language disorders and led to five factors.
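The fuzzy c-means algorithm referred to here can be sketched in pure Python. This is a generic textbook version, not the tool used in the aphasia study; the initial centers are supplied explicitly and the data below are invented:

```python
def fuzzy_cmeans(points, init_centers, m=2.0, iters=30):
    """Sketch of fuzzy c-means. Each point gets a membership u[i][k]
    in [0, 1] for every cluster k; centers are membership-weighted means.
    m > 1 is the usual fuzzifier (m -> 1 approaches crisp k-means)."""
    centers = [tuple(ctr) for ctr in init_centers]
    c, dim = len(centers), len(points[0])
    u = []
    for _ in range(iters):
        # update memberships from the current centers
        u = []
        for p in points:
            d = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                 for ctr in centers]
            u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for k in range(c)])
        # update centers as weighted means with weights u^m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            tot = sum(w)
            centers.append(tuple(sum(wi * p[x] for wi, p in zip(w, points)) / tot
                                 for x in range(dim)))
    return centers, u
```

The graded memberships, rather than hard assignments, are what make the method attractive for the imprecise, overlapping symptom profiles described above.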

  17. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel based ‘mouse pup syllable classification calculator’

    Directory of Open Access Journals (Sweden)

    Jasmine eGrimsley

    2013-01-01

    Full Text Available Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified ten syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.

  18. Comparison of cluster and principal component analysis techniques to derive dietary patterns in Irish adults.

    Science.gov (United States)

    Hearty, Aine P; Gibney, Michael J

    2009-02-01

    The aims of the present study were to examine and compare dietary patterns in adults using cluster and factor analyses and to examine the format of the dietary variables on the pattern solutions (i.e. expressed as grams/day (g/d) of each food group or as the percentage contribution to total energy intake). Food intake data were derived from the North/South Ireland Food Consumption Survey 1997-9, which was a randomised cross-sectional study of 7 d recorded food and nutrient intakes of a representative sample of 1379 Irish adults aged 18-64 years. Cluster analysis was performed using the k-means algorithm and principal component analysis (PCA) was used to extract dietary factors. Food data were reduced to thirty-three food groups. For cluster analysis, the most suitable format of the food-group variable was found to be the percentage contribution to energy intake, which produced six clusters: 'Traditional Irish'; 'Continental'; 'Unhealthy foods'; 'Light-meal foods & low-fat milk'; 'Healthy foods'; 'Wholemeal bread & desserts'. For PCA, food groups in the format of g/d were found to be the most suitable format, and this revealed four dietary patterns: 'Unhealthy foods & high alcohol'; 'Traditional Irish'; 'Healthy foods'; 'Sweet convenience foods & low alcohol'. In summary, cluster and PCA identified similar dietary patterns when presented with the same dataset. However, the two dietary pattern methods required a different format of the food-group variable, and the most appropriate format of the input variable should be considered in future studies.

  19. Exploring the individual patterns of spiritual well-being in people newly diagnosed with advanced cancer: a cluster analysis.

    Science.gov (United States)

    Bai, Mei; Dixon, Jane; Williams, Anna-Leila; Jeon, Sangchoon; Lazenby, Mark; McCorkle, Ruth

    2016-11-01

    Research shows that spiritual well-being correlates positively with quality of life (QOL) for people with cancer, whereas contradictory findings are frequently reported with respect to the differentiated associations between dimensions of spiritual well-being, namely peace, meaning and faith, and QOL. This study aimed to examine individual patterns of spiritual well-being among patients newly diagnosed with advanced cancer. Cluster analysis was based on the twelve items of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale at Time 1. A combination of hierarchical and k-means (non-hierarchical) clustering methods was employed to jointly determine the number of clusters. Self-rated health, depressive symptoms, peace, meaning and faith, and overall QOL were compared at Time 1 and Time 2. Hierarchical and k-means clustering methods both suggested four clusters. Comparison of the four clusters supported statistically significant and clinically meaningful differences in QOL outcomes among clusters while revealing contrasting relations of faith with QOL. Cluster 1, Cluster 3, and Cluster 4 represented high, medium, and low levels of overall QOL, respectively, with correspondingly high, medium, and low levels of peace, meaning, and faith. Cluster 2 was distinguished from other clusters by its medium levels of overall QOL, peace, and meaning and low level of faith. This study provides empirical support for individual difference in response to a newly diagnosed cancer and brings into focus conceptual and methodological challenges associated with the measure of spiritual well-being, which may partly contribute to the attenuated relation between faith and QOL.

  20. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
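Single-linkage clustering at a fixed SNP threshold, as used above, amounts to finding connected components of a graph whose edges are isolate pairs within the threshold. A minimal union-find sketch (the isolate IDs and pairwise distances are hypothetical):

```python
def snp_clusters(ids, dists, threshold):
    """Single-linkage clustering at a fixed SNP threshold: isolates are
    grouped whenever they are connected through pairwise distances
    <= threshold, i.e. connected components of the threshold graph."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), d in dists.items():
        if d <= threshold:
            parent[find(a)] = find(b)
    comps = {}
    for i in ids:
        comps.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in comps.values())
```

Running the same function at thresholds of 10, 5 and 0 SNPs yields the nested cluster levels the study explores; single linkage means isolates can be chained together through intermediates even if their direct distance exceeds the threshold.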

  1. Co-clustering Analysis of Weblogs Using Bipartite Spectral Projection Approach

    DEFF Research Database (Denmark)

    Xu, Guandong; Zong, Yu; Dolog, Peter

    2010-01-01

Web clustering is an approach for aggregating Web objects into various groups according to the underlying relationships among them. Finding co-clusters of Web objects is an interesting topic in the context of Web usage mining, as it is able to capture the underlying user navigational interest and content preference simultaneously. In this paper we present an algorithm using bipartite spectral clustering to co-cluster Web users and pages. The usage data of users visiting Web sites is modeled as a bipartite graph, and spectral clustering is then applied to this graph representation of the usage data. The proposed approach is evaluated in experiments performed on real datasets, and the impact of using various clustering algorithms is also investigated. Experimental results demonstrate that the employed method can effectively reveal the subset aggregates of Web users and pages.

  2. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.

    Science.gov (United States)

    Cai, Yunpeng; Sun, Yijun

    2011-08-01

    Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
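
The probabilistic-sequence idea, representing a cluster of aligned sequences by per-position base frequencies and comparing clusters without exhaustive pairwise distance computation, can be sketched as follows (a simplified toy, not the ESPRIT-Tree implementation).

```python
import numpy as np

BASES = "ACGT"

def profile(seqs):
    """Represent a cluster of aligned sequences as a probabilistic sequence:
    per-position base frequencies (a simplified view of the paper's idea)."""
    L = len(seqs[0])
    P = np.zeros((L, 4))
    for s in seqs:
        for i, b in enumerate(s):
            P[i, BASES.index(b)] += 1
    return P / len(seqs)

def profile_distance(P, Q):
    """Expected per-position mismatch probability between two profiles."""
    return float(np.mean(1.0 - (P * Q).sum(axis=1)))

cluster1 = ["ACGT", "ACGT", "ACGA"]
cluster2 = ["TTGT", "TTGA"]
d_within = profile_distance(profile(cluster1), profile(cluster1))
d_between = profile_distance(profile(cluster1), profile(cluster2))
print(d_within, d_between)  # between-cluster distance exceeds within-cluster
```

One profile comparison stands in for many sequence-to-sequence comparisons, which is what makes the quasilinear scaling possible.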

  3. Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering

    Science.gov (United States)

    Onishi, Masaki; Yoda, Ikushi

    In recent years, many human-tracking studies have been proposed in order to analyze dynamic human trajectories. These are general technologies applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework combining the k-means method and fuzzy clustering to detect human regions. In the initial step, the k-means method rapidly forms intermediate clusters from object features extracted by stereo vision. In the final step, the fuzzy c-means method groups these intermediate clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, the proposed method clusters correctly even when many people are close to each other. The validity of the technique was evaluated by extracting the trajectories of doctors and nurses in a hospital emergency room.
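
A minimal fuzzy c-means loop (the standard algorithm, not the authors' stereo-vision pipeline) illustrates the second clustering step on synthetic 2-D points standing in for intermediate clusters.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns membership matrix U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))           # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Two well-separated point groups standing in for "middle clusters".
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
U, centers = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)
print(labels)  # the two groups receive distinct labels
```

The soft memberships in `U` are what let nearby people share a region with graded confidence instead of a hard boundary.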

  4. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Directory of Open Access Journals (Sweden)

    Reilly John J

    2005-06-01

    Full Text Available Abstract Background Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. The method provides a means to assess whether the performance benefits a classifier gains from merging outweigh the loss of resolution that merging incurs. Results Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion Hierarchical

  5. Comparative analysis of methods for classification in predicting the quality of bread

    Directory of Open Access Journals (Sweden)

    E. A. Balashova

    2013-01-01

    Full Text Available A comparative analysis of classification methods (two-stage cluster analysis, discriminant analysis, and neural networks) was performed. A system of informative features that classifies with a minimum of errors has been proposed.

  6. Perbandingan Kinerja Metode Ward Dan K-means Dalam Menentukan Cluster Data Mahasiswa Pemohon Beasiswa (Studi Kasus : STMIK Pringsewu)

    OpenAIRE

    Satria, Fiqih; Aziz, R Z Abdul

    2016-01-01

    This research aims to set out the steps of cluster analysis using the Ward method and the K-means method, and to compare the results of the two methods for clustering student data in decision-making to determine which students are eligible to receive a Peningkatan Prestasi Akademik (PPA) scholarship or a Bantuan Biaya Akademik (BBA) scholarship at STMIK Pringsewu. Cluster analysis was performed using IBM SPSS Version 23. The cluster analysis results of both methods were compared using ...

  7. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
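
The cluster-validation step, choosing the number of clusters k by a quality criterion, can be sketched with k-means and the average silhouette width on synthetic two-feature "voxel" data (the study used its own validation indices; this is a generic stand-in).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic two-feature "voxel" data with three planted sub-regions.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(mean, 0.3, size=(50, 2))
               for mean in ([0, 0], [3, 0], [0, 3])])

# Cluster validation: pick k by the average silhouette width.
scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # the planted three-cluster structure wins
```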

  8. The contribution of cluster and discriminant analysis to the classification of complex aquifer systems.

    Science.gov (United States)

    Panagopoulos, G P; Angelopoulou, D; Tzirtzilakis, E E; Giannoulopoulos, P

    2016-10-01

    This paper presents an innovative method for the discrimination of groundwater samples into common groups representing the hydrogeological units from which they were pumped. The method proved very efficient even in areas with complex hydrogeological regimes. The proposed method requires chemical analyses of water samples only for major ions, meaning that it is applicable to most cases worldwide. Another benefit of the method is that it gives further insight into the aquifer hydrogeochemistry, as it identifies the ions that are responsible for the discrimination of the groups. The procedure begins with cluster analysis of the dataset in order to classify the samples into the corresponding hydrogeological units. The feasibility of the method is demonstrated by the fact that the samples of volcanic origin were separated into two different clusters, namely the lava units and the pyroclastic-ignimbritic aquifer. The second step is the discriminant analysis of the data, which provides the functions that distinguish the groups from each other and the most significant variables that define the hydrochemical composition of the aquifer. The whole procedure was highly successful, as 94.7 % of the samples were classified to the correct aquifer system. Finally, the resulting functions can be safely used to categorize samples of either unknown or doubtful origin, thus improving the quality and the size of existing hydrochemical databases.
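
The two-step procedure, cluster analysis to form groups and then discriminant analysis to derive classification functions, can be sketched on synthetic major-ion data (the ion names and concentrations below are invented for illustration).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic "major ion" chemistry for two hypothetical aquifer units,
# e.g. Ca, Mg, Cl concentrations in mg/L (invented values).
rng = np.random.default_rng(1)
unit_a = rng.normal([50, 10, 5], 2.0, size=(20, 3))
unit_b = rng.normal([20, 30, 25], 2.0, size=(20, 3))
X = np.vstack([unit_a, unit_b])

# Step 1: cluster analysis groups samples into candidate units.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: discriminant analysis yields functions separating the groups,
# usable later to classify samples of unknown or doubtful origin.
lda = LinearDiscriminantAnalysis().fit(X, groups)
accuracy = lda.score(X, groups)
print(f"resubstitution accuracy: {accuracy:.3f}")
```

The fitted discriminant functions (here, `lda`) play the role of the paper's classification functions for new samples.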

  9. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has come into widespread view. Clustering is an important method for processing large data sets. We introduce a reverse learning step into the clustering process of the PAM (Partitioning Around Medoids) algorithm to mitigate the limitations of one-shot clustering in unsupervised learning and to increase the diversity of the resulting clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.

  10. A Novel Clustering Model Based on Set Pair Analysis for the Energy Consumption Forecast in China

    Directory of Open Access Journals (Sweden)

    Mingwu Wang

    2014-01-01

    Full Text Available The energy consumption forecast is important for the decision-making of national economic and energy policies. But it is a complex and uncertain system problem affected by the outer environment and various uncertainty factors. Herein, a novel clustering model based on set pair analysis (SPA) was introduced to analyze and predict energy consumption. The annual dynamic relative indicator (DRI) of historical energy consumption was adopted to conduct a cluster analysis with Fisher’s optimal partition method. Combined with indicator weights, group centroids of DRIs for influence factors were transferred into aggregating connection numbers in order to interpret uncertainty by identity-discrepancy-contrary (IDC) analysis. Moreover, a forecasting model based on similarity to the group centroid was discussed to forecast the energy consumption of a certain year on the basis of measured values of influence factors. Finally, a case study predicting China’s future energy consumption, together with a comparison against the grey method, was conducted to confirm the reliability and validity of the model. The results indicate that the method presented here is more feasible and easier to use, and can interpret the certainty and uncertainty of the development speed of energy consumption and influence factors as a whole.

  11. Clustering potential of agriculture in Lviv region

    Directory of Open Access Journals (Sweden)

    N.A. Tsymbalista

    2015-03-01

    Full Text Available The paper emphasizes the need to stimulate the development of integration processes in the agro-industrial complex of Ukraine. The advantages of the cluster model of integration are shown: along with the growth of competitiveness of agricultural products, it helps to increase the efficiency of inventory management of material flows, as well as to expand opportunities to attract investment and to implement innovation in agricultural production. Clusters also help to reduce transaction costs by establishing optimal cooperation between the contracting parties. The theoretical essence of agro-industrial clusters is studied and a conceptual model of such clusters is shown. The preconditions for clustering of agriculture in the Lviv region are analyzed, and the feasibility of specific methods of statistical analysis to identify localization areas of the potential members of cluster-forming blocks of regional food clusters is verified. Cluster analysis is carried out to identify potential cluster-forming areas in the region in various sectors of agricultural production.

  12. Shape Analysis of HII Regions - I. Statistical Clustering

    Science.gov (United States)

    Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred

    2018-04-01

    We present here our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether the physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordination technique of multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.

  13. Multichannel response analysis on 2D projection views for detection of clustered microcalcifications in digital breast tomosynthesis

    International Nuclear Information System (INIS)

    Wei, Jun; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Helvie, Mark A.; Lu, Yao; Zhou, Chuan; Samala, Ravi

    2014-01-01

    Purpose: To investigate the feasibility of a new two-dimensional (2D) multichannel response (MCR) analysis approach for the detection of clustered microcalcifications (MCs) in digital breast tomosynthesis (DBT). Methods: With IRB approval and informed consent, a data set of two-view DBTs from 42 breasts containing biopsy-proven MC clusters was collected in this study. The authors developed a 2D approach for MC detection using projection view (PV) images rather than the reconstructed three-dimensional (3D) DBT volume. Signal-to-noise ratio (SNR) enhancement processing was first applied to each PV to enhance the potential MCs. The locations of MC candidates were then identified with iterative thresholding. The individual MCs were decomposed with Hermite–Gaussian (HG) and Laguerre–Gaussian (LG) basis functions and the channelized Hotelling model was trained to produce the MCRs for each MC on the 2D images. The MCRs from the PVs were fused in 3D by a coincidence counting method that backprojects the MC candidates on the PVs and traces the coincidence of their ray paths in 3D. The 3D MCR was used to differentiate the true MCs from false positives (FPs). Finally a dynamic clustering method was used to identify the potential MC clusters in the DBT volume based on the fact that true MCs of clinical significance appear in clusters. Using two-fold cross validation, the performance of the 3D MCR for classification of true and false MCs was estimated by the area under the receiver operating characteristic (ROC) curve and the overall performance of the MCR approach for detection of clustered MCs was assessed by free response receiver operating characteristic (FROC) analysis. Results: When the HG basis function was used for MCR analysis, the detection of MC cluster achieved case-based test sensitivities of 80% and 90% at the average FP rates of 0.65 and 1.55 FPs per DBT volume, respectively. With LG basis function, the average FP rates were 0.62 and 1.57 per DBT volume at

  14. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematical code to compute cluster probabilities using our methods is available as supplementary material.

  15. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    Science.gov (United States)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups, in order to minimize the logical distance within each group and to maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.

  16. Analysis of ligand-protein exchange by Clustering of Ligand Diffusion Coefficient Pairs (CoLD-CoP)

    Science.gov (United States)

    Snyder, David A.; Chantova, Mihaela; Chaudhry, Saadia

    2015-06-01

    NMR spectroscopy is a powerful tool for describing protein structures and protein activity in pharmaceutical and biochemical development. This study describes a method to identify weakly binding ligands in biological systems by hierarchical clustering of diffusion coefficients derived from multidimensional data obtained with a 400 MHz Bruker NMR spectrometer. Comparison of DOSY spectra of ligands from the chemical library in the presence and absence of target proteins reveals changes in the translational diffusion rates of small molecules upon interaction with macromolecules. For weak binders such as compounds found in fragment libraries, changes in diffusion rates upon macromolecular binding are on the order of the precision of DOSY diffusion measurements, and identifying such subtle shifts in diffusion requires careful statistical analysis. The "CoLD-CoP" (Clustering of Ligand Diffusion Coefficient Pairs) method presented here uses SAHN clustering to identify protein binders in a chemical library or even an incompletely characterized metabolite mixture. We show how DOSY NMR and the "CoLD-CoP" method complement each other in identifying the most suitable candidates for lysozyme and wheat germ acid phosphatase.
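
Hierarchical (SAHN) clustering of (D without protein, D with protein) diffusion-coefficient pairs can be sketched with SciPy; the coefficient values below are invented to show the binder/non-binder split, not measured data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Invented (D without protein, D with protein) pairs, in 1e-10 m^2/s units.
# Binders slow down noticeably in the presence of the protein.
pairs = np.array([
    [6.1, 6.0], [5.9, 5.8], [6.0, 5.9],   # non-binders: D barely changes
    [6.0, 4.1], [5.8, 3.9],               # binders: D drops on binding
])

# Agglomerative (SAHN-style) clustering of the coefficient pairs.
Z = linkage(pairs, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # binders and non-binders land in separate clusters
```

Clustering the pairs, rather than thresholding the raw shift in D, is what makes subtle binding-induced changes separable from measurement scatter.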

  17. Vibration impact acoustic emission technique for identification and analysis of defects in carbon steel tubes: Part B Cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Halim, Zakiah Abd [Universiti Teknikal Malaysia Melaka (Malaysia); Jamaludin, Nordin; Junaidi, Syarif [Faculty of Engineering and Built, Universiti Kebangsaan Malaysia, Bangi (Malaysia); Yahya, Syed Yusainee Syed [Universiti Teknologi MARA, Shah Alam (Malaysia)

    2015-04-15

    Current steel tube inspection techniques are invasive, and the interpretation and evaluation of inspection results are done manually by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems, known as the vibration impact acoustic emission (VIAE) technique. AE signals were introduced into a series of ASTM A179 seamless steel tubes using an impact hammer. Specifically, a good steel tube serving as the reference and four steel tubes with through-hole artificial defects at different locations were used in this study. The AE propagation was captured using a high-frequency AE sensor. The present study explores a cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results of the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method for classifying the AE signals into natural groups. This approach successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.

  19. Fast clustering using adaptive density peak detection.

    Science.gov (United States)

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability arising from the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method that maximizes an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method performs in one single step without any iteration and is thus fast, with great potential for application to big data analysis. A user-friendly R package ADPclust is developed for public use.
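
A toy version of density-peak clustering, with a Gaussian kernel density estimate in place of the truncated count (the spirit of the adaptive variant described above; the ADPclust package should be used in practice), can be sketched as:

```python
import numpy as np

def density_peaks(X, bandwidth=1.0, n_centers=2):
    """Toy density-peak clustering: rho = kernel density, delta = distance
    to the nearest point of higher density; peaks maximize rho * delta."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    rho = np.exp(-(d / bandwidth) ** 2).sum(axis=1)        # local density
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    centers = np.argsort(rho * delta)[-n_centers:]         # density peaks
    # simplified assignment: each point joins its nearest center
    labels = d[:, centers].argmin(axis=1)
    return labels, centers

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [4.0, 4.0], [4.2, 4.1], [3.9, 4.2]])
labels, centers = density_peaks(X)
print(labels)  # one label per planted group
```

Note the assignment step is simplified (nearest center); the original algorithm propagates labels via each point's nearest higher-density neighbor.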

  20. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    Science.gov (United States)

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interaction to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as of the interactive process itself.
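
The SOM at the core of SOMFlow can be illustrated with a minimal training loop implementing the textbook best-matching-unit update (this is the standard algorithm, not the SOMFlow implementation).

```python
import numpy as np

def train_som(X, grid=(4, 4), n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 2-D self-organizing map: one random sample per step."""
    rng = np.random.default_rng(seed)
    h, w = grid
    coords = np.array([[i, j] for i in range(h) for j in range(w)], float)
    W = rng.random((h * w, X.shape[1]))  # prototype vectors, one per map node
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))        # best-matching unit
        frac = t / n_iter
        lr = lr0 * (1 - frac)                              # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5                  # shrinking neighborhood
        g = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * g[:, None] * (x - W)                     # pull neighborhood toward x
    return W

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.1, size=(30, 2)) for m in ([0, 0], [1, 1])])
W = train_som(X)
print(W.shape)  # (16, 2): one prototype per node of the 4x4 map
```

The grid neighborhood is what gives the SOM its topology-preserving layout, which SOMFlow then partitions and refines interactively.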

  1. Chemical Fingerprint and Quantitative Analysis for the Quality Evaluation of Platycladi cacumen by Ultra-performance Liquid Chromatography Coupled with Hierarchical Cluster Analysis.

    Science.gov (United States)

    Shan, Mingqiu; Li, Sam Fong Yau; Yu, Sheng; Qian, Yan; Guo, Shuchen; Zhang, Li; Ding, Anwei

    2018-01-01

    Platycladi cacumen (the dried twigs and leaves of Platycladus orientalis (L.) Franco) is a frequently utilized Chinese medicinal herb. To evaluate the quality of the phytomedicine, an ultra-performance liquid chromatographic method with diode array detection was established for chemical fingerprinting and quantitative analysis. In this study, 27 batches of P. cacumen from different regions were collected for analysis. A chemical fingerprint with 20 common peaks was obtained using the Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2004A). Among these 20 components, seven flavonoids (myricitrin, isoquercitrin, quercitrin, afzelin, cupressuflavone, amentoflavone and hinokiflavone) were identified and determined simultaneously. In the method validation, the seven analytes showed good regressions (R ≥ 0.9995) within their linear ranges and good recoveries from 96.4% to 103.3%. Furthermore, using the contents of these seven flavonoids, hierarchical clustering analysis was applied to partition the 27 batches into five groups. The chemometric results showed that these groups were almost consistent with the geographical positions and climatic conditions of the production regions. Integrating fingerprint analysis, simultaneous determination and hierarchical clustering analysis, the established method is rapid, sensitive, accurate and readily applicable, and provides a significant foundation for the efficient quality control of P. cacumen.

  2. A combined method for analysis of the acoustic emission signals from aboveground storage tank bottom

    Energy Technology Data Exchange (ETDEWEB)

    Yewei Kang; Mingchun Ling; Min Xiong; Yi Sun; Dongjie Tan [PetroChina Pipeline R and D Center, Langfang (China)

    2009-07-01

    In the late 1980s, acoustic emission (AE) technology was first used to assess the corrosion of aboveground storage tank (AST) bottoms. Since then, it has attracted great attention because it permits in-service inspection. Recognizing and eliminating noise remains the main challenge, owing to the small size of the signals in the presence of potential process noise. In this paper, a method combining pattern recognition with traditional AE parametric analysis is introduced to assess the corrosion of AST bottoms. First, the AE signals are grouped by a clustering method based on the distances between AE signal features, and the most plausible cluster is selected for the next analysis step. Second, a statistical method is used to evaluate the AE activity, on which the final evaluation report is based. A practical inspection was performed on a large oil storage tank in the Chongqing distribution station of the Lanzhou-Chengdu-Chongqing product oil pipeline of PetroChina Pipeline Company. The analytical results indicate that the combined method is reliable and feasible. (author)

  3. Heuristic methods using grasp, path relinking and variable neighborhood search for the clustered traveling salesman problem

    Directory of Open Access Journals (Sweden)

    Mário Mestria

    2013-08-01

    Full Text Available The Clustered Traveling Salesman Problem (CTSP) is a generalization of the Traveling Salesman Problem (TSP) in which the set of vertices is partitioned into disjoint clusters and the objective is to find a minimum-cost Hamiltonian cycle such that the vertices of each cluster are visited contiguously. The CTSP is NP-hard and, in this context, we propose heuristic methods for the CTSP using GRASP, Path Relinking and Variable Neighborhood Descent (VND). The heuristic methods were tested using Euclidean instances with up to 2000 vertices and clusters of between 4 and 150 vertices. Computational tests were performed to compare the performance of the heuristic methods with an exact algorithm using the Parallel CPLEX software. The computational results showed that the hybrid heuristic method using VND outperforms the other heuristic methods.
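
The contiguity constraint that distinguishes the CTSP from the TSP can be shown on a tiny instance by brute force (feasible only at toy scale; the paper's GRASP/VND heuristics are needed at realistic sizes).

```python
import itertools
import math

# Tiny clustered TSP: each cluster's vertices must be visited contiguously.
clusters = {
    "A": [(0, 0), (0, 1), (1, 0)],
    "B": [(5, 5), (5, 6), (6, 5)],
}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(order):
    # Cycle length, including the closing edge back to the start.
    return sum(dist(order[i], order[(i + 1) % len(order)])
               for i in range(len(order)))

# Enumerate all cluster-contiguous cycles: a block of A then a block of B
# (every contiguous cyclic tour has this form up to rotation).
best = min(
    (list(pa) + list(pb)
     for pa in itertools.permutations(clusters["A"])
     for pb in itertools.permutations(clusters["B"])),
    key=tour_length,
)
print(round(tour_length(best), 2))
```

With two clusters the tour crosses between them exactly twice, so a good solution pairs short intra-cluster paths with the two cheapest crossing edges.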

  4. Clustering User Behavior in Scientific Collections

    OpenAIRE

    Blixhavn, Øystein Hoel

    2014-01-01

    This master thesis looks at how clustering techniques can be applied to a collection of scientific documents. Approximately one year of server logs from the CERN Document Server (CDS) are analyzed and preprocessed. Based on the findings of this analysis, and a review of the current state of the art, three different clustering methods are selected for further work: Simple k-Means, Hierarchical Agglomerative Clustering (HAC) and Graph Partitioning. In addition, a custom, agglomerative clustering algor...

  5. Family-based clusters of cognitive test performance in familial schizophrenia

    Directory of Open Access Journals (Sweden)

    Partonen Timo

    2004-07-01

    Full Text Available Abstract Background Cognitive traits derived from neuropsychological test data are considered to be potential endophenotypes of schizophrenia. Previously, these traits have been found to form a valid basis for clustering samples of schizophrenia patients into homogeneous subgroups. We set out to identify such clusters but, unlike previous studies, we included both schizophrenia patients and family members in the cluster analysis. The aim of the study was to detect family clusters with similar cognitive test performance. Methods Test scores from 54 randomly selected families comprising at least two siblings with schizophrenia spectrum disorders, and at least two unaffected family members were included in a complete-linkage cluster analysis with interactive data visualization. Results A well-performing, an impaired, and an intermediate family cluster emerged from the analysis. While the neuropsychological test scores differed significantly between the clusters, only minor differences were observed in the clinical variables. Conclusions The visually aided clustering algorithm was successful in identifying family clusters comprising both schizophrenia patients and their relatives. The present classification method may serve as a basis for selecting phenotypically more homogeneous groups of families in subsequent genetic analyses.

  6. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    Full Text Available A method for constructing a subset of labeled objects, which is used in a heuristic algorithm of possible clusterization with partial training, is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.

  7. Analyzing patients' values by applying cluster analysis and LRFM model in a pediatric dental clinic in Taiwan.

    Science.gov (United States)

    Wu, Hsin-Hung; Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs.
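    The second stage of the two-stage segmentation described above (K-means over L, R, F values) can be sketched as follows; the SOM pre-clustering stage is omitted, and the patient values, cluster count, and the loyal/lost rule applied to centroids are illustrative stand-ins following the abstract's "above/below the average L, R, and F" criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical L, R, F scores per patient (scaled so higher = better, matching
# the abstract's rule that loyal patients exceed the averages on all three)
lrf = np.vstack([
    rng.normal([30.0, 25.0, 12.0], 1.0, size=(50, 3)),  # loyal-like patients
    rng.normal([5.0, 3.0, 1.0], 1.0, size=(50, 3)),     # lost-like patients
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lrf)
col_means = lrf.mean(axis=0)

# Classify each cluster by comparing its centroid to the overall L, R, F averages
kinds = {}
for k in range(2):
    centroid = lrf[km.labels_ == k].mean(axis=0)
    if (centroid > col_means).all():
        kinds[k] = "loyal"
    elif (centroid < col_means).all():
        kinds[k] = "lost"
    else:
        kinds[k] = "mixed"
print(kinds)
```

    In the study itself twelve clusters are formed and monetary value is analyzed separately via the customer value matrix; this sketch only shows the centroid-versus-average classification idea.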

  8. Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    Science.gov (United States)

    Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741

  9. Analyzing Patients’ Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    Directory of Open Access Journals (Sweden)

    Hsin-Hung Wu

    2014-01-01

    Full Text Available This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients’ values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients’ needs.

  10. Statistical method on nonrandom clustering with application to somatic mutations in cancer

    Directory of Open Access Journals (Sweden)

    Rejto Paul A

    2010-01-01

    Full Text Available Abstract Background Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few approaches can specifically identify activating driver mutations in oncogenes, which are more amenable to pharmacological intervention. Results We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics, assuming that the location of amino acid mutations on a protein follows a uniform distribution. Our statistical measure is the difference between pair-wise order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of the statistical measure. Using data in the Catalog of Somatic Mutations in Cancer (COSMIC) database, we have demonstrated that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors. Conclusions Our proposed method is useful to discover activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.
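    The null model described above (uniform mutation placement, statistic = difference of pair-wise order statistics) can be illustrated by simulation rather than the paper's exact distributions. The protein length, mutation positions, and window size k below are invented for illustration.

```python
import numpy as np

def cluster_pvalue(positions, protein_len, k, n_sim=20000, seed=0):
    """P-value for seeing k mutations within a span as small as the observed
    minimum, under the null that positions are uniform on the protein."""
    pos = np.sort(np.asarray(positions))
    n = len(pos)
    # Smallest span (difference of order statistics) covering k mutations
    obs = (pos[k - 1:] - pos[:n - k + 1]).min()
    rng = np.random.default_rng(seed)
    sims = rng.integers(1, protein_len + 1, size=(n_sim, n))
    sims.sort(axis=1)
    spans = (sims[:, k - 1:] - sims[:, :n - k + 1]).min(axis=1)
    return float((spans <= obs).mean())

# Hypothetical hotspot: 6 of 10 mutations packed within 5 residues of a 500-aa protein
hot = [100, 101, 102, 103, 104, 105, 250, 300, 400, 450]
print(round(cluster_pvalue(hot, 500, k=6), 4))
```

    A tight cluster like this yields a p-value near zero, mirroring how the method flags hotspots such as those in KRAS or BRAF; dispersed mutations would give a large p-value.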

  11. Merged consensus clustering to assess and improve class discovery with microarray data

    Directory of Open Access Journals (Sweden)

    Jarman Andrew P

    2010-12-01

    Full Text Available Abstract Background One of the most commonly performed tasks when analysing high-throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. Results Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruit fly, Drosophila melanogaster. Conclusions Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and sourceforge under the GNU public licence.
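    The core resampling idea behind consensus clustering can be sketched outside R as well: cluster repeated subsamples and record how often each pair of items lands in the same cluster. The toy expression matrix, subsample fraction, and run count below are illustrative, and only a single method (k-means) is consensused, not the package's merging across methods.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Toy expression matrix: 40 genes in two clearly separated groups
data = np.vstack([rng.normal(0.0, 0.3, size=(20, 8)),
                  rng.normal(3.0, 0.3, size=(20, 8))])
n = len(data)

# Consensus matrix: fraction of resampling runs in which two genes co-cluster
consensus = np.zeros((n, n))
counts = np.zeros((n, n))
for _ in range(50):
    idx = rng.choice(n, size=int(0.8 * n), replace=False)
    labels = KMeans(n_clusters=2, n_init=5).fit_predict(data[idx])
    for a_pos, a in enumerate(idx):
        for b_pos, b in enumerate(idx):
            counts[a, b] += 1
            if labels[a_pos] == labels[b_pos]:
                consensus[a, b] += 1
consensus /= np.maximum(counts, 1)

# Fidelity-style summary: co-clustering rates within vs. between the true groups
within = consensus[:20, :20].mean()
between = consensus[:20, 20:].mean()
print(round(within, 2), round(between, 2))
```

    High within-group and low between-group consensus indicates well-supported clusters; per-gene row averages of the consensus matrix give the membership-fidelity ranking the abstract mentions.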

  12. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  13. Clustering performance comparison using K-means and expectation maximization algorithms.

    Science.gov (United States)

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithms. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
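    The K-means versus EM comparison described above can be sketched as follows. The two-feature synthetic data stands in for the red-wine measurements, and the logistic-regression step of the paper is omitted; only the two clusterers are compared against known group labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic stand-in for two wine-quality groups in a 2-D feature space
X = np.vstack([rng.normal([12.0, 0.4], [0.5, 0.05], size=(100, 2)),
               rng.normal([10.0, 0.8], [0.5, 0.05], size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# GaussianMixture fitted by the EM algorithm plays the role of "EM clustering"
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

def agreement(pred, truth):
    # Cluster labels are arbitrary, so take the better of the two matchings
    acc = (pred == truth).mean()
    return max(acc, 1.0 - acc)

print("k-means:", agreement(km_labels, y), "EM:", agreement(em_labels, y))
```

    On well-separated spherical groups both methods recover the partition; EM's soft, covariance-aware assignments tend to matter more when clusters overlap or are elongated.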

  14. Improved Ant Colony Clustering Algorithm and Its Performance Study

    Science.gov (United States)

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  15. Learning from environmental data: Methods for analysis of forest nutrition time series

    Energy Technology Data Exchange (ETDEWEB)

    Sulkava, M. (Helsinki Univ. of Technology, Espoo (Finland). Computer and Information Science)

    2008-07-01

    Data analysis methods play an important role in increasing our knowledge of the environment as the amount of data measured from the environment increases. This thesis fits under the scope of environmental informatics and environmental statistics, fields in which data analysis methods are developed and applied for the analysis of environmental data. The environmental data studied in this thesis are time series of nutrient concentration measurements of pine and spruce needles. In addition, there are data on laboratory quality and related environmental factors, such as the weather and atmospheric depositions. The most important methods used for the analysis of the data are based on the self-organizing map and linear regression models. First, a new clustering algorithm for the self-organizing map is proposed. It is found to provide better results than two other methods for clustering of the self-organizing map. The algorithm is used to divide the nutrient concentration data into clusters, and the result is evaluated by environmental scientists. Based on the clustering, the temporal development of the forest nutrition is modeled and the effect of nitrogen and sulfur deposition on the foliar mineral composition is assessed. Second, regression models are used for studying how much environmental factors and properties of the needles affect the changes in the nutrient concentrations of the needles between their first and second year of existence. The aim is to build understandable models with good prediction capabilities. Sparse regression models are found to outperform more traditional regression models in this task. Third, fusion of laboratory quality data from different sources is performed to estimate the precisions of the analytical methods. Weighted regression models are used to quantify how much the precision of observations can affect the time needed to detect a trend in environmental time series. The results of power analysis show that improving the

  16. A clustering based method to evaluate soil corrosivity for pipeline external integrity management

    International Nuclear Information System (INIS)

    Yajima, Ayako; Wang, Hui; Liang, Robert Y.; Castaneda, Homero

    2015-01-01

    One important category of transportation infrastructure is underground pipelines. Corrosion of these buried pipeline systems may cause pipeline failures with the attendant hazards of property loss and fatalities. Therefore, developing the capability to estimate the soil corrosivity is important for designing and preserving materials and for risk assessment. The deterioration rate of metal is highly influenced by the physicochemical characteristics of a material and the environment of its surroundings. In this study, the field data obtained from the southeast region of Mexico was examined using various data mining techniques to determine the usefulness of these techniques for clustering soil corrosivity level. Specifically, the soil was classified into different corrosivity level clusters by k-means and Gaussian mixture model (GMM). In terms of physical space, GMM shows better separability; therefore, the distributions of the material loss of the buried petroleum pipeline walls were estimated via the empirical density within GMM clusters. The soil corrosivity levels of the clusters were determined based on the medians of metal loss. The proposed clustering method was demonstrated to be capable of classifying the soil into different levels of corrosivity severity. - Highlights: • The clustering approach is applied to the data extracted from a real-life pipeline system. • Soil properties in the right-of-way are analyzed via clustering techniques to assess corrosivity. • GMM is selected as the preferred method for detecting the hidden pattern of in-situ data. • K–W test is performed for significant difference of corrosivity level between clusters
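    The clustering-then-ranking scheme described above (GMM clusters on soil properties, corrosivity levels ordered by median metal loss) can be sketched as follows. The soil features, their values, and the metal-loss figures are all hypothetical stand-ins for the field data from Mexico.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Hypothetical soil measurements: [resistivity (ohm-m), moisture (%), pH]
mild = rng.normal([80.0, 10.0, 7.5], [5.0, 1.0, 0.2], size=(60, 3))
severe = rng.normal([20.0, 30.0, 5.0], [5.0, 2.0, 0.2], size=(60, 3))
soil = np.vstack([mild, severe])
# Hypothetical pipeline-wall metal loss (%) observed at the same locations
metal_loss = np.concatenate([rng.normal(2.0, 0.5, 60), rng.normal(12.0, 1.5, 60)])

# GMM clustering of the soil properties (the paper's preferred method over k-means)
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(soil)

# Rank corrosivity levels by the median metal loss within each cluster
medians = {k: float(np.median(metal_loss[labels == k])) for k in (0, 1)}
severity = sorted(medians, key=medians.get)  # least to most corrosive
print({k: round(medians[k], 1) for k in severity})
```

    The cluster with the higher median loss is labeled the more corrosive environment; a Kruskal-Wallis test, as in the paper, would then check that the between-cluster difference is significant.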

  17. Cluster Analysis of Customer Reviews Extracted from Web Pages

    Directory of Open Access Journals (Sweden)

    S. Shivashankar

    2010-01-01

    Full Text Available As e-commerce gains popularity day by day, the web has become an excellent source for market researchers gathering customer reviews and opinions. The number of customer reviews that a product receives is growing at a very fast rate (it can be in the hundreds or thousands), and the reviews posted on websites vary greatly in quality. A potential customer would otherwise have to read all the reviews, irrespective of their quality, to decide whether or not to purchase the product. In this paper, we make an attempt to assess a review based on its quality, to help the customer make a proper buying decision. The quality of a customer review is assessed as most significant, more significant, significant or insignificant. A novel and effective web mining technique is proposed for assessing a customer review of a particular product based on feature clustering techniques, namely the k-means method and the fuzzy c-means method. This is performed in three steps: (1) identify review regions and extract reviews from them, (2) extract and cluster the features of reviews by a clustering technique and then assign weights to the features belonging to each of the clusters (groups), and (3) assess the review by considering the feature weights and group belongingness. The k-means and fuzzy c-means clustering techniques are implemented and tested on customer reviews extracted from web pages, and their performance is analyzed.
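    Of the two techniques named above, fuzzy c-means is the less commonly packaged, so a minimal implementation is sketched here; the 2-D "review feature" vectors and the use of the strongest membership as a significance weight are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns membership matrix U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances of every point to every center (small epsilon avoids /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update: u_ik = d_ik^(-p) / sum_j d_ij^(-p)
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

rng = np.random.default_rng(5)
# Hypothetical 2-D review-feature vectors (e.g., scores on two feature terms)
X = np.vstack([rng.normal(0.2, 0.05, size=(30, 2)),
               rng.normal(0.8, 0.05, size=(30, 2))])
U, centers = fuzzy_cmeans(X)
# A review's group belongingness: its strongest cluster membership
weights = U.max(axis=1)
print(round(float(weights.mean()), 2))
```

    Unlike k-means, every review gets a graded membership in every feature cluster, which is what makes the "group belongingness" weighting in step (3) possible.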

  18. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    Science.gov (United States)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on the intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines and decision trees are commonly used. These methods can detect the failure of a rolling bearing effectively, but to achieve better detection results they often require a large number of training samples. Based on the above, a novel clustering method is proposed in this paper. The novel method is able to find the correct number of clusters automatically. The effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples, while the diagnosis results also achieve relatively high accuracy for massive samples.

  19. Investigation of the cluster formation in lithium niobate crystals by computer modeling method

    Energy Technology Data Exchange (ETDEWEB)

    Voskresenskii, V. M.; Starodub, O. R., E-mail: ol-star@mail.ru; Sidorov, N. V.; Palatnikov, M. N. [Russian Academy of Sciences, Tananaev Institute of Chemistry and Technology of Rare Earth Elements and Mineral Raw Materials, Kola Science Centre (Russian Federation)

    2017-03-15

    The processes occurring upon the formation of energetically equilibrium oxygen-octahedral clusters in the ferroelectric phase of a stoichiometric lithium niobate (LiNbO{sub 3}) crystal have been investigated by the computer modeling method within the semiclassical atomistic model. An energetically favorable cluster size (at which a structure similar to that of a congruent crystal is organized) is shown to exist. A stoichiometric cluster cannot exist because of the electroneutrality loss. The most energetically favorable cluster is that with a Li/Nb ratio of about 0.945, a value close to the lithium-to-niobium ratio for a congruent crystal.

  20. Application of instrumental neutron activation analysis and multivariate statistical methods to archaeological Syrian ceramics

    International Nuclear Information System (INIS)

    Bakraji, E. H.; Othman, I.; Sarhil, A.; Al-Somel, N.

    2002-01-01

    Instrumental neutron activation analysis (INAA) has been utilized in the analysis of thirty-seven archaeological ceramic fragment samples collected from the Tal Al-Wardiate site, Missiaf town, Hamma city, Syria. 36 chemical elements were determined, and these elemental concentrations were processed using two multivariate statistical methods, cluster analysis and factor analysis, in order to determine similarities and correlations between the various samples. Factor analysis confirms that the samples were correctly classified by cluster analysis. The results showed that the samples can be considered to have been manufactured using three different sources of raw material. (author)

  1. Threshold selection for classification of MR brain images by clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

    2015-12-07

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization; instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy and multiple sclerosis disease. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images: 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (i.e., the area of white objects in the binary image) was determined; these pixel counts represent the objects in the clustering operation. The following optimum threshold values were obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
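    The clustering step described above operates on scalar white-pixel counts, which can be sketched with a dendrogram-based (complete-linkage) clustering. The counts below are synthetic stand-ins for the binarized T2w/PD scans, not values from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(8)
# Synthetic white-pixel counts after binarizing each scan at a candidate
# threshold: one group of healthy scans, one group of MS scans (illustrative)
healthy_counts = rng.normal(5000, 200, size=20)
ms_counts = rng.normal(9000, 300, size=20)
counts = np.concatenate([healthy_counts, ms_counts])[:, None]

# Complete-linkage dendrogram over the counts, cut into two clusters;
# a threshold is "good" if this cut separates the two groups cleanly
labels = fcluster(linkage(counts, method="complete"), t=2, criterion="maxclust")
sizes = np.bincount(labels)[1:].tolist()
print("cluster sizes:", sizes)
```

    Repeating this for each candidate threshold and keeping the one with the cleanest between-cluster separation mirrors the selection of T = 80 (PD) and T = 30 (T2w) reported above.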

  2. Peeking Network States with Clustered Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jinoh [Texas A & M Univ., Commerce, TX (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Sim, Alex [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-20

    Network traffic monitoring has long been a core element for effective network management and security. However, it is still a challenging task with a high degree of complexity for comprehensive analysis when considering multiple variables and ever-increasing traffic volumes to monitor. For example, one of the widely considered approaches is to scrutinize probabilistic distributions, but it poses a scalability concern and multivariate analysis is not generally supported due to the exponential increase of the complexity. In this work, we propose a novel method for network traffic monitoring based on clustering, one of the powerful deep-learning techniques. We show that the new approach enables us to recognize clustered results as patterns representing the network states, which can then be utilized to evaluate “similarity” of network states over time. In addition, we define a new quantitative measure for the similarity between two compared network states observed in different time windows, as a supportive means for intuitive analysis. Finally, we demonstrate the clustering-based network monitoring with public traffic traces, and show that the proposed approach using the clustering method has a great opportunity for feasible, cost-effective network monitoring.

  3. On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.

    Science.gov (United States)

    Haneuse, Sebastien; Rivera-Rodriguez, Claudia

    2018-01-01

    In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.

  4. Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

    Directory of Open Access Journals (Sweden)

    Tao Ran

    2016-01-01

    Full Text Available As Hadoop has gained popularity in the big data era, it is widely used in various fields. The self-designed and self-developed large-scale network traffic analysis cluster works well based on Hadoop, with off-line applications running on it to analyze massive network traffic data. For the purpose of scientifically and reasonably evaluating the performance of the analysis cluster, we propose a performance evaluation system. Firstly, we set the execution times of three benchmark applications as the benchmark of performance and pick 40 metrics of customized statistical resource data. Then we identify the relationship between the resource data and the execution times by a statistical modeling analysis approach composed of principal component analysis and multiple linear regression. After training models on historical data, we can predict the execution times from current resource data. Finally, we evaluate the performance of the analysis cluster by the validated prediction of execution times. Experimental results show that the execution times predicted by the trained models are within an acceptable error range, and the evaluation results of performance are accurate and reliable.
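    The PCA-plus-multiple-linear-regression pipeline described above can be sketched as follows. The 40 resource metrics, the three underlying load factors, and the execution-time model are synthetic assumptions standing in for the cluster's real monitoring data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
# Hypothetical resource data: 200 time windows x 40 correlated metrics,
# all driven by a few underlying load factors plus measurement noise
latent = rng.normal(size=(200, 3))
metrics = latent @ rng.normal(size=(3, 40)) + rng.normal(0, 0.05, size=(200, 40))
# Benchmark execution time driven by the same load factors (plus noise)
exec_time = 100 + latent @ np.array([8.0, 5.0, 3.0]) + rng.normal(0, 0.5, 200)

# PCA compresses the 40 metrics, then linear regression predicts execution times
pcs = PCA(n_components=3).fit_transform(metrics)
model = LinearRegression().fit(pcs[:150], exec_time[:150])
pred = model.predict(pcs[150:])

err = np.abs(pred - exec_time[150:]).mean()
print("mean absolute error (s):", round(float(err), 2))
```

    A small held-out error, as here, corresponds to the paper's "acceptable error range" check that validates the model before it is used to judge cluster performance.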

  5. Symptom clusters in women with breast cancer: an analysis of data from social media and a research study.

    Science.gov (United States)

    Marshall, Sarah A; Yang, Christopher C; Ping, Qing; Zhao, Mengnan; Avis, Nancy E; Ip, Edward H

    2016-03-01

    User-generated content on social media sites, such as health-related online forums, offers researchers a tantalizing amount of information, but concerns regarding scientific application of such data remain. This paper compares and contrasts symptom cluster patterns derived from messages on a breast cancer forum with those from a symptom checklist completed by breast cancer survivors participating in a research study. Over 50,000 messages generated by 12,991 users of the breast cancer forum on MedHelp.org were transformed into a standard form and examined for the co-occurrence of 25 symptoms. The k-medoid clustering method was used to determine appropriate placement of symptoms within clusters. Findings were compared with a similar analysis of a symptom checklist administered to 653 breast cancer survivors participating in a research study. The following clusters were identified using forum data: menopausal/psychological, pain/fatigue, gastrointestinal, and miscellaneous. Study data generated the clusters: menopausal, pain, fatigue/sleep/gastrointestinal, psychological, and increased weight/appetite. Although the clusters are somewhat different, many symptoms that clustered together in the social media analysis remained together in the analysis of the study participants. Density of connections between symptoms, as reflected by rates of co-occurrence and similarity, was higher in the study data. The copious amount of data generated by social media outlets can augment findings from traditional data sources. When different sources of information are combined, areas of overlap and discrepancy can be detected, perhaps giving researchers a more accurate picture of reality. However, data derived from social media must be used carefully and with understanding of its limitations.
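    The k-medoid step used above (grouping symptoms by how often they co-occur) can be sketched with a basic PAM-style implementation; the 8-symptom dissimilarity matrix with two co-occurring blocks is invented for illustration, not derived from the forum or study data.

```python
import numpy as np

def k_medoids(D, k, iters=50, seed=0):
    """Basic k-medoids (PAM-style alternation) on a distance matrix D."""
    rng = np.random.default_rng(seed)
    n = len(D)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new = []
        for j in range(k):
            members = np.where(labels == j)[0]
            # New medoid: the member minimizing total within-cluster distance
            within = D[np.ix_(members, members)].sum(axis=1)
            new.append(members[np.argmin(within)])
        new = np.array(new)
        if set(new) == set(medoids):
            break
        medoids = new
    labels = np.argmin(D[:, medoids], axis=1)
    cost = D[np.arange(n), medoids[labels]].sum()
    return medoids, labels, cost

rng = np.random.default_rng(7)
# Hypothetical dissimilarities (1 - co-occurrence similarity) for 8 symptoms
# forming two blocks that frequently co-occur (e.g., menopausal vs. pain-related)
S = np.full((8, 8), 0.2)
S[:4, :4] = 0.9
S[4:, 4:] = 0.9
S = S + rng.normal(0, 0.01, (8, 8))
S = (S + S.T) / 2
np.fill_diagonal(S, 1.0)
D = 1.0 - S

# Restart from several seeds and keep the lowest-cost solution
best = min((k_medoids(D, k=2, seed=s) for s in range(10)), key=lambda r: r[2])
medoids, labels, _ = best
print("cluster sizes:", np.bincount(labels).tolist())
```

    Running the same procedure on the forum-derived and study-derived co-occurrence matrices is what lets the paper compare cluster composition across the two data sources.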

  6. Lexical preferences in Dutch verbal cluster ordering

    NARCIS (Netherlands)

    Bloem, J.; Bellamy, K.; Karvovskaya, E.; Kohlberger, M.; Saad, G.

    2016-01-01

    This study discusses lexical preferences as a factor affecting the word order variation in Dutch verbal clusters. There are two grammatical word orders for Dutch two-verb clusters, with no clear meaning difference. Using the method of collostructional analysis, I find significant associations

  7. Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis

    Directory of Open Access Journals (Sweden)

    Mani D R

    2011-08-01

    Background: Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there are a large variety of clustering algorithms, very few of these can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location) can produce effective and optimal results in many important applications, ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography based mass spectrometry (LC-MS) data analysis in proteomics and metabolomics. Results: We present MEDEA (M-Estimator with DEterministic Annealing), a new M-estimator-based unsupervised algorithm that is designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching"--identifying the common LC-MS peaks across multiple samples--in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also results in an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets. Conclusions: MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information, is available from the author website at http://www.hephy.at/user/fru/medea/.

  8. Robust continuous clustering.

    Science.gov (United States)

    Shah, Sohil Atul; Koltun, Vladlen

    2017-09-12

    Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank.

  9. Analysis of RXTE data on Clusters of Galaxies

    Science.gov (United States)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJ0658-5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. The planned observation of a relatively distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the universe and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons, which are known to be present in the clusters and to be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, would have an HXR signal comparable to the other nearby ones. The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthened the case for the Compton scattering model. We intend the data obtained via this observation to be part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits on) the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

  10. Puzzle of magnetic moments of Ni clusters revisited using quantum Monte Carlo method.

    Science.gov (United States)

    Lee, Hung-Wen; Chang, Chun-Ming; Hsing, Cheng-Rong

    2017-02-28

    The puzzle of the magnetic moments of small nickel clusters arises from the discrepancy between values predicted using density functional theory (DFT) and experimental measurements. Traditional DFT approaches underestimate the magnetic moments of nickel clusters. Two fundamental problems are associated with this puzzle, namely, calculating the exchange-correlation interaction accurately and determining the global minimum structures of the clusters. Theoretically, the two problems can be solved using quantum Monte Carlo (QMC) calculations and the ab initio random structure searching (AIRSS) method, respectively. Therefore, we combined the fixed-moment AIRSS and QMC methods to investigate the magnetic properties of Ni_n (n = 5-9) clusters. The spin moments of the diffusion Monte Carlo (DMC) ground states are higher than those of the Perdew-Burke-Ernzerhof ground states and, in the case of Ni_8-9, two new ground-state structures have been discovered using the DMC calculations. The predicted results are closer to the experimental findings, unlike the results predicted in previous standard DFT studies.

  11. Clustering-based analysis for residential district heating data

    DEFF Research Database (Denmark)

    Gianniou, Panagiota; Liu, Xiufeng; Heller, Alfred

    2018-01-01

    The wide use of smart meters enables collection of a large amount of fine-granular time series, which can be used to improve the understanding of consumption behavior and used for consumption optimization. This paper presents a clustering-based knowledge discovery in databases method to analyze residential heating consumption data and evaluate information included in national building databases. The proposed method uses the K-means algorithm to segment consumption groups based on consumption intensity and representative patterns and ranks the groups according to daily consumption. This paper also... These findings will be valuable for district heating utilities and energy planners to optimize their operations, design demand-side management strategies, and develop targeting energy-efficiency programs or policies.
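A minimal sketch of the pipeline this record describes, assuming synthetic consumption profiles in place of the Danish smart-meter data: K-means segments the profiles into intensity groups, which are then ranked by daily consumption. The data and group count are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=1):
    """Plain Lloyd's algorithm on a matrix of consumption profiles (one row per consumer)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every profile to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic hourly heating profiles (24 values per day) for two intensity groups.
rng = np.random.default_rng(0)
low = 1.0 + 0.1 * rng.random((20, 24))     # low-intensity consumers
high = 5.0 + 0.1 * rng.random((20, 24))    # high-intensity consumers
X = np.vstack([low, high])

labels, centers = kmeans(X, k=2)
# Rank the groups by daily consumption, highest first, as in the paper's last step.
daily = centers.sum(axis=1)
ranking = np.argsort(daily)[::-1]
```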

  12. NetFCM: A Semi-Automated Web-Based Method for Flow Cytometry Data Analysis

    DEFF Research Database (Denmark)

    Frederiksen, Juliet Wairimu; Buggert, Marcus; Karlsson, Annika C.

    2014-01-01

    ...data analysis has become more complex and labor-intensive than previously. We have therefore developed a semi-automatic gating strategy (NetFCM) that uses clustering and principal component analysis (PCA) together with other statistical methods to mimic manual gating approaches. NetFCM is an online tool both for subset identification as well as for quantification of differences between samples. Additionally, NetFCM can classify and cluster samples based on multidimensional data. We tested the method using a data set of peripheral blood mononuclear cells collected from 23 HIV-infected individuals... corresponding to those obtained by manual gating strategies. These data demonstrate that NetFCM has the potential to identify relevant T cell populations by mimicking classical FCM data analysis and reduce the subjectivity and amount of time associated with such analysis. (c) 2014 International Society...

  13. Multichannel biomedical time series clustering via hierarchical probabilistic latent semantic analysis.

    Science.gov (United States)

    Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary

    2014-11-01

    Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  14. Cluster-based upper body marker models for three-dimensional kinematic analysis: Comparison with an anatomical model and reliability analysis.

    Science.gov (United States)

    Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H

    2018-04-27

    Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Walid El Ansari

    2018-05-01

    Background: Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, the composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. Method: We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. Results: A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious), with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium-high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, the second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use and comprised the students who were least health conscious, consumed the least fruit, and attached the least importance to eating healthy. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and in particular, students in the risk-taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Conclusion: Our results suggest that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  16. Dynamic Fuzzy Clustering Method for Decision Support in Electricity Markets Negotiation

    Directory of Open Access Journals (Sweden)

    Ricardo FAIA

    2016-10-01

    Artificial Intelligence (AI) methods contribute to the construction of systems where there is a need to automate tasks. They are typically used for problems that have a large response time, or when a mathematical method cannot be used to solve the problem. However, the application of AI brings added complexity to the development of such applications. AI has been frequently applied in the power systems field, namely in Electricity Markets (EM). In this area, AI applications are essentially used to forecast/estimate the prices of electricity or to search for the best opportunity to sell the product. This paper proposes a clustering methodology that is combined with fuzzy logic in order to perform the estimation of EM prices. The proposed method is based on the application of a clustering methodology that groups historic energy contracts according to their prices' similarity. The optimal number of groups is automatically calculated taking into account the preference for the balance between the estimation error and the number of groups. The centroids of each cluster are used to define a dynamic fuzzy variable that approximates the tendency of the contracts' history. The resulting fuzzy variable allows estimating expected prices for contracts instantaneously and approximating missing values in the historic contracts.
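The combination of clustering and fuzzy logic described in this record can be illustrated with a standard fuzzy c-means sketch. The toy contract prices and the membership-weighted price estimate below are illustrative assumptions, not the authors' exact dynamic fuzzy variable:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: every point receives a membership degree in each cluster."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))              # standard FCM membership update
        u_new = inv / inv.sum(axis=1, keepdims=True)
        if np.allclose(u_new, u, atol=1e-9):
            break
        u = u_new
    return centers, u

# Toy history of contract prices (e.g. EUR/MWh) falling into two price regimes.
prices = np.array([[30.0], [31.0], [32.0], [60.0], [61.0], [62.0]])
centers, u = fuzzy_c_means(prices, c=2)

# Estimate a new contract's expected price as a membership-weighted mix of centroids.
x_new = np.array([[58.0]])
d = np.fmax(np.linalg.norm(x_new[:, None, :] - centers[None, :, :], axis=2), 1e-12)
inv = d ** -2.0
w = inv / inv.sum(axis=1, keepdims=True)
estimate = (w @ centers.ravel()).item()
```

Because memberships are graded rather than hard, a query price between regimes is estimated as a blend of the nearby centroids, which is the intuition behind the paper's fuzzy price tendency.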

  17. Recurrent-neural-network-based Boolean factor analysis and its application to word clustering.

    Science.gov (United States)

    Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu

    2009-07-01

    The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov , 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence, revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule facilitating the search of other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of signals to determine whether it contains the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this attempt, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.

  18. Fatigue Feature Extraction Analysis based on a K-Means Clustering Approach

    Directory of Open Access Journals (Sweden)

    M.F.M. Yunoh

    2015-06-01

    This paper focuses on clustering analysis using a K-means approach for fatigue feature dataset extraction. The aim of this study is to group the scattered dataset as homogeneously as possible. Kurtosis, the wavelet-based energy coefficient, and fatigue damage are calculated for all segments after the extraction process using the wavelet transform, and these three quantities are used as input data for the K-means clustering approach. K-means clustering calculates the average distance of each group from the centroid and gives the objective function values. Based on the results, the maximum objective function value, 11.58, is found for two centroid clusters, and the minimum value, 8.06, for five centroid clusters. Since the objective function is lowest when the number of clusters equals five, five clusters is the best choice for this dataset.
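The record's comparison of objective-function values across candidate numbers of centroid clusters can be sketched as follows. Synthetic three-feature vectors stand in for the extracted fatigue features, and the from-scratch Lloyd's algorithm is an illustrative assumption, not the authors' code:

```python
import numpy as np

def kmeans_objective(X, k, n_iter=50, seed=0):
    """Run Lloyd's algorithm and return the total distance of points to their assigned centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).sum()

# Synthetic fatigue-feature vectors (kurtosis, energy coefficient, damage), 3 true groups.
rng = np.random.default_rng(2)
groups = (np.zeros(3), np.full(3, 4.0), np.array([4.0, 0.0, 4.0]))
X = np.vstack([g + 0.2 * rng.standard_normal((15, 3)) for g in groups])

# Objective value for each candidate number of centroid clusters, as in the paper.
objectives = {k: kmeans_objective(X, k) for k in range(2, 6)}
```

Note that this objective generally decreases as more clusters are added, which is why the abstract's lowest value appears at the largest cluster count tried.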

  19. The dynamics of cyclone clustering in re-analysis and a high-resolution climate model

    Science.gov (United States)

    Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len

    2017-04-01

    Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.

  20. Intra Cluster Light properties in the CLASH-VLT cluster MACS J1206.2-0847

    CERN Document Server

    Presotto, V; Nonino, M; Mercurio, A; Grillo, C; Rosati, P; Biviano, A; Annunziatella, M; Balestra, I; Cui, W; Sartoris, B; Lemze, D; Ascaso, B; Moustakas, J; Ford, H; Fritz, A; Czoske, O; Ettori, S; Kuchner, U; Lombardi, M; Maier, C; Medezinski, E; Molino, A; Scodeggio, M; Strazzullo, V; Tozzi, P; Ziegler, B; Bartelmann, M; Benitez, N; Bradley, L; Brescia, M; Broadhurst, T; Coe, D; Donahue, M; Gobat, R; Graves, G; Kelson, D; Koekemoer, A; Melchior, P; Meneghetti, M; Merten, J; Moustakas, L; Munari, E; Postman, M; Regős, E; Seitz, S; Umetsu, K; Zheng, W; Zitrin, A

    2014-01-01

    We aim at constraining the assembly history of clusters by studying the intra cluster light (ICL) properties, estimating its contribution to the fraction of baryons in stars, f*, and understanding possible systematics/bias using different ICL detection techniques. We developed an automated method, GALtoICL, based on the software GALAPAGOS to obtain a refined version of typical BCG+ICL maps. We applied this method to our test case MACS J1206.2-0847, a massive cluster located at z=0.44, that is part of the CLASH sample. Using deep multi-band SUBARU images, we extracted the surface brightness (SB) profile of the BCG+ICL and we studied the ICL morphology, color, and contribution to f* out to R500. We repeated the same analysis using a different definition of the ICL, SBlimit method, i.e. a SB cut-off level, to compare the results. The most peculiar feature of the ICL in MACS1206 is its asymmetric radial distribution, with an excess in the SE direction and extending towards the 2nd brightest cluster galaxy which i...

  1. Pseudo-potential method for taking into account the Pauli principle in cluster systems

    International Nuclear Information System (INIS)

    Krasnopol'skii, V.M.; Kukulin, V.I.

    1975-01-01

    In order to take account of the Pauli principle in cluster systems (such as 3α, α + α + n) a convenient method of renormalization of the cluster-cluster deep attractive potentials with forbidden states is suggested. The renormalization consists of adding projectors upon the occupied states with an infinite coupling constant to the initial deep potential which means that we pass to pseudo-potentials. The pseudo-potential approach in projecting upon the noneigenstates is shown to be equivalent to the orthogonality condition model of Saito et al. The orthogonality of the many-particle wave function to the forbidden states of each two-cluster sub-system is clearly demonstrated

  2. Innovation, learning and cluster dynamics

    NARCIS (Netherlands)

    B. Nooteboom (Bart)

    2004-01-01

    This chapter offers a theory and method for the analysis of the dynamics, i.e. the development, of clusters for innovation. It employs an analysis of three types of embedding: institutional embedding, which is often localized, structural embedding (network structure), and relational

  3. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  4. Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach

    Directory of Open Access Journals (Sweden)

    Sami Ullah

    2017-11-01

    The ability to detect potential space-time clusters in spatio-temporal data on disease occurrences is necessary for conducting surveillance and implementing disease prevention policies. Most existing techniques use geometrically shaped (circular, elliptical or square) scanning windows to discover disease clusters. In certain situations, where the disease occurrences tend to cluster in very irregularly shaped areas, these algorithms are not feasible in practice for the detection of space-time clusters. To address this problem, a new algorithm is proposed, which uses a co-clustering strategy to detect prospective and retrospective space-time disease clusters with no restriction on shape and size. The proposed method detects space-time disease clusters by tracking changes in the space-time occurrence structure instead of an in-depth search over space. This method was utilised to detect potential clusters in the annual and monthly malaria data in Khyber Pakhtunkhwa Province, Pakistan from 2012 to 2016, visualising the results on a heat map. The results of the annual data analysis showed that the most likely hotspot emerged in three sub-regions in the years 2013-2014. The most likely hotspots in monthly data appeared in the months of July to October in each year and showed a strong periodic trend.

  5. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder

    DEFF Research Database (Denmark)

    Bauer, M; Glenn, T; Alda, M

    2015-01-01

    Purpose: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset... cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. Results: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After... on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more...
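The two-step idea this record describes, adjusting age of onset for birth cohort and then running model-based clustering (mixture analysis) on the residuals, can be sketched as follows. The simulated onset ages, the linear cohort adjustment, and the two-component Gaussian EM are illustrative assumptions, not the study's actual data or model:

```python
import numpy as np

# Simulated ages of onset with a linear birth-cohort trend plus two onset subgroups.
rng = np.random.default_rng(0)
years = np.linspace(1900, 1980, 60)
offsets = np.tile([-8.0, 8.0], 30) + rng.normal(0.0, 0.5, 60)   # two subgroups
onset = 40.0 - 0.25 * (years - 1900) + offsets                  # cohort trend

# Step 1: remove the birth cohort effect by regressing onset age on birth year.
slope, intercept = np.polyfit(years, onset, 1)
residuals = onset - (slope * years + intercept)

# Step 2: two-component Gaussian mixture fitted by EM on the cohort-adjusted residuals.
def em_two_gaussians(x, n_iter=200):
    mu = np.array([x.min(), x.max()])          # spread-out initial means
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        pdf = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = pi * pdf
        r = r / r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return np.sort(mu)

subgroup_means = em_two_gaussians(residuals)
```

Without step 1, the cohort trend inflates the spread of onset ages and can masquerade as extra subgroups, which is the effect the study reports.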

  6. Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

    Science.gov (United States)

    Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

    2012-01-01

    Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562

  7. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    International Nuclear Information System (INIS)

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-01-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [18F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; the resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (JI mean = 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI range: 0.2301–1). Conclusion: Using commonly-used clustering algorithms, we found

  8. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harmon, S; Wendelberger, B [University of Wisconsin-Madison, Madison, WI (United States); Jeraj, R [University of Wisconsin-Madison, Madison, WI (United States); University of Ljubljana (Slovenia)

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates whether similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [18F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension-reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (JI_mean = 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data-reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI_range: 0.2301–1). Conclusion: Using commonly-used clustering algorithms
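
The abstract does not specify which variant of the Jaccard Index was used to compare cluster assignments; a common convention for clusterings is the pair-counting form, which scores pairs of patients co-clustered in both assignments against pairs co-clustered in at least one. A minimal sketch (the label vectors are hypothetical stand-ins for the imaging- and gene-derived assignments):

```python
import numpy as np
from itertools import combinations

def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items.

    n11 = pairs co-clustered in both assignments;
    n10/n01 = pairs co-clustered in only one assignment.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
    return n11 / (n11 + n10 + n01)
```

For the 26-patient cohort described, `labels_a` and `labels_b` would be the cluster assignments obtained from the PET-feature and gene datasets, respectively; identical partitions score 1 regardless of how cluster IDs are numbered.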

  9. Test computations on the dynamical evolution of star clusters. [Fluid dynamic method

    Energy Technology Data Exchange (ETDEWEB)

    Angeletti, L; Giannone, P. (Rome Univ. (Italy))

    1977-01-01

    Test calculations have been carried out on the evolution of star clusters using the fluid-dynamical method devised by Larson (1970). Large systems of stars have been considered, with specific concern for globular clusters. With reference to the analogous 'standard' model by Larson, the influence on the results of varying each free parameter in turn (cluster mass, star mass, tidal radius, mass concentration of the initial model) has been studied. Furthermore, the partial release of some simplifying assumptions with regard to the relaxation time and the distribution of the 'target' stars has been considered. The change of the structural properties is discussed, and the variation of the evolutionary time scale is outlined. An indicative agreement of the results obtained here with the structural properties of globular clusters as deduced from previous theoretical models is pointed out.

  10. Cluster analysis in soft X-ray spectromicroscopy: finding the patterns in complex specimens

    International Nuclear Information System (INIS)

    Lerotic, M.; Jacobsen, C.

    2004-01-01

    Full text: Soft x-ray spectromicroscopy provides spectral data on the chemical speciation of light elements at sub-100 nanometer spatial resolution. When all chemical species in a specimen are known and separately characterized, existing approaches can be used to measure the concentration of each component at each pixel. In other situations, such as in biology or environmental science, this approach may not be possible. A method to find natural groupings of data without prior knowledge of the spectra of all components will be presented. Principal component analysis is used to orthogonalize spectromicroscopy data and discard much of the noise present in the data set. Then cluster analysis is used to find a hierarchical classification of pixels with similar spectra, to extract representative, cluster-averaged spectra with good signal-to-noise ratio, and to obtain gradations of concentration of these representative spectra at each pixel. The method is illustrated with a simulated data set of organic compounds, and a mixture of lutetium in hematite used to understand colloidal transport properties of radionuclides. We gratefully acknowledge funding from the National Institutes for Health under contract R01 EB00479-01A1, and from the National Science Foundation under contracts OCE-0221029 and CHE-0221934.
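
The PCA-then-cluster pipeline described above can be sketched on synthetic data. The two component spectra below are invented stand-ins for real spectromicroscopy components; PCA (via SVD) denoises the per-pixel spectra, and hierarchical clustering then groups pixels with similar spectra so a cluster-averaged representative spectrum can be extracted:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic stand-in for spectromicroscopy data: 40 "pixels" x 50 energy
# channels, drawn from two underlying component spectra plus noise.
spec_a = np.sin(np.linspace(0, 3, 50))
spec_b = np.cos(np.linspace(0, 3, 50))
pixels = np.vstack(
    [spec_a + 0.05 * rng.standard_normal(50) for _ in range(20)]
    + [spec_b + 0.05 * rng.standard_normal(50) for _ in range(20)]
)

# PCA via SVD: keep the leading components, discarding noise-dominated ones.
centered = pixels - pixels.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U[:, :2] * s[:2]            # pixel coordinates in PC space

# Hierarchical (Ward) clustering of the denoised pixel representation.
labels = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")

# Representative, cluster-averaged spectrum with improved signal-to-noise.
mean_spec = pixels[labels == labels[0]].mean(axis=0)
```

Averaging the 20 noisy pixels in a cluster reduces the per-channel noise by roughly a factor of sqrt(20), which is the signal-to-noise benefit the abstract refers to.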

  11. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    Science.gov (United States)

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogeneous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogeneous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and comprised patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.
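
One practical detail when clustering mixed SD-OCT variables is that ONH parameters (mm², ratios) and RNFL thicknesses (µm) live on very different scales, so features are typically standardized before computing distances. A hedged sketch with invented distributions (the values below are illustrative, not from the study):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# Hypothetical patient table: columns with very different units
# (rim area in mm^2, cup/disc ratio, mean RNFL thickness in um).
X = np.column_stack([
    rng.normal(1.3, 0.2, 60),     # rim area
    rng.normal(0.6, 0.1, 60),     # cup/disc ratio
    rng.normal(85, 12, 60),       # RNFL thickness
])

# Without standardization the um-scale column dominates the Euclidean
# distance; z-scoring puts every parameter on an equal footing first.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
labels = fcluster(linkage(Xz, method="ward"), t=3, criterion="maxclust")
```

Cutting the Ward dendrogram at three clusters mirrors the three-group solution reported above; in practice the cut level would be chosen from the dendrogram structure rather than fixed in advance.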

  12. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education, and many universities have made effective achievements in evaluating it. In this paper, we establish a Students' Evaluation of Teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  13. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    Science.gov (United States)

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, it is first necessary to assign that sample to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that could not discriminate uniquely between classes. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.
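
The LDA step applied here, assigning a new coal sample to its group before calibration, can be sketched for the two-class case with a minimal Fisher discriminant. The data below are synthetic Gaussian stand-ins for NIR predictor variables, not spectra from the study:

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Two-class Fisher LDA: discriminant direction w and midpoint threshold."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2
    return w, threshold

def fisher_lda_predict(X, w, threshold):
    """Assign class 1 to samples whose projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)

rng = np.random.default_rng(2)
X0 = rng.normal(0.0, 1.0, (30, 5))   # synthetic "coal group A" predictors
X1 = rng.normal(2.0, 1.0, (30, 5))   # synthetic "coal group B" predictors
w, t = fisher_lda_fit(X0, X1)
```

With more than two ASTM groups the same idea extends to multi-class LDA (one discriminant axis per class boundary), which is what a library implementation would provide.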

  14. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    International Nuclear Information System (INIS)

    Chow, Edward; James, Jennifer; Barsevick, Andrea; Hartsell, William; Ratcliffe, Sarah; Scarantino, Charles; Ivker, Robert; Roach, Mack; Suh, John; Petersen, Ivy; Konski, Andre; Demas, William; Bruner, Deborah

    2010-01-01

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters among the functional items at baseline and follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
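
Cronbach's alpha, used above to quantify the internal consistency of each interference cluster, has a closed form in terms of the item variances and the variance of the summed score. A minimal implementation (for the study, the input would be an n-patients × n-items matrix of BPI scores for the items in one cluster):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)
```

Perfectly correlated items give alpha = 1, while independent items drive alpha toward 0; the reported values of 0.83 and 0.80 indicate good internal consistency for both clusters.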

  15. Environmental data processing by clustering methods for energy forecast and planning

    Energy Technology Data Exchange (ETDEWEB)

    Di Piazza, Annalisa [Dipartimento di Ingegneria Idraulica e Applicazioni Ambientali (DIIAA), viale delle Scienze, Universita degli Studi di Palermo, 90128 Palermo (Italy); Di Piazza, Maria Carmela; Ragusa, Antonella; Vitale, Gianpaolo [Consiglio Nazionale delle Ricerche Istituto di Studi sui Sistemi Intelligenti per l' Automazione (ISSIA - CNR), sezione di Palermo, Via Dante, 12, 90141 Palermo (Italy)

    2011-03-15

    This paper presents a statistical approach based on the k-means clustering technique for managing environmental sampled data to evaluate and forecast the energy deliverable by different renewable sources at a given site. In particular, wind speed and solar irradiance sampled data are studied in association with the energy capability of a wind generator and a photovoltaic (PV) plant, respectively. The proposed method allows the sub-sets of useful data, describing the energy capability of a site, to be extracted from a set of experimental observations belonging to the considered site. The data collection is performed in Sicily, in the south of Italy, as a case study. As far as wind generation is concerned, a suitable generator, matching the wind profile of the studied sites, has been selected for the evaluation of the producible energy. With respect to photovoltaic generation, the irradiance data have been taken from the acquisition system of an actual installation. It is demonstrated, in both cases, that the use of the k-means clustering method allows data that do not contribute to the produced energy to be grouped into a cluster; moreover, it simplifies the problem of energy assessment, since it makes it possible to obtain the desired information on energy capability from a reduced amount of experimental samples. In the studied cases, the proposed method permitted a reduction of 50% of the data with a maximum discrepancy of 10% in energy estimation compared to the classical statistical approach. Therefore, the adopted k-means clustering technique represents a useful tool for appropriate and less demanding energy forecasting and planning in distributed generation systems. (author)
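
The core idea, using k-means to isolate samples that contribute no energy so they can be dropped from the assessment, can be sketched as follows. The wind-speed distributions and the two-population structure are invented for illustration; the paper's actual data come from Sicilian measurement campaigns:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)

# Hypothetical wind-speed samples (m/s): a calm population below a
# turbine's cut-in speed and a productive population above it.
calm = rng.uniform(0.0, 2.5, 500)
productive = rng.uniform(4.0, 12.0, 500)
wind = np.concatenate([calm, productive]).reshape(-1, 1)

# k-means with k=2 groups the non-producing samples into their own
# cluster, which can then be discarded from the energy estimate.
centroids, labels = kmeans2(wind, 2, seed=42, minit="++")
keep = labels == np.argmax(centroids.ravel())
useful = wind[keep].ravel()
```

In 1-D, nearest-centroid assignment is simply a threshold between the two centroids, so the retained cluster is the high-speed subset; the energy estimate then only has to process this reduced sample set, which is the data-reduction benefit the paper reports.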

  16. A cluster analysis investigation of workaholism as a syndrome.

    Science.gov (United States)

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  17. Comparison Of Keyword Based Clustering Of Web Documents By Using Openstack 4j And By Traditional Method

    Directory of Open Access Journals (Sweden)

    Shiza Anand

    2015-08-01

    Full Text Available As the number of hypertext documents on the world wide web increases continuously, clustering methods are required to bind documents into cluster repositories according to the similarity between them. Various clustering methods exist, such as hierarchical-based, K-means, fuzzy-logic-based, and centroid-based. These keyword-based clustering methods take a large amount of time to create containers and put documents into their respective containers. Traditional methods use the file-handling techniques of different programming languages for creating repositories and transferring web documents into these containers. In contrast, the openstack4j SDK is a newer technique for creating containers and shifting web documents into them according to similarity, in much less time than the traditional methods. Another benefit of this technique is that the SDK understands and reads all types of files, such as jpg, html, pdf, and doc. This paper compares the time required for clustering documents using openstack4j against traditional methods, and suggests that search engines adopt this technique for clustering so that they can answer user queries in less time.

  18. Sejong Open Cluster Survey (SOS). 0. Target Selection and Data Analysis

    Science.gov (United States)

    Sung, Hwankyung; Lim, Beomdu; Bessell, Michael S.; Kim, Jinyoung S.; Hur, Hyeonoh; Chun, Moo-Young; Park, Byeong-Gon

    2013-06-01

    Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - M_V relations, Sp - T_{eff} relations, Sp - color relations, and T_{eff} - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.

  19. Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: A cluster analysis

    Directory of Open Access Journals (Sweden)

    Finch Caroline F

    2011-08-01

    Full Text Available Abstract Background Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results More than one in four patients had at least one comorbid condition, and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest of the clusters identified. Conclusions The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients
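
Note that this study clusters the comorbidity variables themselves rather than the patients, so the natural distance is between condition indicators, for example one minus their correlation. A hedged sketch with synthetic binary indicators standing in for conditions that tend to co-occur (the condition structure below is invented, not the study's):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)

# Hypothetical binary comorbidity indicators for 300 patients; the first
# two conditions tend to co-occur, as do the last two.
base1 = rng.random(300) < 0.3
base2 = rng.random(300) < 0.3
X = np.column_stack([
    base1,
    base1 ^ (rng.random(300) < 0.05),   # noisy copy of condition 1
    base2,
    base2 ^ (rng.random(300) < 0.05),   # noisy copy of condition 3
]).astype(float)

# Distance between condition variables: 1 - |correlation|, then
# average-linkage agglomerative clustering to find constellations
# of co-occurring conditions.
corr = np.corrcoef(X, rowvar=False)
dist = squareform(1 - np.abs(corr), checks=False)
groups = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
```

Each resulting group of variables corresponds to a "constellation" of conditions in the sense used above, such as the cardiopulmonary/metabolic cluster.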

  20. Improved multi-objective clustering algorithm using particle swarm optimization.

    Science.gov (United States)

    Gong, Congcong; Chen, Haisong; He, Weixiong; Zhang, Zhanliang

    2017-01-01

    Multi-objective clustering has received widespread attention recently, as it can obtain more accurate and reasonable solutions. In this paper, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO) is proposed. Firstly, a novel particle representation for the clustering problem is designed to help PSO search for clustering solutions in continuous space. Secondly, the distribution of the Pareto set is analyzed. The analysis results are applied to the leader selection strategy, which keeps the algorithm from becoming trapped in a local optimum. Moreover, a method for improving clustering solutions is proposed, which greatly increases the efficiency of searching for clustering solutions. In the experiments, 28 datasets are used and nine state-of-the-art clustering algorithms are compared; the proposed method is superior to the other approaches on the ARI evaluation index.
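
The ARI (adjusted Rand index) used as the evaluation metric here can be computed from the contingency table of two labelings; it is the Rand index corrected for chance, so identical partitions score 1 and random ones score about 0. A minimal implementation:

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index from the contingency table of two clusterings."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    n = len(a)
    cats_a, cats_b = np.unique(a), np.unique(b)
    # Contingency table: counts of items landing in (cluster i, cluster j).
    table = np.array([[int(np.logical_and(a == i, b == j).sum())
                       for j in cats_b] for i in cats_a])
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_a = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_a * sum_b / comb(n, 2)   # chance-level agreement
    max_index = (sum_a + sum_b) / 2
    return (sum_cells - expected) / (max_index - expected)
```

Because ARI is invariant to cluster relabeling, it is a standard way to compare a found clustering against ground-truth classes, which is how the 28-dataset comparison above would use it.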